Daily Edition

2026-04-12

AI Builders Daily — April 12

Tracking the people in AI who actually build, not the ones who just talk.

Today's Take

Today's signal is clear: the AI infrastructure arms race is accelerating, but deployment paths are diverging. On one side, Amazon's capex over the past three years exceeds its entire prior total, NVIDIA has opened a free MiniMax M2.7 API, and competition at the model layer is white-hot. On the other, enterprises rolling out agents are finding them far more complicated than chat: workflow redesign, legacy-system integration, and change management are the real barriers, not model capability. Meanwhile, the memory-architecture debate (files vs. graphs) exposes a deeper tension: who maintains an agent's memory, the harness or the user? That question will become central in the era of multi-agent collaboration.


Products & Launches

LLM Cheat-Sheet for Hermes + OpenClaw Agents

Garry Tan amplified Graeme's four-tier ranking of 18 models. GLM-5.1 enters Tier 1 on the strength of the #1 SWE-Pro score globally and 8-hour autonomous execution; Grok 4.20 makes Tier 2 with the lowest hallucination rate and a 2M context window; Mistral Small 4 lands in Tier 3 by replacing three specialist pipelines (reasoning + vision + coding) at $0.15/M input. All Tier 4 local models run in 32GB of RAM or less. The model-layer landscape is reshuffling fast, with cost and agent fit as the key metrics.


gbrain v0.9.0

Garry Tan shipped a gbrain update: a new lint command scans markdown files for LLM garbage (preambles, code-fence wrappers, placeholder dates, broken citations) with zero LLM calls and deterministic, pure-code fixes, and a new publish command turns a brain page into self-contained HTML with optional AES-256 client-side encryption. Underneath it is still Postgres + pgvector; markdown is the interface, not the engine.
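The lint pass is pure string processing with no model calls. As an illustration of what such a deterministic pass might look like (the rules and function below are a hypothetical sketch, not gbrain's actual code):

```python
import re

# Hypothetical lint rules for illustration; not gbrain's actual code.
PREAMBLE = re.compile(r"^(Of course|Sure|Certainly|Here is)[^\n]*\n+", re.IGNORECASE)
PLACEHOLDER_DATE = re.compile(r"\bYYYY-MM-DD\b")
FENCE = "`" * 3  # a literal code-fence marker

def lint_markdown(text: str) -> tuple[str, list[str]]:
    """Return (fixed_text, warnings) using pure string rules, zero LLM calls."""
    warnings: list[str] = []

    # 1. Strip an LLM preamble line such as "Of course! Here is ..."
    fixed, n = PREAMBLE.subn("", text)
    if n:
        warnings.append(f"stripped {n} preamble line(s)")

    # 2. Unwrap a code fence that swallows the whole document.
    lines = fixed.splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        fixed = "\n".join(lines[1:-1]) + "\n"
        warnings.append("removed wrapping code fence")

    # 3. Placeholder dates are only flagged: there is no safe guess to fill in.
    if PLACEHOLDER_DATE.search(fixed):
        warnings.append("placeholder date YYYY-MM-DD found")

    return fixed, warnings
```

Unfixable issues (like a placeholder date, where no safe value exists) are flagged rather than rewritten, which matches the no-guessing, deterministic design described above.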


Apple's Automated App Review

Peter Steinberger flagged that Apple has quietly rolled out automated review to batch-process the wave of vibe-coded apps. The system is rough for now: any app using an SDK that collects attribution data is flagged as containing ads and auto-rejected, and Firebase anonymous auth is detected as a login flow requiring a demo video. The fix: add a note in App Review Information stating "no ads, no login." The hope is near-instant approval for trusted accounts down the line.


NVIDIA's Free MiniMax M2.7 API

Garry Tan noticed that NVIDIA has opened a completely free MiniMax M2.7 API endpoint, which can be configured as a custom provider through OpenCrabs. It is a sign of intensifying model-layer competition: suppliers are using free tiers to fight for developer attention.


OpenAI "Spud" Already in Secret Testing

Peter Steinberger relayed Brad Gerstner's claim on the All-In Podcast: OpenAI's next-generation "Spud" (expected to be GPT-5.5) is already being tested secretly by a small group, with early feedback putting its capability on par with Mythos and "packaged in a more usable way." Gerstner called this "peak OpenAI FUD" and warned against underestimating the company's comeback.


OpenAI Codex Scratchpad Goes Multi-Threaded

Dan Shipper relayed TestingCatalog news: OpenAI is testing a "Scratchpad" feature for Codex that lets users launch multiple Codex chats in parallel from a TODO-list view, set to become the core interaction model of the upcoming Codex Superapp.



Opinions & Judgments

Garry Tan (President of Y Combinator)

  • Amazon's capex chart says it all. Amazon has spent more on capex in the past three years than in its previous 26 combined. Most AI use today is conversational and relatively token-efficient; coding agents burn orders of magnitude more tokens but still serve a small user base. As agent capabilities spread through knowledge work, demand for token processing could be hundreds of times what it is now, and these charts will keep going vertical.


  • The era of "Personal AI Software" is arriving. "Every change I make to my Claw, you get too. And your Claw works with you to pick up the things you want, and custom configure my concepts to what you need. It's the dawn of Personal AI Software, all just-in-time, made by the AI just for you and your needs." Agents are shifting from tools to close working partners.


  • Security is headed for its Jevons paradox moment. AI automates the vulnerability-discovery step but cannot automate the response. More real threats found faster means more triage, more remediation, and more architectural decisions that need human judgment. Demand for security talent will rise rather than fall; it is a badly underrated area of opportunity.


  • The memory-architecture debate: files vs. graphs. Garry argues that "memory is markdown, the brain is a git repo, and the harness is a thin conduit." But one close analyst counters that the file model has fundamental flaws: it cannot actively decide when to forget, cannot handle relationship graphs, and leaves context injection to luck. GBrain itself runs on Postgres + pgvector underneath. "Nobody has fully solved memory; memory.md is a starting point, not an endpoint."


  • SaaS is being repriced by AI. Atlassian down 75%, HubSpot down 69%, Figma down 86%. Almost every SaaS name is 30-70% off its 52-week high. "AI is eating software alive and repricing every company in real time. SaaS is cooked."


Yann LeCun (Chief AI Scientist, Meta)

  • Robot manipulation is nowhere near a 5-year-old. "Every week brings an announcement implying robotic manipulation is solved, except someone else already solved it last week. So I propose: at the next conference, set up a table for your robot next to a table with a 5-year-old child, and run 100 everyday-object manipulation tasks side by side. Can it pick up a coin? Untwist a bottle top? Insert a plug? Until a robot can do all the 'open world manipulation' a 5-year-old can, some humility is in order."


  • The Strait of Hormuz crisis and the estate agents. JD Vance returned empty-handed from Pakistan with two Virginia estate agents in his delegation, and Iran noticed. Twenty percent of the world's oil transits Hormuz; the strait has now narrowed to "the width of a letter box," and oil prices are spiraling out of control.


  • A Copernican view of intelligence. Terence Tao proposes that human intelligence is neither the only form of intellect nor necessarily the highest. Human and computer intelligence each have distinct strengths and weaknesses; the real potential lies in collaboration, not competition.


  • Mythos's "zero-days" are overhyped. "When you need to raise tens of billions of dollars, manufacturing shock value is decisive. Many of the 'vulnerabilities' Mythos found are in older software and impossible to exploit, and the severe zero-day reports rely on just 198 manual reviews."


  • Election interference 2.0. "Make election interference great again, @JDVance."


Peter Steinberger (veteran iOS developer)

  • Personal AI agents vs. chat interfaces. Amplifying Riley Brown: "Personally I don't see the point of Cowork. I don't even see the point of ChatGPT or Claude anymore. I either want to talk to a claw (an agent running on a persistent computer with no guardrails, over iMessage or Telegram) or I want to use Claude Code or the Codex app for coding."


  • Matt Mahan's run for California governor. Garry Tan amplified a Mahan supporter's take: "A Mahan win would be transformative. A sane, competent centrist technocrat leading the worst-managed state, one dominated by left-wing special interests? It would be the anti-Mamdani whitepill moment."


swyx (founder of AI Engineer)

  • 8 big themes from the AI Engineer conference in London. Back from the London AI Engineer conference, swyx distilled 8 core themes from a week of conversations with top AI engineers; the full blog post is coming soon.



Technical Developments

Garry Tan (President of Y Combinator)

  • Hermes agent architecture: explicit self-improvement. Garry Tan amplified an analysis of the Hermes agent architecture: "Hermes is taking a much more explicit route to self-improvement than most agent systems. It is not doing some offline trajectory mining where you..." The suggestion is that Hermes has chosen a self-optimization path that departs from the mainstream, and it is worth watching.


X / Twitter

Garry Tan @garrytan
Retweeted
Steven Tavares
I’ve covered Eric Swalwell since he was a member of the Dublin City Council. Shortly after being elected to Congress in 2013, his behavior towards women was known by all levels of our local government and the Alameda County Democratic Party.
Yann LeCun @ylecun
Retweeted
Lain on the Blockchain
After seeing that Claude Mythos marketing turned out to be, as expected, a scam, I wanted to make a master list of tricks being used to market LLMs.
The master list includes statements directly from leadership in the companies or from the "organic marketing" of people on social media, along with an explanation on how the scam works. This is my first attempt, so likely incomplete.
The LLM Marketing Scams Master List v1:
"Two more weeks" - the models will be good enough someday soon to do what we claim.
"They're already good enough" - the models are already good enough to replace workers, but it hasn't happened yet because of x y z reasons.
"We just built God in the backroom, and no, you can't see it" - the models they built in private are actually capable of doing the things we have been waiting for, but they can't let us see them yet for x y z reasons.
"Actually they already have replaced jobs" - the layoffs that tech companies have been doing, citing AI as the reason, have already been replaced with current LLM tech, ignoring market conditions and past data on layoffs during such conditions.
"You just don't know how to use them as well as me" - the models are good enough, but esoteric prompt engineering is required to get these results, and no, I won't teach you.
"I built an app making big money with LLMs" - they claim they already have made startup companies, almost always SaaS companies, that are making them tons of money, but when you ask to see them, they won't show you.
"You aren't using the right model" - claims that you must be using the wrong model and need to use Open Claude 420b-parameter Gemini Plus Pro 6.9 with 4RealThisTime HomerSimpson agent mode enabled. Note that this will be used to attack every study on the effectiveness of LLMs, since studies take time to complete and publish, with new models releasing more frequently than it's possible to complete and publish a study
"You're falling behind" - claims that you need to use the bots now, even though they aren't good enough to fully automate any jobs, because otherwise, when the bots are good enough, you will lose your natural English skills required to prompt effectively.
"All these companies are using LLMs, so do you think you know better than they do?" - pointing to claims of large companies deeply invested in LLMs being a success saying that LLMs are being used effectively, with no viewable results in the speed and/or quality of their company's output.
"The benchmark score went up" - claiming improvements on the benchmarking tests given to their latest model, despite the training being specifically tuned to improve on these tests, and then conflating better benchmark scores with actually being more able to automate jobs or drastically improve worker productivity.
"It can now count the letters in Strawberry/can now do things it famously couldn't do previously" - saying that it can now count the letters in Strawberry or instruct you on how to use a cup without a bottom, etc. is often done to suggest increased reasoning for the LLM, but often involves just hard coding an answer into the service.
"It has escaped our control" - saying that they cannot control the LLM, implying it is conscious or living to some degree when really it just said words that it wasn't supposed to or an agent used an app that wasn't intended by the user's prompt when next-token predicting
"It's feeling sad/scared/happy/angry, suggesting it is conscious" - they ask the LLM how it is feeling, and it next-token predicts a response that includes an emotion felt by humans, since training data is from human conversations online.
"Costs are going down/the LLM service is profitable" - ignores training costs and capex for hardware, usually just referring to inference being profitable, which isn't even true in many cases. Training and capex are 95%+ of the total costs to serve the models.
Did I miss any?
Amjad Masad @amasad
Retweeted
Samuel Spitz
I vibecoded this video in a few mins
It got 200,000+ views
That’s the 4th time that’s happened this month
It’s never been easier to go viral
Replit ⠕: Introducing Replit Animation
Vibecode your next viral video in minutes, powered by Gemini 3.1 Pro.
(This video was 100% made in Replit Animation)
Peter Steinberger 🦞 @steipete
Retweeted
ollama
MiniMax M2.7 is available on Ollama's cloud, and is licensed for commercial usage.
Use it with OpenClaw:
ollama launch openclaw --model minimax-m2.7:cloud
Coding agents, such as Claude:
ollama launch claude --model minimax-m2.7:cloud
Chat with the model:
ollama run minimax-m2.7:cloud
MiniMax (official): We're delighted to announce that MiniMax M2.7 is now officially open source.
With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%).
You can find it on Hugging Face now. Enjoy!🤗
huggingface:https://huggingface.co/MiniMaxAI/MiniMax-M2.7
Blog: https://www.minimax.io/news/minimax-m27-en
MiniMax API:
Peter Steinberger 🦞 @steipete
Retweeted
Deva Hazarika
For You feed is great today, this is Nikita’s Liberation Day
Dan Shipper 📧 @danshipper
Retweeted
Brandon Gell
This is so smart and might even be more impactful for non-devs than devs
TestingCatalog News 🗞: OpenAI is working on a new experimental feature for Codex called Scratchpad.
Users will be able to start multiple Codex chats from a TODO list view, which will be executed in parallel.
It will become very instrumental in the upcoming Codex Superapp, where you will be able to
Peter Steinberger 🦞 @steipete
Retweeted
Viktor Seraleev
While you were sleeping, Apple pulled off a quiet revolution. They silently rolled out automated app review, their answer to the surge of apps driven by the vibe coding trend.
Auto-review is the first stage of the review process. Right now it’s rough. The system flags any SDK that collects attribution data as a signal that your app contains ads. Developers are getting hit with auto-rejections left and right.
It also automatically detects Firebase anonymous auth as a sign that your app has a login flow and asks you to provide a demo video.
The fix is simple though. Just add a note in App Review Information clarifying that your app has no ads and no login feature.
Hopefully Apple ships a fix soon and we end up with fast automated reviews for trusted accounts, similar to how Google Play already handles it.
Yann LeCun @ylecun
Retweeted
Jitendra MALIK
I see every week on X an announcement or demo which implies that robotic manipulation has been solved. The only reason I don't believe it is because manipulation had already been solved last week by somebody else! So may I propose the "5 year old paired comparison test" ? At the next conference let's set up a number of tables to which you can bring your robot hardware. Next to it we will have another table where there will be a 5 year old child. In parallel we will try 100 different manipulation tasks that a neutral person has chosen- we could start with "pick up anything" - only household objects (e.g. as might be found in a typical American home) will be used, and we compare the performance of your robot with that of the 5 year old. Can you pick up a coin? Or a book? Or untwist a bottle top? Or insert any plug into a matching socket? Rotate one face of a Rubik's cube? Until your robot can do all the "open world manipulation" that a 5 year old kid can, some humility is in order.
Garry Tan @garrytan
More of this please

Actually if you believe computer aided design and engineering is about to become 100x more awesome thanks to AI, it might just happen now

Tansu Yegen: Former BYD and Huawei engineers made a device that turns any bike electric, reaching 32 km/h ⚡

Garry Tan @garrytan
Retweeted
Eric Levine
I dropped @garrytan's gbrain Twilio skill into my OpenClaw, and it one-shotted it. In just a few minutes, it literally called my phone and started talking to me. Holy shit.
Garry Tan @garrytan
Matt Mahan as California governor would be an *incredible* whitepill

If we want to save California, this is the one way it could happen

Benjamin Freeman: San José Mayor Matt Mahan has seen his odds to become the next California Governor more than double in the last 48 hours from 6% to 14%.

He’s in second place behind Tom Steyer (53%)


Garry Tan @garrytan
Retweeted
Sharran Srivatsaa
As close to a first principles understanding as you can get.
Garry Tan: http://x.com/i/article/2042922188924424198
Garry Tan @garrytan
Retweeted
Kpaxs
Reality actually is more negotiable than we think. Most systems have cracks in them. Most rules have exceptions. Most “no’s” are really “no, not that way” or “no, unless…”
But you only find the cracks if you’re looking for them. And you only look for them if you believe they exist.
I call it the “Angle-Seeking Hypothesis”.
The belief that solutions exist generates the cognitive effort required to find them. It’s self-fulfilling, but in the generative direction.
Kpaxs: High-agency people seem to have insane luck. They don't. They just tried 47 things while everyone else tried two and gave up. The conviction that reality is negotiable is generative, it makes you creative. Because if you believe there's always another angle, you start looking for
Garry Tan @garrytan
Retweeted
Aaron Levie
This chart puts the datacenter demands into perspective very clearly. Amazon has done more capex in the last 3 years than its entire history.
Right now most AI adoption is on chat tools that are relatively token efficient. Comparatively, coding agents, use orders of magnitude more tokens, but are only used by a small population today.
These same kind of consumption patterns are about to come for the rest of knowledge work as well. The demand to process tokens for agents is probably 100s of times greater than we’re realizing right now.
Expect these charts to keep going vertical.
Hedgeye: Amazon has spent more building out its business over the past 3 years than in the previous 26 combined.
swyx @swyx
late night low TAM tweet:

did a Broadway solo song for the first time in ~18 years! only fumbled lyrics once!


swyx 🇬🇧: flew straight back from London town to get ready for my Broadway cabaret tonight !!

(sorry didnt advertise… its v limited seating)




Garry Tan @garrytan
Retweeted
Graeme
The LLM Cheat-Sheet for Hermes + OpenClaw Agents (04.12.26)
The community has flagged Claude Opus 4.6 underperforming lately while GLM 5.1 has exploded on the scene to claim frontier capabilities.
A lot has changed since the last version. Here's what moved:
GLM-5.1 just proved its frontier capabilities with #1 SWE-Pro globally, 8-hour autonomous execution, and cheaper than Opus on input. It earns a Tier 1 spot.
Grok 4.20 enters Tier 2 with the lowest hallucination rate of any tested model, a native multi-agent API running up to 16 parallel agents, and a 2M context window.
Gemini 3.1 Pro drops to Tier 3. The price and multimodal story is strong, but the new frontier bar left it behind on reasoning.
Mistral Small 4 joins Tier 3. One model replacing three specialist pipelines (reasoning, vision, agentic coding) at $0.15/M input. Apache 2.0.
Here's the full landscape: 18 models in 4 tiers.
Tier 1 - Frontier Models
- Claude Opus 4.6: #1 agentic terminal coding; watch for inconsistency reports
- GPT-5.4: superhuman computer use, real planning, and a new $100/month plan
- GLM-5.1: #1 SWE-Pro globally, 8-hour autonomous execution, MIT license
Tier 2 - Execution
- MiniMax M2.7: 97% skill adherence, built for agents. API only, not open weights
- Kimi K2.5: long-horizon stability, agent swarm
- Grok 4.20: lowest hallucination rate on the market, native multi-agent, 2M context
- DeepSeek V3.2: frontier reasoning at 1/50th the cost
Tier 3 - Balanced
- Claude Sonnet 4.6: 98% of Opus at 1/5 the cost
- GPT-5.4 mini: 93.4% tool-call reliability, runs on OAuth
- Gemini 3.1 Pro: best multimodal value, native video+audio in one call
- Qwen3.6 Plus: near-frontier coding, completely free via OpenRouter
- Llama 4 Maverick: open-weight, self-host at zero marginal cost
- Mistral Small 4: one model replacing three; reasoning, vision, agentic coding, Apache 2.0
Tier 4 - Local / $0 - Runs on 32GB RAM or less
- Qwen3.5-9B: always-on subconscious loop, 16GB RAM, beats models 13x its size
- Qwen3.5-27B: stronger instruction following, 32GB RAM
- Gemma 4 31B: best local reasoning, Apache 2.0, commercial-ready
- DeepSeek R1 distill: best chain-of-thought at $0
- GLM-4.5-Air: purpose-built for agent tool use and web browsing, not a trimmed general model
Full breakdown with benchmarks, costs, and use cases in the table ↓
Graeme: The LLM Cheat-Sheet for OpenClaw and Hermes agents
The goal is to choose the right models that best fit your agents' needs for as little cost as possible.
Do this and you can build a proficient agent that will never die.
Here's the full landscape on popular models for AI
Garry Tan @garrytan
GBrain 0.9.0 just dropped. Just ask your Claw/Hermes to upgrade if you're already on.

Every change I make to my Claw, you get too. And your Claw works with you to pick up the things you want, and custom configure my concepts to what you need.

It's the dawn of Personal AI Software, all just-in-time, made by the AI just for you and your needs.
Peter Steinberger 🦞 @steipete
Retweeted
Riley Brown
Personally I don’t see the point of Cowork. I don’t even see the point of chatgpt or Claude anymore.
I either want to talk to a claw (agent with all my skills running on a persistent computer with NO guardrails in iMessage or telegram) or i want to use Claude Code or Codex app for coding.
Riley Brown: Codex App > Claude Desktop App
Yann LeCun @ylecun
Retweeted
Gandalv
JD Vance has returned from Pakistan. With him came two estate agents. This is not the opening line of a joke, although I wish it were, because the punchline is catastrophic.
They flew home with their tails between their legs after what the State Department will presumably describe as “productive preliminary discussions” and what the rest of the planet will recognise as being thrown out on your ear. Iran wanted, apparently, to watch a vice president and two men who normally sell semi-detached houses in Virginia try to look dignified while boarding a plane back to nowhere.
And we all know what this means. The Strait of Hormuz will stay closed.
Twenty percent of the world’s oil supply passes through a waterway roughly the width of a slightly ambitious motorway. Every time someone in Washington has a bad week, that waterway gets a bit narrower. Right now it is approximately the width of a letter box, and oil prices are doing what oil prices always do when adults are not in charge, which is going completely and utterly insane.
Now. The estate agents. This is the detail that keeps me awake.
Why does a peace delegation to one of the most strategically sensitive negotiations on earth require property professionals? I have been racking my brain. The only explanation that holds together, if you squint and tilt your head, is that someone was attempting to acquire a stake in some sort of Hormuz-adjacent enterprise. A port authority, perhaps. A logistics company. Something with the word “international” in the name and a chairman who wears a lot of gold. Trump getting into the ownership structure of the very chokepoint his military is supposedly fighting over would be, in any previous administration, the kind of thing that ended careers. In this one it is Tuesday.
Iran, to their considerable credit, apparently noticed.
Meanwhile, back home, the administration is talking about conscription. Actual conscription. The mandatory enrollment of young Americans into military service, which is either a sign of profound strategic confidence or the most terrifying admission of overextension in modern American history. I leave it to you to decide which, though I note that the people making this decision have children of precisely the wrong age and security details of precisely the right size.
Seven billion people looked at Donald Trump and understood immediately what they were looking at. They had read the biography. They had watched the first term. They had seen a man attempt to remain in power by means that, in any country without America’s particular combination of luck and institutional stubbornness, would have succeeded. They filed this information away under “obvious.”
Seventy million people looked at the same evidence and saw a genius.
This is the central mystery of our age and I do not have a satisfying answer to it.
What I do have is the image of a speeding car. It is travelling at a hundred and thirty kilometres per hour toward a cliff edge that is clearly visible from quite some distance. The people responsible for steering it are busy. They have calls to make, deals to structure, properties to value. The passengers are arguing about whether the driver is brilliant or misunderstood. Europe and Canada are standing at the side of the road watching Americans jog past with suitcases, looking for somewhere quieter to live.
The cliff, for its part, is not moving.
Stay connected,
Follow Gandalv @Microinteracti1
Yann LeCun @ylecun
Retweeted
Michał Podlewski
Terence Tao proposes what he calls a "Copernican view of intelligence".
Instead of buying into the common, one-dimensional narrative that artificial intelligence will simply evolve from "subhuman" to "superhuman" and ultimately make humanity entirely redundant, Tao urges us to look at the bigger picture.
Much like the Copernican revolution proved the Earth is not the center of the universe, Tao suggests we need to realize that human intelligence isn't the only, or necessarily the highest, form of intellect. Historically, we have treated other forms of storing or creating knowledge—like animals, books, and computers—as secondary. However, we actually exist within a much richer universe of intelligence.
Both human intelligence and computer intelligence possess their own distinct strengths and weaknesses. The true potential lies not in viewing them as direct competitors, but rather in focusing on collaboration. By working together, humans and computers can achieve additional things that neither could accomplish on their own, requiring us to think in much wider terms than just what humans or computers can do alone.
Garry Tan @garrytan
Retweeted
Vox
gbrain v0.9.0 added a lint command. cleans up the garbage your openclaw/hermes leaves in your notes.
"Of course! Here is..." preambles, code fence wrappers, YYYY-MM-DD placeholders, broken citations. they pile up. you stop noticing.
→ scan, zero LLM calls, pure code
gbrain lint brain/
→ auto-fix what's fixable
gbrain lint brain/ --fix
→ not sure? preview first
gbrain lint brain/ --fix --dry-run
strips preambles, removes extra fences, flags placeholder dates and broken citations. no model calls, no guessing, deterministic.
0.9.0 also added gbrain publish: turns a brain page into self-contained HTML, strips private data automatically. optional AES-256 password, client-side decryption, no server needed.
gbrain publish brain/companies/acme.md --password
everyone using AI for knowledge management faces this pile eventually.
Garry Tan: GBrain 0.9.0 just dropped. Just ask your Claw/Hermes to upgrade if you're already on.
Every change I make to my Claw, you get too. And your Claw works with you to pick up the things you want, and custom configure my concepts to what you need.
It's the dawn of Personal AI
Yann LeCun @ylecun
Retweeted
Gilles Babinet
When you need to raise tens of billions of dollars, the ability to create a context of stupefaction is decisive. The so-called security flaws detected by Mythos, Anthropic's powerful new model, were in fact nothing of the sort.
jeffrey lee funk: We've been tricked, again. Many of the thousands of bugs and vulnerabilities Mythos found are in older software are impossible to exploit. And the severe zero-day reports rely on just 198 manual reviews https://www.tomshardware.com/tech-industry/artificial-intelligence/anthropics-claude-mythos-isnt-a-sentient-super-hacker-its-a-sales-pitch-claims-of-thousands-of-severe-zero-days-rely-on-just-198-manual-reviews
Yann LeCun @ylecun
Retweeted
James Tate
Too on point not to share, “Aussie reply to Trump rant about NATO not being there for us.
Mate. You run a country with 600,000 homeless people sleeping on the street tonight. A country where 40% of adults can't cover a $400 emergency without borrowing money. A country where insulin costs more than a car payment and people are rationing it to survive. A country where medical debt is the number 1 cause of bankruptcy. A country where women are dying in hospital car parks because doctors are too scared of abortion laws to treat a miscarriage.
You lock up more of your own citizens than any nation on earth. More than China. More than Russia. More than North Korea. The land of the free has 2 million people in cages, and a quarter of them haven't even been convicted of anything. They're just too poor to make bail.
Your life expectancy is going backwards. You're the only developed nation where that's happening. Your infant mortality rate is worse than Cuba's. Your kids do active shooter drills between maths and English while you sell the gunmaker's stock to your mates.
Your minimum wage hasn't moved in 15 years. You've got teachers working 2 jobs and veterans sleeping under bridges and you just spent a trillion dollars flattening a country that didn't attack you.
And you’ve got a convicted felon, adjudicating raping, paedophile protecting, porn star shagging insurrectionist running the biggest dumpster fire war campaign since the Taliban thanked you very much for losing again.
And you're calling Greenland poorly run?
Greenland has universal healthcare. Free education. One of the lowest incarceration rates in the world. Nobody goes bankrupt there because they got sick. Nobody dies in a waiting room because their insurance said no.
"NATO wasn't there when we needed them." When exactly was that, champ? September 11? Because NATO invoked Article 5 for the first and only time in history FOR YOU. Soldiers from dozens of countries deployed, fought, bled, and died in Afghanistan FOR YOU. Australia wasn't even in NATO and we still showed up. For 20 years.
And you pulled out at 2am without telling anyone and left them to deal with the mess.
So maybe before you start calling other countries poorly run, have a look at your own backyard, you spray-tanned aluminium siding salesman. The only thing poorly run in this picture is your fucking mouth. Credit (borrowed from) Jim Scroggins - original author 📷 unknown”
Peter Steinberger 🦞 @steipete
They grow up so fast 🦞

MV: Openclaw outgrew its creator, well done clankers

swyx @swyx
Retweeted
Alex Volkov
Our world is changing.
I spent the last week listening to, chatting, dining, dancing with and interviewing the top AI Engineers in the world all gathered in London.
Here are the top 8 themes that emerged (a sneak from my upcoming blog covering the event in full 👇)
Garry Tan @garrytan
Retweeted
Aaron Levie
Security another great example of a job category that is about to have its Jevons paradox moment as well.
“And counterintuitively, I think better AI tooling for security will increase the demand for security talent, not decrease it. Autonomous exploitability automates the proving step, but it doesn't automate the response. More real findings surfaced faster means more triage, more remediation, more architectural decisions that need human judgment”
AI is going to generate 100X more code, and along with that, there will be an enormous increase in security discoveries. AI is the only way to triage all of these new threats and risks, but an expert still will be needed on the other side to manage the process. Going to be a massive category of opportunity for talent.
Tal Hoffman: http://x.com/i/article/2043293714190155776
swyx @swyx
Retweeted
Nick Taylor
Mandatory wrap up video for @aiDotEngineer 👀
What a great conference. Great talks for sure, but the hallway track and speaker room were where the best convos were aside from meals out with peeps.
Looking forward to Miami!
Peter Yang @petergyang
Retweeted
Peter Yang
"AI gets you to average quickly. Your job is to push past that."
Here's my new episode with @zoink (Figma CEO) where I asked him some tough questions, including:
→ Can you teach AI design taste?
→ Do design systems hurt creativity?
→ What's Figma's role when code is free?
Two-thirds of Figma users are now non-designers and it was super interesting to hear Dylan talk about Figma's future.
Some quotes from Dylan:
"Design is the new code. You'll be designing in a visual-first way before pushing a pull request right to production."
"If you're a PM and you think your job is to make docs and slide decks, you're going to love this new world. You get to make things too."
"If everyone agrees with your point of view, then you probably don't have one."
📌 Watch now: https://youtu.be/eqPljh_9C9Y
Thanks to our sponsors:
@Replit: Plan, design, and build with AI agents https://replit.com/?utm_source=creator&utm_medium=organic&utm_campaign=creator_program&utm_content=peteryang
@linear: The AI agent platform for modern teams http://linear.app/behind-the-craft
alliekmiller @alliekmiller
I taught AI agents to 20,000 people for free, and for the next 12 hours, you can watch that exact same workshop replay.

It’s 2026. You need to know how to build AI agents. I got you.

If you’re a mom returning to work, someone worried about your job, or someone who feels behind on AI, this is 100% for you.

Free AI Agent Workshop replay: http://events.alliekmiller.com/recording
Garry Tan @garrytan
Applied research at the app level happens in the open with open source now

Just in time software is a brave new world

λux: i was going through the hermes agent architecture and codebase and one thing that really stood out to me is that hermes is taking a much more explicit route to self-improvement than most agent systems usually imply. like it is not doing some offline trajectory mining where you

Garry Tan @garrytan
Retweeted
Vaishnavi
THE CEO OF Y COMBINATOR JUST OPEN-SOURCED HIS PERSONAL KNOWLEDGE BRAIN
Garry Tan has 342 markdown files scattered across repos, obsidian vaults, meeting notes
he built a tool to make them all searchable
it's called GBrain
import your files → it chunks, embeds, indexes everything → ask a question in plain english → finds answers by meaning, not keywords
"what are our biggest risks?" finds pages about competitive moats and board prep even if those words don't appear
the man who funded Airbnb, Stripe, and Coinbase doesn't use fancy tools
he uses postgres + pgvector + hybrid search on his own markdown files
now you can too
→ http://github.com/garrytan/gbrain
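The pipeline described above (chunk, embed, index, then search by meaning as well as keywords) ultimately has to fuse two rankings into one. A toy sketch of that fusion step using reciprocal rank fusion, in plain Python with made-up document names (gbrain's real implementation runs hybrid search inside Postgres + pgvector, not this):

```python
# Toy hybrid search: fuse a vector-similarity ranking and a keyword ranking
# with reciprocal rank fusion (RRF). Purely illustrative; the document names
# are invented, and gbrain's actual pipeline uses Postgres + pgvector.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; earlier ranks contribute more."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# The same query can rank documents differently per signal:
semantic = ["moats.md", "board-prep.md", "roadmap.md"]  # by embedding similarity
keyword = ["roadmap.md", "moats.md"]                    # by keyword match
fused = rrf([semantic, keyword])
```

A document that scores well on both signals (here the hypothetical moats.md) rises to the top even when neither ranking alone put it first everywhere, which is why hybrid search finds "biggest risks" pages that never contain those words.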
Garry Tan @garrytan
Aggressive prediction: the CEOs like @jack and @tobi who are slinging code and open source and all the way at the edge are leading from the front. Their companies will make it

Others still in manager mode? Oof maybe less so

shirish: bro was right.

Atlassian down 75%. HubSpot down 69%. Figma down 86%.

Almost all of them down 30–70% from their 52-week highs.

AI is literally eating software alive and repricing every company in real time.

SaaS is cooked fr 😭

swyx
swyx @swyx
Retweeted
Scaling DevTools Podcast
When @nicknisi and @zackproser from @WorkOS ran their workshop at AIE, someone just got their Hermes bot to complete it and submit - and it was good
Garry Tan
Garry Tan @garrytan
We need to keep hiring young people

Soumitra Shukla: Very happy to share a new paper with Guido Friebel, Yao Huang, Jin Li, and Andrew Zhang on how AI could change the structure of internal labor markets.

We show that cutting junior hiring when AI arrives may weaken the pipeline that creates future seniors and lead to “lost

Garry Tan
Garry Tan @garrytan
Shut down the open air drug markets and jail the drug dealers in downtown San Francisco

Audit and investigate the nonprofit industrial complex and defund the ones engaging in fraud or actively helping people do drugs until they die

Fund recovery and compelled treatment

Vineet: San Francisco reports a violent crime every 51 minutes.

> Most of them are in just 3 neighbourhoods.
> Most of them are between 1 PM and 7 PM.
> Most of them never resolved.

I mapped all of it. Real-time. Filterable. Free.

Garry Tan
Garry Tan @garrytan
Retweeted
Michael
Garry is kinda correct here, but is oversimplifying memory. Harrison (the author of the original article) makes a very good point but also makes memory sound easier than it is.
(Before reading this article, note that I wrote down my thoughts and then passed them through Claude Code. I read every word. Read it like a coworker's Claude Code output.)
Let me start with where Garry is right, because he IS right about something important.
Git-backed markdown is a memory format that is simultaneously human-readable, version-controlled, diffable, and greppable. No database gives you all four at once by default. If your agent's memory is an opaque blob in someone else's database, you have no idea what it "knows" about you. You can't correct it, can't diff it, can't even look at it.
That matters. A lot. I agree with this completely, and it's the right starting posture.
But it's a storage format. It's not a memory system. And the difference matters more than most people in this debate seem to realize.
Harrison's argument is different. He says memory is tied to the harness, the harness must be open, therefore you should use their open harness. The first two points are correct, the third is iffy because it assumes you want to be responsible for memory, which is hard (probably right but not a trivial decision). But that core insight--that the harness and memory can't be separated--is real and more important than people give it credit for.
Let me explain why, and then why everyone in this debate is still underselling the difficulty.
The harness owns the critical moment
The most important time for memory to be created or updated is during compaction.
Compaction is when the context window fills up and the agent compresses everything into a summary. Information that doesn't survive the summary is gone--not archived, gone. This is memory triage, and the harness controls it. Always.
OpenCode, OpenClaw, and Hermes all handle this. OpenClaw does it by default. OpenCode's SDK exposes compaction hooks--you can listen for session.compacting events and handle memory yourself. This is a great place for memory logic to live.
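A compaction hook of the kind described above might look like this. The event shape, field names, and registration call are my assumptions, not OpenCode's actual SDK schema; the point is only where memory-triage logic would live.

```python
import pathlib

MEMORY_DIR = pathlib.Path("memory")


def on_session_compacting(event):
    """Memory triage: persist what must survive before context is compressed.

    `event` is assumed to carry a session_id and the messages about to be
    compacted -- an illustrative shape, not a real harness's schema.
    """
    MEMORY_DIR.mkdir(exist_ok=True)
    # Anything not written out here is gone after compaction -- not archived, gone.
    keep = [m for m in event["messages"]
            if m.get("pin") or "decision" in m["text"].lower()]
    path = MEMORY_DIR / f"session-{event['session_id']}.md"
    path.write_text("".join(f"- {m['text']}\n" for m in keep))
    return keep  # the harness can still fold these into its own summary


# Registration is harness-specific, e.g. something like:
# harness.on("session.compacting", on_session_compacting)
```

The filter here (pinned messages, anything containing "decision") is deliberately dumb; the hard part is deciding what deserves to survive, not the plumbing.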
Now look at what Codex does: it produces an opaque, encrypted compaction summary that isn't usable outside the OpenAI ecosystem. Harrison himself flagged this in his article. This isn't just vendor lock-in, it's architectural lock-in by design.
Harrison is right to be alarmed by this. Garry is right that being "above the API line" matters. But neither one grapples with what actually makes memory hard once you've decided to own it.
Where files break down: forgetting
Garry's model inherits all the strengths of git: version history, diffs, blame, rollback.
But git's greatest strength is the core problem: nothing is ever truly forgotten.
When do you choose to forget a memory? How do you know it's outdated? You changed jobs six months ago--is the memory about your old team's coding standards still valid? Your codebase migrated from REST to GraphQL--are the API pattern memories stale, or still useful for legacy endpoints that still exist?
With files, you can delete them. But you need to know they exist AND that they're stale. And you need to check this proactively, because nobody is going to tell you.
This is actually a structured problem with real solutions starting to emerge. Zep's Graphiti engine uses what they call bi-temporal knowledge graphs--every fact gets timestamps for when the system recorded it AND when it was true in the real world. Facts are invalidated, not deleted. You can query "what did I know about X on March 15th" separately from "what is currently true about X."
Most memory providers are converging on some version of this. Supermemory has a graph-based system. Hydra is moving toward mixed graph/vector approaches. Mem0 added graph memory. This convergence is telling--it means the industry is collectively figuring out that flat files and pure vector search aren't enough for temporal reasoning.
Files don't have temporal validity windows. Git has history, but history and validity are different things. Knowing a file changed on March 15th doesn't tell you whether its contents are still true today.
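A minimal sketch of the bi-temporal idea, with field names of my own choosing (Graphiti's actual schema differs): every fact carries both a recorded-at timestamp and a real-world validity window, and asserting a new fact invalidates old ones instead of deleting them.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    statement: str
    recorded_at: date                # when the system learned it (recorded axis)
    valid_from: date                 # when it became true in the world (validity axis)
    valid_to: Optional[date] = None  # set on invalidation; facts are never deleted


class BiTemporalStore:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject, statement, recorded_at, valid_from):
        # Invalidate, don't delete: close the validity window of any open fact.
        for f in self.facts:
            if f.subject == subject and f.valid_to is None:
                f.valid_to = valid_from
        self.facts.append(Fact(subject, statement, recorded_at, valid_from))

    def known_on(self, subject, as_of):
        """What did I know about `subject` on `as_of`?"""
        return [f.statement for f in self.facts
                if f.subject == subject and f.recorded_at <= as_of]

    def true_on(self, subject, as_of):
        """What was true about `subject` on `as_of`?"""
        return [f.statement for f in self.facts
                if f.subject == subject and f.valid_from <= as_of
                and (f.valid_to is None or as_of < f.valid_to)]
```

The two query methods are exactly the distinction files can't express: `known_on` is git history, `true_on` is validity, and a flat file only gives you the first.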
Then there's the injection problem.
OpenClaw's memory.md is a plain file of memories, injected into context every time and updated at compaction.
It's also fully observable, because it's just... a file. This was a genuine innovation and a really good idea.
But my OpenClaw installation clients keep running into the same wall: not all memory needs to be in context every time, and there's a ceiling on how much fits. Claude Code caps MEMORY.md at something like 200 lines. After that, the content just doesn't get loaded at session start. You lose it.
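The cap behavior is easy to see in code. The 200-line figure is the reported number, not a documented constant, so treat it as illustrative; the sketch just shows the failure mode, where everything past the cap silently never reaches the model.

```python
from pathlib import Path

MEMORY_CAP = 200  # the reported Claude Code figure; illustrative, not documented


def load_memory(path="MEMORY.md", cap=MEMORY_CAP):
    """Return (injected_text, dropped_lines).

    Everything past the cap never reaches the model at session start --
    the memory still exists on disk, the agent just never sees it.
    """
    p = Path(path)
    if not p.exists():
        return "", []
    lines = p.read_text().splitlines()
    return "\n".join(lines[:cap]), lines[cap:]
```

Nothing errors and nothing warns; the dropped lines are simply absent from context, which is why the wall is so easy to hit without noticing.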
Most memory systems solve this with a reactive search_memories tool. The agent needs something, searches for it, finds it. Fine. But what happens when the agent doesn't know it should be searching?
A coding agent drifts off-track and violates a pattern your team agreed on three months ago. The memory exists. The agent didn't search for it because it didn't know it was relevant. There was no trigger. It just... didn't know what it didn't know.
This is the proactive injection problem, and it's the hardest open question in memory right now.
There IS real research on this. MemGuide ranks candidate memories by something they call "marginal slot-completion gain"--basically asking "would injecting this memory fill a gap the agent actually needs right now?" PRIME takes a different angle, building proactive reasoning through iterative memory evolution. These are promising but none of them are production-ready for synchronous agents where you can't afford an extra inference round on every turn.
Mesa's Saguaro is interesting here. After every agent turn, it spawns a separate LLM that reviews what the agent just did against the full codebase. If the agent is drifting, it corrects course. They kinda built a memory system without calling it one. It's just really slow because you're doing LLM inference after every single turn.
Supermemory proved the logical extreme of this in their April Fools experiment: throw enough inference at the problem (eight parallel prompt variants, a dozen model calls per query) and you beat basically every memory benchmark. 98.6% accuracy. But the per-query cost is absurd. Their actual production system--the graph-based one--scores lower on benchmarks but is, you know, usable. For async agents where latency doesn't matter, brute force actually makes sense. Not for Jarvis, my OpenClaw agent.
Where files break down: relationships and search
If you store everything as files, there's no way to search "all people I know" or "bugs I often make in this codebase" unless the agent happens to organize memories that way. And it won't, because agents are inconsistent organizers.
Concepts and relationships aren't flat. They're graphs. A person connects to a company, a project, a set of conversations. A coding pattern connects to a language, a framework, a set of past mistakes. Files can represent individual nodes but they can't represent the edges without becoming something else entirely.
So you solve it by adding structured search over your markdown files. Oops, you've built a database!
https://dx.tips/oops-database
@swyx wrote about this years ago: developers who avoid using a real database inevitably build one, badly, through incremental decisions. You start with files, add search, add indexing, add schemas, add conflict resolution, and suddenly you have Postgres except worse.
This is actually what happened with GBrain--Garry's own implementation of "memory is markdown, brain is a git repo." The files go in as markdown. But underneath? Postgres and pgvector for hybrid search. The markdown is the interface, the database is the engine. Even the strongest advocate for file-based memory needed a database to make it actually work.
The Composio model
Here's something I think is underexplored: portable memory across agents.
The same way Composio lets you move integrations across agents, some memory providers are moving toward letting you own and share memories across Claude, ChatGPT, OpenClaw, whatever. Your memories live in a vault you control, and each agent reads from and writes to it.
I'd call this the Composio model of memory. It's a good idea and more providers should pursue it.
But then you're potentially running two memory systems--one inside the harness (memory.md, CLAUDE.md, whatever the harness does at compaction) and one external. What a mess.
Hermes and OpenClaw both let the user choose their memory backend. Flexibility sounds great until you realize it means the system has to handle the possibility that memory is in two places at once, managed by two different things, with two different update cadences. I still think giving users this choice is the right call. But it is genuinely complicated.
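A sketch of what a user-owned vault in this model might look like. The file layout, provenance fields, and last-writer-wins conflict handling are all hypothetical -- the "Composio model" is a direction, not a spec -- and the naive merge rule is exactly the kind of thing that gets messy when two systems write on different cadences.

```python
import json
import time
from pathlib import Path


class MemoryVault:
    """A user-owned store that multiple agents read from and write to."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, key, value, agent):
        # Each record carries provenance: which agent wrote it, and when.
        record = {"value": value, "agent": agent, "ts": time.time()}
        path = self.root / f"{key}.json"
        if path.exists():
            prior = json.loads(path.read_text())
            # Naive last-writer-wins; a real system needs actual
            # conflict resolution across harness-internal and external memory.
            if prior["ts"] > record["ts"]:
                return prior
        path.write_text(json.dumps(record))
        return record

    def read(self, key):
        path = self.root / f"{key}.json"
        return json.loads(path.read_text())["value"] if path.exists() else None
```

Even this toy exposes the two-cadence problem: if the harness updates its own memory.md at compaction while an agent writes the vault per-turn, the "same" memory can disagree with itself depending on which store you read.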
The cost that nobody talks about
Every sophisticated memory system costs inference tokens. Letta's self-editing model--where the agent actively decides what to remember during reasoning via tool calls--is the most architecturally interesting approach I've seen. The agent curates its own memory as a first-class part of thinking. But every core_memory_replace call is tokens. Mesa's per-turn review is a whole extra LLM call. Supermemory's brute force approach is a dozen.
File-based memory is effectively free. Read a file, inject it, done.
The bar for beating memory.md isn't just "is it smarter?" It's "is it enough smarter to justify the cost?" And for most use cases today, the honest answer is no.
But here's something that should make people pay attention: recent benchmarks on agentic memory (AMA-Bench among others) are finding that the design of your memory system matters way more than which model you're running. We're talking maybe an order of magnitude more variance from architecture choices than from model scaling. The architecture matters enormously. It just also costs real money, and that tension is why most production systems still use the simple thing.
The unsolved problems
Recent research has started to identify what a memory system actually needs to do well:
Accurate retrieval--find the right memory when asked.
Learning in real time--update what you know from new information as it comes in.
Long-range understanding--connect things across sessions that happened weeks apart.
Selective forgetting--know when a memory is stale and stop using it.
No current system is good at all four. Graph-based systems handle forgetting and long-range connections better than anything else, which is probably why everyone is converging on them. Letta does well on retrieval and real-time learning. File-based systems do ok on retrieval and struggle with the rest.
Now add multi-agent coordination. Multiple agents on the same filesystem. Multiple people cooperating with agents on different projects. Who organizes the memory? Who resolves conflicts? Do we deploy an async agent to consolidate memories at compaction time? At session end? On a cron job overnight?
How do we prioritize recent memories over old ones? How much control should the agent have over its own memory? How do we handle the fact that some people want aggressive memory and others want minimal? And they might want to export it and bring it to another agent!
These aren't rhetorical questions I'm asking to sound smart. I deal with these every week deploying agents for clients. Nobody has good answers.
Benchmarks exist now. They're just not reliable.
A year ago there were no memory benchmarks worth talking about. That's changed. LOCOMO, LongMemEval, AMA-Bench, MemoryAgentBench all exist. There's even an ICLR workshop this year dedicated to agent memory.
But here's the problem: evaluation choices that look like implementation details--the prompt you use for the judge model, the scoring methodology, the answer generation setup--can swing accuracy by double digits. Supermemory showed this directly when they demonstrated you could score 98.6% by letting any of eight prompt variants count as correct. That's not a benchmark result. That's a configuration choice dressed up as one.
So we have benchmarks. They're just not trustworthy enough to settle any debates. If you overcomplicate your memory system, you still can't be sure it's actually outperforming memory.md by anything other than vibes. Just vibes with numbers attached.
Nobody has memory right
Not Garry, not Harrison, not OpenClaw, not Letta, not Zep, not Supermemory, not Mem0. Nobody.
Garry's instinct--keep it simple, keep it readable, keep it yours--is the right starting posture. Harrison's instinct--the harness and memory are inseparable, own both of them--is architecturally correct. Sarah Wooders' framing--memory is context management, not a retrieval problem--is the most precise explanation of why this is so hard.
But memory.md is not the end state. It's the beginning.
It's the simplest thing that works, and for most use cases today it's the right choice. Not because it's good. Because everything else is either too expensive, too complex, too slow, or too unproven to justify the leap.
The gap will close. The research is real, the providers are converging on graphs, and the benchmarks are slowly forming.
But if anyone tells you they've solved memory, they haven't. They've solved one of the four problems and they're hoping you don't ask about the other three.
Garry Tan: If your memory dies when your harness dies, you built the harness too thick.
Memory is markdown. Skills are markdown. Brain is a git repo. The harness is a thin conductor — it reads the files, it doesn't own them.
Matt Shumer
Matt Shumer @mattshumer_
Retweeted
Sunday Briefing
Re @mattshumer_ , co-founder and CEO of OthersideAI, is warning about powerful new AI models like Claude Mythos, and what it could mean for national security if it falls into the wrong hands.
Garry Tan
Garry Tan @garrytan
We love the best frontier models but it’s worth noting there are a lot of new ones and they can also be good and useful too, especially when it comes to bringing the capabilities to more people

I think about the Eames’s design principle: the best, for the most, for the least

Adolfo 🦀🔺 | Open Crabs Father | truelens.tech®️: Again no one is talking about this but @nvidia is offering a bunch of completely free API endpoints.

They just dropped MiniMax M2.7 free.

I mean literally, completely FREE!!!!

And you can easily set it up as a custom provider on @opencrabs

Grab your key here:
Garry Tan
Garry Tan @garrytan
Personal open source software is here

Matt Van Horn: v3 of @slashlast30days is here. 20,000+⭐ on GitHub. The biggest upgrade yet.

An AI agent-led search engine scored by upvotes, likes, and real money - not editors. Reddit comments, X posts, and YouTube transcripts are now FREE. No API keys needed for the core sources.

v3

Garry Tan
Garry Tan @garrytan
It’s official, Gemini Live 2.5 voice agent is the best

It’s smart, it’s fast, it has large enough context

Coming to GBrain Voice shortly
Yann LeCun
Yann LeCun @ylecun
Retweeted
Ananay
Marcus Hutchins, the guy famous for stopping the WannaCry Ransomware, probably has the best take on Mythos doing vulnerability research
Guillermo Rauch
Guillermo Rauch @rauchg
Protip: Create an X group chat with your best and most demanding customers. Put your engineering leads in said group chat. Respond and ship quickly. Face the music.

We have one going for @v0. DM if you want to be a part of it and are committed to sharing excellent feedback.
swyx
swyx @swyx
Retweeted
Paul Iusztin
Remote work is lonely. 90% of the time is just you and a screen.
But when you get the chance to meet your close virtual friends for the first time, it hits differently.
Being in the same room with your people is probably the best reason to join conferences.
Such as the first edition of the @aiDotEngineer in London.
And we badly needed one focused just on hardcore AI Engineering in Europe!
Here is the LinkedIn corner I finally had the chance to meet in real life after 2 years of chatting through screens: @Whats_AI, @maximelabonne, @_LouiePeters , and @ThoBustos
I expected you guys to be a bit taller!
Regretting that I couldn't take a pic with the one and only @nicolaygerold
Amjad Masad
Amjad Masad @amasad
On its 50th birthday Apple is doing its best to become the most hated company in the world.

Ethan Levins 🇺🇸: Apple has removed Lebanese village names in Southern Lebanon.

As Israel invades, they are already setting the stage to justify occupation.

I’ve never seen something like this.

Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
Chris
🚨 OPENAI'S "SPUD" IS ALREADY IN THE WILD (AND IT RIVALS MYTHOS)
Brad Gerstner just went on the All-In Podcast and confirmed that OpenAI's highly anticipated "Spud" model (expected to be GPT-5.5) is already being tested behind closed doors and the early reviews are insane.
Addressing the recent wave of skepticism surrounding the company, Gerstner claims we are currently at "peak OpenAI FUD," warning that: "It would be seriously foolish to count out OpenAI... it starts with great researchers and great models. And I think when you see the Spud model they're about ready to release, I think it's going to be an excellent model, shows that they're firmly on the wave."
When Jason Calacanis presses him on whether anyone has actually gotten their hands on the new model yet, Gerstner confirms the quiet rollout, stating that: "People are using Spud, right? So it is being previewed."
When asked exactly what those early testers are saying about its capabilities, Gerstner drops a massive comparison, revealing that: "They're telling us that it's an incredible model on par with Mythos, right? And that it's a very usable model in terms of how it's packaged."
Garry Tan
Garry Tan @garrytan
Retweeted
Dean W. Ball
I don’t see the problem with this. This is basically just a social-justice-inflected way of making a point Marc Andreessen has made for years about “reality privilege.”
w.e.b dubiracial: what do you even say
Garry Tan
Garry Tan @garrytan
Retweeted
Jessica Livingston
Fear comes the day someone threatens to kill you and your family. In Sam's case, it was an actual attempt. Unless you've experienced this yourself, it's hard to imagine how much it changes you.
Sam Altman: I wrote this early this morning and I wasn't sure if I would actually publish it, but here it is:
https://blog.samaltman.com/2279512
Yann LeCun
Yann LeCun @ylecun
Retweeted
Thierry Breton
Make election interference great again, @JDVance.
swyx
swyx @swyx
Retweeted
Will Jones
http://x.com/i/article/2043418647918481408
swyx
swyx @swyx
Retweeted
Will Jones
Great to spend a few action-packed days with likely the collection of the most AGI-pilled people in the world at @aiDotEngineer London.
Some reflections:
Garry Tan
Garry Tan @garrytan
Retweeted
Jesse Arm
It's definitely a long shot, but a Mahan win would be transformational.
Not just for California, but for the entire Democratic Party—and therefore the country.
Electing a sane, competent centrist technocrat to lead the biggest, bluest, most poorly managed, lefty special-interest-dominated state in America?
That would be the anti-Mamdani whitepill moment.
Benjamin Freeman: San José Mayor Matt Mahan has seen his odds to become the next California Governor more than double in the last 48 hours from 6% to 14%.
He’s in second place behind Tom Steyer (53%)
Amjad Masad
Amjad Masad @amasad
Retweeted
Garland Nixon
BREAKING NEWS: White House insiders leak that Congressman Randy Fine has been barred from events at the President's Mar-a-Lago resort after "an unfortunate event in the buffet line" in which Fine allegedly assaulted a guest over "the last slice of pot roast."
Garry Tan
Garry Tan @garrytan
Retweeted
Aaron Levie
Another week on the road meeting with a couple dozen IT and AI leaders from large enterprises across banking, media, retail, healthcare, consulting, tech, and sports, to discuss agents in the enterprise.
Some quick takeaways:
* Clear that we’re moving from chat era of AI to agents that use tools, process data, and start to execute real work in the enterprise. Complementing this, enterprises are often evolving from “let a thousand flowers bloom” approach to adoption to targeted automation efforts applied to specific areas of work and workflow.
* Change management will remain one of the biggest topics for enterprises. Most workflows aren’t set up to just drop agents directly in, and enterprises will need a ton of help to drive these efforts (both internally and from partners). One company has a head of AI in every business unit that rolls up to a central team, just to keep all the functions coordinated.
* Tokenmaxxing! Most companies operate with very strict OpEx budgets that get locked in for the year ahead, so they’re going through very real trade-off discussions right now on how to budget for tokens. One company recently had an idea for a “shark tank” style way of pitching for compute budget. Others are trying to figure out how to ration compute to the best use-cases internally through some hierarchy of needs (my words not theirs).
* Fixing fragmented and legacy systems remains a huge priority right now. Most enterprises are dealing with decades of either on-prem systems or systems they moved to the cloud but that still haven’t been modernized in any meaningful way. This means agents can’t easily tap into these data sources in a unified way yet, so companies are focused on how they modernize these.
* Most companies are *not* talking about replacing jobs due to agents. The major use-cases for agents are things that the company wasn’t able to do before or couldn’t prioritize. Software upgrades, automating back office processes that were constraining other workflows, processing large amounts of documents to get new business or client insights, and so on. More emphasis on ways to make money vs. cut costs.
* Headless software dominated my conversations. Enterprises need to be able to ensure all of their software works across any set of agents they choose. They will kick out vendors that don’t make this technically or economically easy.
* Clear sense that it can be hard to standardize on anything right now given how fast things are moving. Blessing and a curse of the innovation curve right now - no one wants to get stuck in a paradigm that locks them into the wrong architecture. One other result of this is that companies realize they’re in a multi-agent world, which means that interoperability becomes paramount across systems.
* Unanimous sense that everyone is working more than ever before. AI is not causing anyone to do less work right now, and similar to Silicon Valley people feel their teams are the busiest they’ve ever been.
One final meta observation not called out explicitly. It seems that despite Silicon Valley’s sense that AI has made hard things easy, the most powerful ways to use agents is more “technical” than prior eras of software. Skills, MCP, CLIs, etc. may be simple concepts for tech, but in the real world these are all esoteric concepts that will require technical people to help bring to life in the enterprise.
This both means diffusion will take real work and time, but also everyone’s estimation of engineering jobs is totally off. Engineers may not be “writing” software, but they will certainly be the ones to setup and operate the systems that actually automate most work in the enterprise.
swyx
swyx @swyx
Retweeted
Armin Ronacher ⇌
I put the slides from our @aiDotEngineer talk up.
Garry Tan
Garry Tan @garrytan
Rats spread leptospirosis — a tropical disease — through Berkeley's Harrison encampment. Courts blocked cleanup. This is what "protecting" unhoused people looks like.

https://gli.st/ed16vtzx
Garry Tan
Garry Tan @garrytan
Coming to GBrain Voice probably by tonight

Fully working Gemini Live voice (I think this is the best I have ever seen) with all these skills

Now you ask: Why can’t Amazon and Apple ship this in Siri and Alexa? I have no idea.

I made this by the pool in Hawaii

Garry Tan
Garry Tan @garrytan
Re Install OpenClaw or Hermes from these instructions and get it on WebRTC or on your Twilio number in less than half an hour

It’s Homebrew computer club but for personal AI and I’m inviting you to my garage in Mountain View. I mean my GitHub account

https://github.com/garrytan/gbrain
Garry Tan
Garry Tan @garrytan
Retweeted
Seeking Gradient
Over the last couple weeks, I felt like I kept seeing @GarryTan's posts about GStack and GBrain. Finally carved out time to go deep on it.
Garry has been prolific and outspoken about how coding agents can unlock entirely new orders of magnitude of productivity.
That's the kind of talk I love because I agree completely. And when someone with his experience is that excited about something, it's probably worth figuring out why.
GBrain is an intelligence layer on top of all the documents and artifacts produced in day-to-day life. Point it at any content, and it maintains an up-to-date, queryable repository of information. It's also becoming a platform with a growing number of integrations that Garry (or his agents) are actively expanding.
Garry Tan
Garry Tan @garrytan
Retweeted
roon roon
maybe the media shouldn’t consistently include full identifying addresses for some reason when reporting on attacks
Peter Yang
Peter Yang @petergyang
My entire feed and the Claude subreddit is full of ppl saying opus got nerfed.

Why would Anthropic nerf its own models?
Garry Tan
Garry Tan @garrytan
Just because OpenClaw doesn't write tests on its own when it's creating software for you doesn't mean you shouldn't make it make tests for you...
Garry Tan
Garry Tan @garrytan
Retweeted
Perry E. Metzger Perry E. Metzger
You tell people often enough that AI researchers and executives are evil and their work will kill everyone they love, and eventually, some of them will listen. Any intelligent person could have predicted that.
“paula”: i didn’t realize how bad it was until i saw this comment section on instagram

YouTube


No recent videos fetched on this date.