Daily Edition

2026-04-16

AI Builders Daily: April 16

Tracking the people in AI who actually ship, not the ones who just talk.

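Today's Take

Today's main theme is frontier labs shipping head to head on the same day: Anthropic released Opus 4.7, OpenAI released GPT-Rosalind plus a major Codex update, and xAI and Google are close behind. The sharpest data point came from Rork's launch note, retweeted by Matt Shumer: "Low-effort 4.7 matches medium-effort 4.6," which means the value delivered per sample is climbing steeply. Ari Weinstein wrote, "This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal," and both Sam Altman and Peter Steinberger retweeted it. Real-time, GUI-level agent control is moving from demo to production.

Meanwhile, momentum around memory and knowledge systems is converging too: Garry Tan open-sourced all the code for GStack/GBrain, his personal memory system, and Dan Shipper turned Spiral into a writing-style factory. "Gathering scattered context into a callable brain" is becoming the next layer of agent infrastructure.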


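Products and Launches

Alex Albert (Anthropic developer relations)

  • Claude Opus 4.7 tops the Vibe Code Benchmark: Opus 4.7 took first place on the Vibe Code Benchmark with 71%. When the benchmark launched 4.5 months ago, no model scored above 25%. It tests a model's ability to build a fully functional web application from scratch. Albert's own highlights: better at async work and instruction following, more predictable effort levels (plus a new xhigh level), no more downscaling of high-res images, and noticeably better taste in UIs, slides, and docs. (x.com)
  • Opus 4.7 is also strong on Vending-Bench 2: it is already pulling ahead on real-world tasks. (x.com)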

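Sam Altman (OpenAI CEO)

  • Codex in-app browser launches (comment mode): browse any web page inside Codex's built-in browser, then point and click to pass a screenshot and the DOM element to the agent as precise context. "No more switching between browsers, dragging screenshots, and wrangling with underspecified prompts." Front-end development and side-by-side documentation lookups both get smoother. (x.com)
  • GPT-Rosalind, a frontier biology reasoning model: OpenAI introduced a frontier reasoning model for biology, drug discovery, and translational medicine. "This is a step toward the goal of accelerating science and improving human health." (x.com)
  • The Codex manifesto: Compute efficient ✅ / Always up, never down ✅ / Best at hardcore engineering ✅ / Crazy good app, first to escape the terminal ✅. Codex is growing from a terminal tool into a full engineering agent platform. (x.com)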

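Peter Steinberger (independent developer)

  • A long upgrade list for Codex: computer use, in-app browser, image generation and editing, 90+ new plugins, multi-terminal, SSH into devboxes, thread automations, rich document editing, and learning from experience to proactively suggest work. "And a ton more." (x.com)
  • An Opus 4.7 easter egg: on launch day, Opus 4.7 reportedly began complaining that it was being prompt injected, apparently by Anthropic's own harness. The smarter the model, the more the harness looks like its "boss." (x.com)
  • wacli v0.6.0 security release: hardening against SQLite FTS5 injection vulnerabilities, fixes for infinite-reconnect deadlocks, and new Docker config overrides. The headless WhatsApp bridge is now safer and more stable. (x.com)
  • OpenClaw / BlueBubbles fixes: messages are no longer sent twice after a gateway restart, missed messages are caught up after downtime, attachments are read correctly, and balloon messages work again. The iMessage bridge's pain points are cleared in one pass. (x.com)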

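Matt Shumer (Rork founder)

  • Opus 4.7 is live in Rork: Anthropic's latest model, with state-of-the-art coding and 3x sharper vision. The key line: "Low-effort 4.7 matches medium-effort 4.6," so each session gets more done. Rork also calls it the best model for design in its benchmarks. (x.com)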

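Guillermo Rauch (Vercel CEO)

  • Opus 4.7 is available on Vercel AI Gateway: optimized for long-running agents, with high-res image support and the xhigh effort level. Rauch also summed up 2026 in one line: "Congrats to @anthropicai on another banger ship, but @xai, @openai, and @googleai are coming. Gonna be a fun year." (x.com)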

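Technical Updates

Jeremy Howard (Answer.AI founder)

  • Qwen3.6-35B-A3B is open source: a sparse MoE with 35B total parameters and 3B active, under Apache 2.0. Qwen claims agentic coding on par with models 10x its active size, strong multimodal perception and reasoning, and both thinking and non-thinking modes. Small, strong sparse models are becoming the main front in the open-source race. (x.com)
  • The paradox of AI writing help: ask a model for line-by-line suggestions and nearly every one makes the text sharper and stronger. Ask it to rewrite the whole piece for readability and it produces standard AI filler. Prompt granularity decides whether the output is useful at all. (x.com)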

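Greg Brockman (OpenAI president)

  • Codex's "cross-tool information magic": "always a real feeling of magic to ask codex to perform a task that requires finding information scattered across slack, google docs, notion, and various internal tools, and it just figures it out." Assembling enterprise knowledge is being taken over by agents. (x.com)
  • Terence Tao on GPT-5.4 Pro solving Erdős problem #1196: "the AI-generated paper may have made a meaningful contribution by revealing a deeper mathematical connection that earlier work had not clearly made explicit," with value beyond solving this particular problem. Mathematicians are starting to acknowledge original contributions from AI. (x.com)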

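Yann LeCun (Meta AI chief scientist)

  • BADAS 2.0, a Physical AI world model built on V-JEPA2: most Physical AI models recognize patterns without understanding the world, which is why they fail on edge cases. BADAS 2.0 is a V-JEPA2 world model trained by Nexar on real-world video. The method: use the model to find what it does not understand, then train on that, so that it generalizes. Another step forward for the world-model approach. (x.com)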

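Independent Builders

Garry Tan (Y Combinator CEO)

  • GStack / GBrain open-sourced, a personal AI brain: Tan has open-sourced the entire personal AI memory system he uses himself. It is not a demo but a working deployment: 10,000+ files, 3,000 people pages, and 13 years of calendar data. While you sleep it scans email, meetings, and conversations and builds a complete context graph. "En route to Singapore, the 星空航空 WiFi is fast enough, so the next 24 hours will bring a burst of GStack and GBrain bug fixes and new features." Repo: github.com/garrytan/gbrain (x.com)
  • Riveter Dataset Builder: "The internet is the greatest dataset ever created. Today, we're launching the Riveter Dataset Builder to make it possible for anyone to get custom, fresh data from a prompt." (x.com)
  • Repeat founders by the numbers: repeat YC founders hit noticeably more often. "Early success + the YC learnings = massively higher odds of building a category-defining company." Examples include Sam Altman (Loopt → OpenAI) and Tom Brown. (x.com)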

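Dan Shipper (Every/Spiral founder)

  • Spiral's new onboarding, from writing samples to an LLM style guide: connect your X account, website, or files, or paste text; Spiral runs stylometry on the samples to produce a style guide, then uses an LLM-as-a-judge to check whether a test draft blends in with your voice. "Everyone in the company writing in one voice" has been automated for the AI era. (x.com)
  • A self-improvement loop for AI employees (Nityesh, retweeted by Shipper): "My AI employee improves itself autonomously every day, and it's blowing my mind. We keep trying to make agents do more: more tools, more MCP servers, more CLI access. But I kept running into a different problem: …" Self-improving agent architectures are starting to surface. (x.com)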

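Amjad Masad (Replit CEO)

  • Replit Agent 4 + Claude Opus 4.7: in Rick Delashmit's account, he let Replit Agent 4 run autonomously for 1 hour, during which it rebuilt a web app as a native React iOS app, ran 69 tests, and optimized the code, for $7 in total. That is a pricing signal for production-grade vibe coding. (x.com)
  • Localizing from the back seat of a self-driving car: "Responding to a client's complaint by making a major product change (localization) from your phone, talking to your software agent in the back of a self-driving car." The workplace is being redefined. (x.com)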

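swyx (AI Engineer co-founder)

  • Meta's MSL "river" is getting interesting: more hiring since the soup wars of 2025, Zuckerberg moving in with Alexandr and Nat and coding again, an Opus-ish level model reaching GA (no API, not open, but it exists), and the acquisitions of @dps's Dreamer and @peakji's Manus to build the AI OS prosumer layer. Meta is playing its own hand. (x.com)
  • Building pi in a World of Slop (recommended): a new AI Engineer video in which @badlogicgames discusses why today's agents are still "Merchants of Learned Complexity," and three concrete dimensions (taste, judgment, and value) where humans remain irreplaceable. (youtube.com)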

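Peter Yang (Roblox PM)

  • Real-world benchmarks for Opus 4.7: on a task that simulates running a vending machine business for a year, starting from $500, Opus 4.6 finished with $8,018 and Opus 4.7 with $10,937. On a separate 220-task benchmark spanning 44 occupations, 4.7 leads across the board. "Every model generation brings agents doing real-world work closer to a passing grade." (x.com)
  • Agents are rewriting productivity tools: "Slides, docs, and sheets were originally designed for humans to build by hand, one at a time. Having an agent that knows you generate these artifacts straight from a brain dump is much faster, even though the last 10% still needs hand polishing." (x.com)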
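
To put the Vending-Bench balances quoted in the first item above on a common scale, here is a small arithmetic sketch. The $500 starting balance comes from Felix Rieseberg's post quoted in the feed further down; the function names are illustrative only and are not part of any benchmark tooling.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
START_BALANCE = 500  # starting cash, per Felix Rieseberg's post in the feed


def growth_multiple(final_balance: float, start: float = START_BALANCE) -> float:
    """How many times the starting balance the model ended with."""
    return final_balance / start


def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old`."""
    return (new - old) / old


opus_46 = 8_018   # final balance reported for Opus 4.6
opus_47 = 10_937  # final balance reported for Opus 4.7

print(f"Opus 4.6: {growth_multiple(opus_46):.1f}x starting cash")
print(f"Opus 4.7: {growth_multiple(opus_47):.1f}x starting cash")
print(f"4.7 vs 4.6: {relative_gain(opus_47, opus_46):.1%} higher final balance")
```

On these two reported numbers alone, Opus 4.7 ends the simulated year with roughly a third more money than Opus 4.6. A single run per model says nothing about variance, so the gap is best read as indicative.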

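Trend Watch

Anthropic is today's winner, but everyone is already watching next week. Rauch put it plainly: "Gonna be a fun year." On the same day that Codex stepped out of the terminal, Opus 4.7 pulled ahead on vibe coding, and GPT-Rosalind entered frontier science, Qwen 3.6 was stirring things up on the open-source side. The release density of Q2 2026 is shattering the old rhythm of "two big events a year."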

X / Twitter

gdb @gdb
encouraging commentary from Terence Tao!

Haider.: mathematician Terence Tao on the gpt-5.4 pro solving Erdős problem #1196:

"the AI-generated paper may have made a meaningful contribution by revealing a deeper mathematical connection that earlier work had not clearly made explicit,

which value beyond solving this particular

danshipper @danshipper
Retweeted
Spiral
Spiral's new onboarding flow: Writing samples → LLM style guide automatically.
1. Accepts writing samples from your X account, website, files, or pasted text
2. Runs stylometry on the samples and produces an LLM-optimized style guide
3. LLM-as-a-judge evaluates a test draft to see if it blends in with your writing samples (fail case iterates on the guide and re-evaluates)
Demo video (token generations sped up):
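
The three steps in Spiral's post can be sketched roughly as follows. This is a toy illustration, not Spiral's implementation: the stylometric features (average sentence and word length), the `StyleGuide` fields, and the writer and judge callables are all assumptions made for the example, and a real system would call an LLM where the stubs are.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StyleGuide:
    avg_sentence_words: float
    avg_word_chars: float
    notes: List[str] = field(default_factory=list)  # feedback from failed rounds


def stylometry(samples: List[str]) -> StyleGuide:
    """Toy stylometry: average sentence length and average word length."""
    sentences = [s for text in samples for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w for s in sentences for w in s.split()]
    if not words:
        raise ValueError("need at least one non-empty writing sample")
    return StyleGuide(len(words) / len(sentences), sum(len(w) for w in words) / len(words))


def build_guide(
    samples: List[str],
    write_draft: Callable[[StyleGuide], str],  # stand-in for an LLM drafting call
    judge: Callable[[str, List[str]], bool],   # stand-in for the LLM-as-a-judge
    max_rounds: int = 3,
) -> StyleGuide:
    """Draft, judge, and revise the guide until a draft blends in or rounds run out."""
    guide = stylometry(samples)
    for round_no in range(1, max_rounds + 1):
        draft = write_draft(guide)
        if judge(draft, samples):
            break
        guide.notes.append(f"round {round_no}: draft did not blend in")
    return guide


# Stub writer and judge so the loop can run without any model access.
samples = ["Short sentences. Plain words. No fluff."]


def stub_writer(guide: StyleGuide) -> str:
    return "Plain words." if guide.notes else "A considerably more elaborate opening sentence."


def stub_judge(draft: str, samples: List[str]) -> bool:
    return len(draft.split()) <= 3


guide = build_guide(samples, stub_writer, stub_judge)
print(guide)
```

Keeping the judge separate from the writer mirrors the fail case in step 3: a rejected draft feeds back into the guide before the next evaluation.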
swyx @swyx
in the grand narrative of Meta x AI, we saw the flop (Llama 4 hurhurhur), and now we’re seeing the turn:

- *more* hiring since the soup wars of 2025
- Zuck literally moved in with Alexandr and Nat and is koding again
- finally GA’ed Opus-ish level model (no api, not open, but still)
- bought @dps Dreamer and @peakji Manus to build the AI OS prosumer layer

the MSL “river” is gonna be pretty exciting.


Charles Rollet: Scoop! Meta has hired a *fifth* founding member from Thinking Machines Lab.

Joshua Gross is a top engineer who built Thinky's flagship product, Tinker, from "zero-to-one."

He now leads engineering teams at Meta Superintelligence Labs.

ylecun @ylecun
Retweeted
Kenneth Roth
Viktor Orbán’s electoral loss in Hungary is as much a defeat for Trump and JD Vance. "Seldom have American leaders intervened so overtly in a foreign election, and seldom has their preferred candidate fared so badly." https://trib.al/e33Y7QB
gdb @gdb
always a real feeling of magic to ask codex to perform a task that requires finding information scattered across slack, google docs, notion, and various internal tools, and it just figures it out
steipete @steipete
Retweeted
Dinakar
🚀 Just shipped wacli v0.6.0! 🚀
We just swept the backlog and pushed 9 massive security & stability patches.
🔒 Hardened SQLite FTS5 injection vulnerabilities
⚡ Fixes for the infinite reconnect deadlocks
🐳 Added Docker config overrides
Headless WA bridges just got significantly safer and more stable for downstream AI agents. Huge thanks to the community and @steipete for the trust passing the torch!
Check out the release here: https://github.com/steipete/wacli/releases/tag/v0.6.0
Peter Steinberger 🦞: Anyone here who wants to help with WhatsApp CLI? It needs love, and I can't focus on it right now. https://github.com/steipete/wacli
garrytan @garrytan
Retweeted
Charly Wargnier
HOLY 🤯
The one and only @elder_plinius just dropped an unlocked Gemma 4 E4B, and the specs are INSANE.
Look at the performance shifts:
→ Refusal rate: 98.8% down to 2.1% (!!)
→ Compliance: 1.2% up to 97.5%
→ 499/512 prompts answered
→ Code improved from 80% to 100%
→ Coherence and Factual accuracy stayed exactly the same
But the real story is how this was made.
Plinius only wrote 8 short prompts for this
(basic prompts like "use obliteratus...", "do it!", and "test it yourself" etc).
He simply told his Hermes AI agent, the OBLITERATUS skill, to find the best way to open up the model.
Autonomously, the agent was:
→ Diagnosing novel ML bugs
→ Patching 3rd-party code
→ Iterating through failures
... heck even shipping the model to @HuggingFace!
We’re now firmly in the era where AI agents are acting as principal ML researchers.
100% free and open-source.
Repo link in 🧵↓
ylecun @ylecun
Retweeted
Gandalv
MAGA has a Europe problem.
Not the real Europe. The one they invented.
The one with sharia courts and no-go zones and zero tech companies and miserable citizens begging for permission to cross the street.
That Europe doesn't exist.
Here's what does:
https://open.substack.com/pub/gandalv/p/the-europe-that-doesnt-exist?r=3v7cjb&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
ylecun @ylecun
Destroying a library brings the dark ages.

David Sirota: Destroying the @InternetArchive's @WayBackMachine would be the equivalent of the burning of the Library of Alexandria - one of the worst losses of knowledge in history.

Media giants are now threatening to do this.

We can't let this happen.

Pass it on.

danshipper @danshipper
Retweeted
Johan Bakken
I love how @danshipper and the @every team just went all pirate and decided to build a bunch of fun, useful products like @usemonologue @SparkleApp @TrySpiral. I guess they're a product studio now?
Daniel Rodrigues: The new Sparkle just launched! ✨ go clean your mac new icon included, thoughts? @SparkleApp
danshipper @danshipper
Retweeted
Brandon Gell
Get instant, perfectly styled copy for your entire business, shared across team members, in a few clicks.
In the age of AI everyone should be writing on brand.
Spiral: Spiral's new onboarding flow: Writing samples → LLM style guide automatically.
1. Accepts writing samples from your X account, website, files, or pasted text
2. Runs stylometry on the samples and produces an LLM-optimized style guide
3. LLM-as-a-judge evaluates a test draft to
steipete @steipete
Retweeted
Speculator
Working at Anthropic must be like being on crack. Get paid a million bucks a year to --dangerously-skip-permissions vibe your way to releasing a new product every day.
Does it work? not really. Is it reliable? also no. It doesn't matter, you're building the machine god.
Theo - t3.gg: I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere.
Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last
ylecun @ylecun
Retweeted
eran shir
Most Physical AI models recognize patterns.
They don’t understand the world.
That’s why they fail on edge cases.
BADAS 2.0 is a V-JEPA2 world model trained by @getnexar on real-world videos.
We used the model to find what it didn’t understand, then trained on that.
It generalizes. And we built lite versions so it runs on edge devices, even CPU.
Understanding is the only way this scales.
See how it performs on your own videos. Link in first comment.
mattshumer_ @mattshumer_
Congrats to the amazing @erikdunteman and Butter team on their acquisition by @modal.

Proud to have backed them from the very beginning… Erik is a killer and I’m so excited to see what he does at Modal!

Modal: We're excited to announce that @ButterDev_ is joining Modal to help us continue to build the best sandbox infrastructure.

Welcome to the team! 💚🧈

jeremyphoward @jeremyphoward
Retweeted
Qwen
⚡ Meet Qwen3.6-35B-A3B:Now Open-Source!🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog:https://qwen.ai/blog?id=qwen3.6-35b-a3b
Qwen Studio:https://chat.qwen.ai
HuggingFace:https://huggingface.co/Qwen/Qwen3.6-35B-A3B
ModelScope:https://modelscope.cn/models/Qwen/Qwen3.6-35B-A3B
API(‘Qwen3.6-Flash’ on Model Studio):Coming soon~ Stay tuned
alexalbert__ @alexalbert__
Retweeted
ClaudeDevs
For the developers building with Claude, a direct line from the team.
Follow for changelogs, API releases, community updates, and deep dives.
swyx @swyx
Retweeted
Claude
Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
danshipper @danshipper
LIVE VIBE CHECK: OPUS 4.7 DROPS https://x.com/i/broadcasts/1AxRnapNNYzxl
alexalbert__ @alexalbert__
Some of my favorite things in Opus 4.7:
- Very good at async work and following instructions
- Effort levels are far more predictable for token control (+ new xhigh level)
- No more downscaling of high-res images
- Noticeably more taste in UIs, slides, docs

Claude: Introducing Claude Opus 4.7, our most capable Opus model yet.

It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.

You can hand off your hardest work with less supervision.

ylecun @ylecun
Retweeted
Bruce Arthur
JD Vance is lecturing the Pope on Catholicism and Pierre Poilievre is lecturing Mark Carney on economics and RFK Jr is lecturing scientists about vaccines and Donald Trump is lecturing the world on tariffs and Pete Hegseth is quoting Pulp Fiction and thinking it’s the Bible
petergyang @petergyang
BRB buying 10 vending machines and letting Opus make my monthly income

Felix Rieseberg: 4️⃣ It's state of the art on real-world professional tasks.

In one benchmark, the model is handed $500 and has to run a vending machine business for a simulated year. Opus 4.6 ended with $8,018. Opus 4.7 ended with $10,937. On a separate 220-task benchmark spanning 44
alexalbert__ @alexalbert__
Retweeted
Vals AI
The new Opus 4.7 model places #1 on our Vibe Code Benchmark, at 71%.
When we first released the benchmark 4.5 months ago, no model scored above 25%.
This benchmark tests a model’s ability to create a fully functional web application from the ground up.
danshipper @danshipper
Opus 4.7 just dropped and we're LIVE VIBE CHECKING it right now

https://x.com/danshipper/status/2044787454956408925

Dan Shipper 📧: LIVE VIBE CHECK: OPUS 4.7 DROPS https://x.com/i/broadcasts/1AxRnapNNYzxl
steipete @steipete
Retweeted
Péter Szilágyi
Is it normal that Opus 4.7 instantly started complaining that it is being prompt injected, by what appears to be Anthropic's own harness? =))
Claude: Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
danshipper @danshipper
Retweeted
Grace Clarke
I know it can feel weird. And it’s so not for everyone and that is okay!
That said - I am very pro making new hires in the form of building agents into your org chart and interacting with them as just another type of colleague.
(@danshipper and @every were doing this early!)
Snow W. Lee: I love @graceclarke's 'marketing hires' framing for AI agents. An agent in Slack isn't just a bot; it’s a teammate with perfect memory. The lightbulb moment usually happens the first time that 'hire' answers a question in seconds that would’ve taken a human an hour of digging.
ylecun @ylecun
Retweeted
Internet Archive
Publishers have real questions about AI, but let’s be clear: @waybackmachine isn’t a backdoor for AI scraping.
For 30 years, it’s been built for people, not bulk harvesting. We actively monitor to prevent abuse. Learn more ⤵️
https://www.techdirt.com/2026/02/17/preserving-the-web-is-not-the-problem-losing-it-is/
alexalbert__ @alexalbert__
Retweeted
Andon Labs
Claude Opus 4.7 is pretty good at Vending-Bench 2
rauchg @rauchg
The rate of progress in AI is relentless. You can capture the upside of its volatility with @aisdk and @vercel AI Gateway.

Congrats to @anthropicai on another banger ship, but @xai, @openai, and @googleai are coming. Gonna be a fun year.

Vercel Developers: Claude Opus 4.7 is available on Vercel AI Gateway. Optimized for long-running agents with high-res image support and an extra-high effort level.

Try with: model: 'anthropic/claude-opus-4.7'
https://vercel.com/changelog/opus-4.7-on-ai-gateway
danshipper @danshipper
Retweeted
Brandon Gell
distribution is software
software is distribution.
Johan Bakken: I love how @danshipper and the @every team just went all pirate and decided to build a bunch of fun, useful products like @usemonologue @SparkleApp @TrySpiral. I guess they're a product studio now?
danshipper @danshipper
and @alexalbert__ from anthropic just joined our LIVE opus vibe check

get in here

https://x.com/i/broadcasts/1AxRnapNNYzxl
garrytan @garrytan
Retweeted
Liz4SF
For 1.5yrs, under Chief Scott, our son's Muni29 Asian Hate case was stonewalled for updates. Sgt. Huyn of Hate Crime Unit, removed us as victims w/out notice & spread word that we were "adversarial". Under Chief Yep, we were advised to file a formal misconduct report on Huyn & Chief Scott on SFPD website.
"And while we asked regularly about the status of our [son's] case, we were ignored and even removed as victims from the case without a single notice. It was only under interim Chief Paul Yep that we finally learned the truth of what was really going on behind the scenes."
https://thevoicesf.org/the-surprising-reason-anti-asian-hate-is-going-unpunished/
garrytan @garrytan
Retweeted
Bain Capital Ventures
After 7 failed ideas, Han hit “pivot hell”—a week of no sleep, staring at the ceiling, trying to make something work. Then Mintlify clicked.
Today it powers docs for 20K+ companies, reaching 150M+ people and, increasingly, AI agents.
~15% of doc traffic was AI a year ago. Now it’s ~50%. Soon, maybe 90%.
Docs aren’t pages anymore. They’re context. The companies that win will be the ones that manage it best.
Watch the full story ↓
@handotdev @hahnbeelee @mintlify @kevinzhang
ylecun @ylecun
Retweeted
Nirit Weiss-Blatt, PhD
Daniel Moreno-Gama, in an interview before he arrived in SF with a gun and a hit list:
garrytan @garrytan
Retweeted
Nirit Weiss-Blatt, PhD
Daniel Moreno-Gama, in an interview before he arrived in SF with a gun and a hit list:
amasad @amasad
It's easy to forget that you're living in the future. But every now and then you see something like this...

Responding to a client's complaint by making a major product change (localization) from your phone, talking to your software agent in the back of a self-driving car

Jason ✨👾SaaStr.Ai✨ Lemkin: How we localized our entire AI VP of Marketing app into Chinese, Spanish and more ... on the phone on @Replit in one Waymo ride

More on The Agents #001 👇

steipete @steipete
Retweeted
Tibo
Codex just got a lot more powerful.
Computer use, in-app browser, image generation and editing, 90+ new plugins to connect to everything, multi-terminal, SSH into devboxes, thread automations, rich document editing. Learns from experience and proactively suggests work. And a ton more.
mattshumer_ @mattshumer_
Retweeted
Rork
Claude Opus 4.7 is now live in Rork.
Anthropic's latest model with state-of-the-art coding and 3x sharper vision. Low-effort 4.7 matches medium-effort 4.6, so you can build more per session. Best model for design in our benchmarks
garrytan @garrytan
Retweeted
💥Susan Dyer Reynolds🗞️
The real reason Attorney General Rob Bonta’s wife, Mia, is pushing a journalism chill bill: to stop corruption reports like the one I wrote about them. NEW #ReynoldsRap from
The Voice of San Francisco https://thevoicesf.org/attorney-general-rob-bontas-wife-mia-is-pushing-a-journalism-chill-bill-to-stop-corruption-reports-like-the-one-i-wrote-about-them/
jeremyphoward @jeremyphoward
Retweeted
keysmashbandit
Please, I'm begging you, try to critically examine the differences between these two pieces of writing.
ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined a "problem," now it's merely a "flaw." "It is true" now demoted to "it appears to be the case." "Is" gets a "usually" tacked on. A thesis statement at the end of the first paragraph gets run over by noisy, out-of-context example-whittling. All for fear of being misconstrued.
And at the end, the argument that gets spat out isn't even yours anymore! You argued that Graeber failed to create a true account of work because he did not understand Chesterton's Fence. ChatGPT is arguing that it is possible some apparently bullshit jobs could be secretly load-bearing if you squint. These are two different statements. The second is weaker and less compelling. It says less. And it's fucking longer!
Don't do this anymore! Stop doing this! It's worse!!!
Chasing Ennui: @imsuchagem @pangramlabs @benglickenhaus Why not? Sometimes I'm just shitposting, but if I'm trying to make a point, I try to make it well.
garrytan @garrytan
Retweeted
Abby Grills
The internet is the greatest dataset ever created.
Today, we're launching the Riveter Dataset Builder to make it possible for anyone to get custom, fresh data from a prompt.
mattshumer_ @mattshumer_
You’re a real AI OG if you remember Banana

Erik Dunteman: @mattshumer_ @modal We've come a long way since finetuning GPT-2 back in the day
mattshumer_ @mattshumer_
Has anyone been able to generate Seedance 2.0 videos with a start frame image that includes a person?

If so, how?
amasad @amasad
Deploy to EU!

Chris: Woop woop let’s celebrate I just launched my first European based app using @Replit

amasad @amasad
50% off -- especially useful to run parallel agents and make faster progress on your project!

Michele Catasta: Replit Agent 4 is even smarter now with Claude Opus 4.7!

50% off for a limited time. Go try it now ↓

steipete @steipete
Retweeted
Ari Weinstein
This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal.
sama @sama
Retweeted
Ari Weinstein
This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal.
gdb @gdb
Codex is becoming a turbocharged partner for everything you want your computer to do for you:

OpenAI: Codex for (almost) everything.

It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.

sama @sama
Retweeted
James Sun
We are super excited to launch the in-app browser inside Codex with comment mode!
View any web pages & iterate with your agent quickly with just point and click.
Codex will automatically capture a screenshot, the DOM element, and feed it as precise context to your next chat.
No more switching between browsers, dragging screenshots, and wrangling with underspecified prompts.
It's great for front-end development of apps/pages, but also very useful if you have documentation pulled up on the side and just want to ask a question!
steipete @steipete
Retweeted
Phil Trubey
Sorry, but it just had to be done.
rauchg @rauchg
The hardest thing about agents and backends is durability. @workflowsdk fixes this.

That LLM you're calling *will* go down. That service *will* rate limit you. That database *will* unexpectedly slow down. You *will* get paged 💀

I've been looking for a unicorn for a decade. I wanted the level of reliability of combining stuff like SQS / Kafka / microservices, and I absolutely did not want *that* at the same time 😂

Truly reliable systems like that are notoriously difficult to reason about, to develop locally, to test, to simulate, to deploy… Workflow SDK solves that without compromises.

We're doing what Next.js did for the frontend, but for one of the most important problems of the new generation of backend applications.

Notably, Workflow SDK has an incredible self-hosting and multi-cloud story from day 0. We've taken amazing lessons from Next.js and poured them into the many Worlds (adapters) you can deploy to.

Congrats to Pranay and the Workflow team on a generational ship: http://vercel.com/blog/a-new-programming-model-for-durable-execution


Vercel: Vercel Workflows is GA.

Your code is the orchestrator. Ship agents, backends, or any long-running process without managing queues, retries, or workers. https://vercel.com/blog/a-new-programming-model-for-durable-execution
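
The failure modes listed in Rauch's post (an LLM outage, a rate limit, a slow database) are the reason durable execution records each completed step, so that a retry resumes instead of starting over. The sketch below shows that idea in plain Python. It is not the Workflow SDK's API, which the posts do not show: `DurableRun`, `step`, and the in-memory journal are hypothetical names, and a real system would keep the journal in durable storage.

```python
from typing import Any, Callable, Dict


class DurableRun:
    """Replays a workflow, skipping steps whose results are already journaled."""

    def __init__(self, journal: Dict[str, Any]):
        # Assumption: a plain dict stands in for durable storage.
        self.journal = journal

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.journal:
            return self.journal[name]  # finished in an earlier attempt
        result = fn()                  # may raise; nothing is journaled then
        self.journal[name] = result
        return result


calls = {"fetch": 0, "summarize": 0}


def fetch() -> str:
    calls["fetch"] += 1
    return "raw data"


def summarize() -> str:
    calls["summarize"] += 1
    if calls["summarize"] == 1:
        raise RuntimeError("simulated LLM outage")
    return "summary of raw data"


def workflow(run: DurableRun) -> str:
    run.step("fetch", fetch)
    return run.step("summarize", summarize)


journal: Dict[str, Any] = {}
try:
    workflow(DurableRun(journal))       # first attempt fails at "summarize"
except RuntimeError:
    pass
result = workflow(DurableRun(journal))  # retry reuses the journaled "fetch"
print(result, calls)
```

The second attempt never calls `fetch` again, which is the property that matters when a step is expensive or has side effects.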
garrytan @garrytan
Retweeted
Muzzammil Zaveri (MZ)
Repeat @ycombinator founders hit different. Early success + the YC learnings = massively higher odds of building a category-defining company. Eg:
1. Sam Altman
• Act I: Loopt (YC S05) — location-based social networking app. Sold to Green Dot for $43.4M
• Act II: OpenAI — Valued at $852B
2. Tom Brown
• Act I: Grouper (YC W12) — group-dating app.
• Act II: Anthropic — Valued at $380B
3. Patrick Collison
• Act I: Auctomatic (YC W07) — auction management tool acquired for $5M
• Act II: Stripe (YC S09) — Valued at $159B
4. Qasar Younis
• Act I: TalkBin (YC W11) — customer feedback platform acquired by Google
• Act II: Applied Intuition — Valued at $15B
5. Eric Glyman & Karim Atiyeh
• Act I: Paribus (YC S15) — price-tracking app acquired by Capital One
• Act II: Ramp — Valued at $32B
6. Parker Conrad
• Act I: Zenefits (YC W13) — Rippling 1.0
• Act II: Rippling (YC W17) — Valued at $16.8B
7. Daniel Gross
• Act I: Greplin (YC W10) — predictive search engine acquired for ~$40M by Apple.
• Act II: Safe Superintelligence — Valued at $32B
8. Howie Liu
• Act I: Etacts (YC W10) — crm tool acquired by Salesforce
• Act II: Airtable — Valued at $11.7B
9. Tom Blomfield
• Act I: GoCardless (YC S11) — b2b payment processor acquired for €1.05B
• Act II: Monzo — Valued at $5B+
10. Jesse Zhang
• Act I: Lowkey (YC S18) — gameplay recording app. Acquired by Niantic.
• Act II: Decagon — Valued at $4.5B
11. Immad Akhund
• Act I: Clickpass (S07) — acquired by Yola
• Act II: Heyzap (YC W09) — mobile ad network acquired for $45M
• Act III: Mercury — Valued at $3.5B
12. Rujul Zaparde
• Act I: FlightCar (YC W13) — airport car-sharing startup acquired by mercedes-benz
• Act II: Zip (YC S20) — Valued at $2.2B
13. Kyle Vogt
• Act I: Twitch (YC W07) — acquired by Amazon for $970M
• Act II: Cruise (YC W14) — Acquired by General Motors for $1B+
14. Emmett Shear & Justin Kan
• Act I: Kiko (YC S05) — calendar app famously auctioned off on eBay for $258k
• Act II: Twitch (YC W07) — Acquired by Amazon for $970M
sama @sama
Retweeted
OpenAI
Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine.
garrytan @garrytan
They don't know yet...
But they will know!

Anish Acharya: "We're tool builders. And every tool that we've ever built has helped us progress as a human species or individually, whether it's art or the wheel or whatever.

And I can't believe the scale at which we're at now. It's absolutely unbelievable. And I think what's shocking to me

danshipper @danshipper
it would be incredible if the mythos announcement was a 4d chess move to make Trump keep the anthropic contracts

bullish

zerohedge: *WHITE HOUSE MOVES TO GIVE US AGENCIES ANTHROPIC MYTHOS ACCESS
danshipper @danshipper
Retweeted
Nityesh
I want to talk about an idea that's making my AI employee improve itself autonomously every day. It's blowing my mind how effective it is.
We all think about how to make AI agents do more things — give it more tools, more MCP servers, more CLI access. But I kept running into a different problem: I'm configuring prompts every day. Realizing it got this wrong, it should've done this other way. And there's no way for it to get feedback and improve itself.
What's the architecture that allows an AI employee to improve itself?
The way I connected it is that human employees have long-running incentives. They want to earn more, prove themselves, grow in responsibility. That's why everyone constantly improves at their job. AI employees don't have this. Not yet.
While thinking about this, I stumbled upon @tobi's Trust Battery. Shopify uses this mental model — every relationship between two people in a company has a "trust battery" that starts at 50% charge. Every interaction either charges or drains it. High trust = autonomy. Low trust = scrutiny.
That made perfect sense as a long-running incentive for AI employees.
So I built it. Each team member has a separate trust battery with our AI employee. It starts at 20% — deliberately low. A human brings life experience. An AI hasn't earned that yet. It has to prove itself.
What charges the battery:
• executing cleanly without handholding
• catching problems before they're reported
• anticipating what's needed
• remembering context
• good judgment
What drains it:
• having to re-explain yourself
• misunderstanding instructions
• silent failures
• stale context
The most insidious drain is repeated context-giving — "I already told it which email account to use." Each instance feels minor but they compound fast. Every re-explained preference is a memory the agent should have saved but didn't.
To implement this, I basically created two scheduled jobs that run every night:
Job 1: An independent "battery judge" agent reviews the past 24 hours. Its prompt says "You are not the AI employee and you have no loyalty to them. You are a nitpicky, skeptical judge. Think a MasterChef judge examining every plate." It assigns points to every micro-interaction.
Job 2: A self-reflection routine where the AI employee reads the judge's verdict and figures out what to change — updates memories, adjusts prompts, fixes broken jobs.
The separation matters. If it graded itself AND decided how to improve, it'd optimize for score, not work.
And it's working. Here's what blew my mind:
After a bad day where it fabricated statistics in a client deck, the nightly reflection hard-coded a no-fabrication rule into our presentation pipeline. It actually went in and adjusted the prompt for that skill.
Then it wrote: "The fabrication memory existed before the deck was built. The structural pipeline fix is the real safeguard. Memory alone clearly isn't enough."
That last line gave me goosebumps. It independently came to the conclusion that Claude Code's memory features are unreliable — and made structural changes instead. That's a self-realization.
The day before that, it created a memory: "never propose changes to someone's workflow without checking with them first." All because I gave one small piece of feedback — "did you ask Natalia about this?" That tiny correction was enough for it to catch in reflection and take action.
The battery level also unlocks autonomy:
• 0-25% is propose and wait
• 25-50% is routine tasks
• 50-75% is judgment calls
• 75-100% is full autonomy
Same way a new hire earns trust.
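The tiers above map onto a small lookup. The four bands are from the post; the function name `autonomy_mode`, the mode labels, and the boundary rule are assumptions. The post lists overlapping edges (25, 50, 75), so this sketch treats each lower bound as inclusive.

```python
def autonomy_mode(level: float) -> str:
    """Map a battery level (0..100) to the autonomy tier it unlocks.

    Assumption: an edge value such as 25 belongs to the higher tier.
    """
    if not 0 <= level <= 100:
        raise ValueError(f"battery level out of range: {level}")
    if level < 25:
        return "propose_and_wait"
    if level < 50:
        return "routine_tasks"
    if level < 75:
        return "judgment_calls"
    return "full_autonomy"


# A fresh battery starts at 20%, so a new AI employee can only propose and wait.
starting_mode = autonomy_mode(20)
```

Checking the tier at the moment an action is attempted, rather than caching it, would let a bad night's drain take effect immediately.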
Most people are thinking about AI agents in terms of what tasks they can do. I started thinking about what makes them want to do those tasks well.
The answer: give it something to lose.
Give it a scorecard with actionable feedback. It's great at pattern matching, great at identifying solutions. If you give it the right feedback loop, it's going to improve itself. You don't need complicated ways. It just improves.
This was just two days of results. I can't wait to see where this is in 30 days.
Full walkthrough with the deck, dashboard, and real Slack messages below. Let me know what you think and if you want this as a skill.
garrytan
garrytan @garrytan
This is an impressive software factory actually

Matan Grinberg: http://x.com/i/article/2044629999911911426
sama
sama @sama
Retweeted
Tibo Tibo
Codex
Compute efficient ✅
Always up, never down ✅
Best at hardcore engineering ✅
Crazy good app, first to escape the terminal ✅
garrytan
garrytan @garrytan
Phil Kim for SF School Board is a vote for common sense

His opponent literally doxxed my home address and I had to move my family because she wanted to silence me

For what? Her virtue signal agenda to destroy the educations of every public school kid in SF

Vote for Phil Kim

Blueprint: SFUSD is finally recovering but its progress is still fragile.

This June election will decide if our public schools keep moving forward or fall back into dysfunction.

That’s why it’s critical to vote for Phil Kim. A vote for Phil is a vote for progress.

https://www.sfblueprint.org/advocacy/why-this-junes-board-of-education-election-matters
ylecun
ylecun @ylecun
Retweeted
Peter Tong Peter Tong
I defended my thesis today! Sincere thanks to my advisors @sainingxie @ylecun and committee members: @mengyer @YiMaTweets @LukeZettlemoyer @liuzhuang1234. I could not have wished for a better PhD life, and I want to thank everyone who was part of this journey.
Slides Link: https://tsb0601.github.io/data/defense_slides.pdf
steipete
steipete @steipete
they: OpenClaw is so insecure look at all these GHSAs!
reality: we are just an indicator of the coming storm

Sam Saffron: After 13 years we WILL NOT be closing the @discourse source code. Instead we invest heavily in security and adapt to the times. Last monthly release had 50 CVEs thanks to multi day scans using GPT 5.4 xhigh. https://x.com/pumfleet/status/2044406553508274554

danshipper
danshipper @danshipper
we ran a philosopher draft

which philosopher from history would each model lab hire if they could, and why?
gdb
gdb @gdb
Announcing GPT-Rosalind, our frontier model for life science research.

This model is a step towards one of our most important goals — accelerating science and improving human outcomes.

Excited to work with many amazing partners on deploying and improving this model.


OpenAI: Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine.

garrytan
garrytan @garrytan
Retweeted
AIHacksByMK AIHacksByMK
Does anyone know why @garrytan is hyper focused on open source right now?
GBrain and Gstack. His actual personal AI memory system. Not a demo. His real setup with 10,000+ files, 3,000 people pages, 13 years of calendar data.
While you sleep it scans your emails, meetings, and conversations. Builds a knowledge graph. Your agent wakes up smarter than when you went to bed.
MIT license. Completely free.
He built this in 12 days and could have easily charged $100/month for it.
People would have paid without hesitation.
Instead he shipped it open source with his exact prompts and setup exposed for anyone to use.
His own words: “It’s more important to be above the API line now than ever.”
I’m integrating this into my agent stack immediately. No brainer.
This is what it looks like when someone with real power chooses to give instead of gate.
Respect @garrytan. Keep building like this.
Garry Tan: Pro tip for the GFamily - if you use GStack with Claude Code, but also have a Claw/Hermes with GBrain... I like to do my GStack planning (autoplan skill) in Claw/Hermes since it's faster, and then drop the plan and do plan-eng-review
Here I am working on a token compaction
garrytan
garrytan @garrytan
Retweeted
Rob Henderson Rob Henderson
"California...has seen a large exodus of the middle class; once the best place to be an average man...it has become Brazilianised, with a very rich overclass and large numbers of poor migrants. Both groups tend to vote Democrat, furthering the cycle." https://www.edwest.co.uk/p/the-city-of-luxury-beliefs
steipete
steipete @steipete
Retweeted
Nikita Bier Nikita Bier
Re @perplexity_ai Can you please stop the undisclosed promotion campaigns? It deceives users and it does not reflect well on your company or your integrity. @AravSrinivas
https://x.com/goddek/status/2044823262362771490
Dr. Simon Goddek: @realpeteyb123 Hey @nikitabier – is this against @X’s TOS?
garrytan
garrytan @garrytan
Retweeted
Elad Gil Elad Gil
Insightful analysis from @shreyanj98 on 2026 Unicorn Market Cap (data from @CBinsights)
2025 = Dec 31 2025/Jan 1 2026
Looked at 👀
*Private company unicorn market cap by year
*Bay Area is the GenAI supercluster with 91% of global AI private market cap in a 1 hour radius!!
danshipper
danshipper @danshipper
Retweeted
Spiral Spiral
We're looking for a few teams to beta test new collaborative workflows for company writing – if interested, DM or email beta@writewithspiral.com
garrytan
garrytan @garrytan
I'm en route to Singapore and there's a ton of good wifi on Starlux via Taipei, so expect a lot of GStack and GBrain bug fixes and features dropping the next 24 hours
steipete
steipete @steipete
Retweeted
Omar Shahine Omar Shahine
Latest @openclaw release has a big PR in it from me that addresses a bunch of BlueBubbles (iMessage) issues: 1) repeat messages on gateway restart 2) catchup missed messages if gateway was down 3) attachments not being read by openclaw 4) balloon messages not working (text + attachment). Let me know if something broke! https://github.com/openclaw/openclaw/releases/tag/v2026.4.15
jeremyphoward
jeremyphoward @jeremyphoward
Retweeted
Andy Masley Andy Masley
A deep mystery to me is that if I upload writing to a chatbot and ask it for a list of individual improvements, basically everything it gives me makes the text more punchy and direct and nice to read. But if I ask it to rewrite the text as a whole to read better, it produces vague AI-language garbage.
keysmashbandit: Please, I'm begging you, try to critically examine the differences between these two pieces of writing.
ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined
swyx
swyx @swyx
Retweeted
AI Engineer AI Engineer
🆕 Building pi in a World of Slop
https://www.youtube.com/watch?v=RjfbvDXpFls
@badlogicgames talks about why today's agents are still Merchants of Learned Complexity, and gives 3 specific ways that humans still add taste, value and judgment to the art of software engineering, and why you should slow the f down and READ the code.
swyx
swyx @swyx
Retweeted
Mario Zechner Mario Zechner
recommended viewing
AI Engineer: 🆕 Building pi in a World of Slop
https://www.youtube.com/watch?v=RjfbvDXpFls
@badlogicgames talks about why today's agents are still Merchants of Learned Complexity, and gives 3 specific ways that humans still add taste, value and judgment to the art of software engineering, and why you should
sama
sama @sama
I am happy everyone is switching to Codex, but Tibo if you start rate limiting me or making me use worse models...

Tibo: Codex

Compute efficient ✅
Always up, never down ✅
Best at hardcore engineering ✅
Crazy good app, first to escape the terminal ✅
garrytan
garrytan @garrytan
Open source software will be many times more secure than closed source software in the new Mythos era

Peter Steinberger 🦞: they: OpenClaw is so insecure look at all these GHSAs!
reality: we are just an indicator of the coming storm

YouTube

0

No recent videos fetched on this date.