Daily Edition

2026-04-23

AI Builders Daily: April 23

Tracking the people in AI who actually build things, not the ones who only talk.

Today's Thinking

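On the day GPT-5.5 launched, the biggest signal was not how strong the model is but Sam Altman saying outright that OpenAI has to become an AI inference company. OpenAI's center of gravity is shifting from "train the strongest model" to "serve the strongest model efficiently," and the inference war is turning into an infrastructure war. Codex can now complete whole tasks on its own inside real workflows rather than merely assisting with coding. Noam Brown, a manager at OpenAI, used GPT-5.5 to write CUDA kernels as fluently as a professional engineer, and that is the part that makes you gasp.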

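Meanwhile, Garry Tan's "thin harness, fat skills" philosophy is being validated by real users: the trick to keeping context from blowing up is not a bigger context window but letting the subagent decide which skill to load. That insight is reshaping how builders think about AI tooling.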
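
The "subagent picks the skill" idea can be sketched as a small resolver that loads only one skill body into context. This is a minimal illustration, not GStack's implementation: the `Skill` class, the keyword heuristic, and the `code-review` skill name are assumptions made for the example (only `/office-hours` and `/design-shotgun` are named in this issue).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    keywords: tuple[str, ...]  # hypothetical trigger words
    body: str                  # full instructions, loaded only on demand


# Hypothetical registry; bodies are padded to mimic long skill files.
SKILLS = [
    Skill("office-hours", ("idea", "startup", "pitch"), "OFFICE HOURS INSTRUCTIONS " * 50),
    Skill("design-shotgun", ("design", "mockup", "layout"), "DESIGN INSTRUCTIONS " * 50),
    Skill("code-review", ("review", "diff", "bug"), "REVIEW INSTRUCTIONS " * 50),
]


def resolve(task: str, skills: list[Skill]) -> Skill | None:
    """Pick the skill whose keywords overlap most with the task (toy heuristic)."""
    words = set(task.lower().split())
    best = max(skills, key=lambda s: len(words & set(s.keywords)))
    return best if words & set(best.keywords) else None


def build_context(task: str, skills: list[Skill]) -> str:
    """Thin harness: include only the resolved skill body, not every skill."""
    skill = resolve(task, skills)
    header = f"TASK: {task}\n"
    return header + (f"SKILL[{skill.name}]:\n{skill.body}" if skill else "")


ctx = build_context("please review this diff for a bug", SKILLS)
print(len(ctx), "characters in context vs", sum(len(s.body) for s in SKILLS), "for all skills")
```

A real resolver would more plausibly let the model choose from short skill descriptions; the point of the sketch is only that context size tracks one skill rather than the whole library.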


Products and Launches

GPT-5.5 officially released

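OpenAI released GPT-5.5, positioning it as "a new class of intelligence for real work and powering agents." Headline benchmarks: Terminal-Bench 2.0 82.7%, SWE-Bench Pro 58.6%, Expert-SWE 73.1%, GDPval 84.9% (knowledge work close to solved). Pricing: $5 per 1M input tokens and $30 per 1M output tokens. Context window: 400K in Codex, 1M in the API. (x.com)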
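
To make the pricing concrete, here is a small cost estimate at the quoted list prices. The workload numbers are hypothetical, and the sketch ignores any caching or batch discounts, which this issue does not mention.

```python
INPUT_USD_PER_M = 5.0    # USD per 1M input tokens, as quoted above
OUTPUT_USD_PER_M = 30.0  # USD per 1M output tokens, as quoted above


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted list prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical workload: a 200K-token prompt with a 4K-token reply.
print(f"${request_cost(200_000, 4_000):.2f}")  # prints $1.12
```

Because output tokens cost six times as much as input tokens, long generations dominate the bill well before long prompts do.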

Codex for the enterprise arrives

Sam Altman and Greg Brockman announced a partnership with NVIDIA to roll Codex out company-wide. Greg said a rollout to enterprise customers is under way. (x.com)

ChatGPT for Clinicians

OpenAI launched a free version of ChatGPT designed for clinical work, alongside HealthBench Professional, a new benchmark for evaluating real clinician chat tasks. (x.com)

GPT-Image-2-Thinking

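Sam Altman said Images 2.0 "really got over some important qualitative threshold for me that I didn't know existed." swyx offered the best framing: Image-2-Thinking is essentially an image agent with search and photoshop as built-in tools, able to search, composite, and review its own work inside an agent loop. Just as Gemini Flash Vision changed image-to-text by introducing an agentic loop, Image-2-Thinking now does the same for text-to-image. Generation takes tens of minutes, but it can one-shot QR codes, diagrams, logos, food, and faces. (x.com)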

Replit Auto-Protect

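Replit launched a 24x7 security monitoring service that watches for threats, proactively prepares fixes, and notifies users to apply them even when they are away. The shift is from "build with AI" to "have AI maintain prod while you sleep." (x.com)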

Replit launches mobile testing

Replit added one-click, in-browser testing on real iOS and Android devices, removing a pain point for Windows and Android users who had no way to test iOS apps. (x.com)

Replit Agent comes to Google Cloud

Replit Agent is now integrated into Google Cloud's Gemini Enterprise as one of the first 90+ partner-built agents. (x.com)

GStack skill calls now explained

GStack's /codex skill calls now come with a full explanation, so that "ADHD CEO types" can follow Codex's reasoning. An official intro video was also released, demonstrating the /office-hours and /design-shotgun skills. (x.com)

World Labs World Jam goes live

World Labs launched the two-week World Jam, built around Marble 1.1 and Spark LoD, its latest tools for building interactive 3D worlds. (x.com)


Opinions and Takes

Amjad Masad (Replit CEO)

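  • Replit two years after leaving San Francisco: 10x valuation, 200x ARR, and a move into IBM's old Foster City headquarters. Replit has now taken over most of the former IBM campus in Foster City. Amjad said that IBM helped create the industry and that Replit is helping redefine how people create software, and he described SF as beautiful but not the right place. (x.com)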

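  • Replit whitepaper: current LLMs combined with static analysis tools can reach 90%+. Replit published a whitepaper showing that current-generation LLMs perform significantly better on code security when combined with static analysis tools (90%+ in some cases). Amjad's comment: "You don't have access to Mythos. Doesn't mean you can just sit around and wait." (x.com)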

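  • Replit Security Agent wins praise. User Matt Beebe tested it and wrote "Wow. This is good stuff. Very good stuff." Amjad shared the post, adding "Replit Security Agent making the internet a better place one app review at a time." (x.com)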

Sam Altman (OpenAI CEO)

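  • OpenAI's three tenets: iterative deployment, democratization, and helping users win. 1) Iterative deployment is a big part of the safety strategy, and it best equips the world to win at "the team sport of AI resilience"; 2) democratization: OpenAI wants people to have access to lots of AI and aims for the most efficient models and inference stack; 3) love for users: it wants to be the platform for every company, scientist, and entrepreneur. (x.com)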

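  • The next improvements after GPT-5.5 will come fast. He cited Jakub's prediction of significant progress in the short term and extremely significant progress in the medium term, and added that the past few years have actually been surprisingly slow. (x.com)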

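  • OpenAI must become an AI inference company. He said Codex is being rolled out across the entire company and stressed that the inference team's work is the real highlight of this launch. (x.com)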

Garry Tan (Y Combinator CEO)

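  • GStack's core insight: the subagent choosing the skill is the real unlock. User ByteCrafter ran skills and subagents in Claude Code for about three weeks and found that the big unlock was not any individual skill but the subagent picking which skill to load, so that context doesn't blow up on a 400-line task; a single prompt now does what used to take 4-5 rounds. Garry's reply: "Resolvers in GStack for the win!" (x.com)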

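  • OpenClaw plus AI-customized software is the future. Garry said his claw produced a first version of a tool in about 2 minutes, after which he spent 15 to 20 minutes iterating until it was exactly what he wanted. His takeaway: "There are certain things that are much better as just in time software." (x.com)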

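  • California's public schools are losing students at nearly 3x the national rate. California will lose 15.7% of its public school students by 2031, while Idaho and Florida are growing. Garry argued this is not inevitable but a consequence of "what happens when housing costs too much and schools stop working." (x.com)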

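  • Sam Altman as an exemplar of YC founders. After Sam Altman spoke to YC's Spring batch, Kulveer posted that Sam already stood out back when they were both founders, and that two decades later they are both back at YC helping the next generation build. Garry retweeted the post. (x.com)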

swyx (swyx)

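  • How AI researchers work is changing. A researcher only needs to give the AI a vague intuition, a compute budget, and tool access, and it returns experiment results, failed runs, plots, counterexamples, and revised hypotheses. The unit of progress becomes experiment throughput. (x.com)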

Amanda Askell (Anthropic)

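  • Living through one of the most pivotal periods in human history. She described how strange it is to live through what feels like one of the most pivotal periods in human history while feeling all of that weight from the inside. (x.com)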

Technical Updates

swyx (swyx)

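  • GPT-5.5's new Pareto frontier: performance and cost improve together. New context windows (400K in Codex, 1M in the API), API pricing ($5 input and $30 output per 1M tokens), co-design with GB200 and GB300 NVL72, and Codex improving its own inference speed by 20%. Benchmarks: Terminal-Bench 2.0 82.7%, Expert-SWE 73.1% (a new internal eval), SWE-Bench Pro 58.6%, GDPval 84.9% (knowledge work close to solved), Tau2-bench Telecom 98.0% (support essentially solved), BixBench 80.5% (bioinformatics and data analysis). (x.com)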

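  • GPT-Image-2-Thinking architecture: a new paradigm of image agents. Key insight: Image-2 is a new image model, while Image-2-Thinking is a new image agent that puts search and photoshop into an agent loop as tools, so it can search, composite, and review its own work. This mirrors how Gemini Flash Vision brought an agentic loop to image-to-text. (x.com)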
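
The "review its own work" part of that agent loop can be sketched as a generate, review, regenerate cycle. This is a generic illustration under stated assumptions, not OpenAI's architecture: `agent_loop`, `toy_generate`, and `toy_review` are hypothetical stand-ins, and the search and photoshop tools are left out.

```python
from typing import Callable


def agent_loop(
    prompt: str,
    generate: Callable[[str, list[str]], str],
    review: Callable[[str, str], list[str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Generate, review, and regenerate until the reviewer has no complaints."""
    feedback: list[str] = []
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(prompt, feedback)
        feedback = review(prompt, draft)
        if not feedback:
            return draft, round_no
    return draft, max_rounds


# Toy stand-ins for the image model and its self-review step (both hypothetical).
def toy_generate(prompt: str, feedback: list[str]) -> str:
    draft = f"image for: {prompt}"
    if any("QR" in note for note in feedback):
        draft += " [QR code added]"
    return draft


def toy_review(prompt: str, draft: str) -> list[str]:
    if "QR" in prompt and "[QR code added]" not in draft:
        return ["QR code missing"]
    return []


result, rounds = agent_loop("poster with QR code", toy_generate, toy_review)
print(rounds, result)
```

The extra rounds are one plausible reason generation takes tens of minutes: each pass pays for a full generation plus a review.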

Replit

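  • Replit whitepaper: static analysis tools substantially improve LLM code security. Replit's whitepaper reports that current-generation LLMs combined with static analysis tools can reach 90%+ in some cases, suggesting a new approach to securing AI-generated code. (securing-ai-generated-code.replit.app)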
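
One way to picture "LLM plus static analysis" is a deterministic pass that produces findings, which are then handed to the model together with the code. The sketch below uses Python's standard `ast` module with a toy two-rule check; the rule set, function names, and prompt wording are assumptions for illustration and are not taken from Replit's whitepaper. No model is actually called.

```python
import ast

RISKY_CALLS = {"eval", "exec"}  # toy rule set for illustration only


def static_findings(source: str) -> list[str]:
    """Flag direct calls to risky builtins using the standard-library ast module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return [f"line {line}: call to {name}()" for line, name in sorted(hits)]


def build_review_prompt(source: str) -> str:
    """Combine deterministic findings with the code so a model reviews both."""
    findings = static_findings(source) or ["no static findings"]
    return (
        "Static analysis findings:\n- " + "\n- ".join(findings)
        + "\n\nCode under review:\n" + source
        + "\n\nPropose a fix for each finding."
    )


SAMPLE = "user_input = input()\nresult = eval(user_input)\n"
print(build_review_prompt(SAMPLE))
```

The design choice is that the analyzer supplies precise, reproducible locations while the model supplies the judgment about how to fix them.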

ICLR 2026

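  • ICLR 2026 Test of Time Award goes to a team with no PhDs. AlecRad, Luke_Metz, and soumithchintala (none of whom has a PhD) received the ICLR 2026 Test of Time Award for work they published in their twenties. chipro congratulated them and said she hopes to see the three collaborate again. (x.com)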

Yann LeCun (Meta)

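  • ICLR 2026 paper: LLMs over-compress, leaving them sharp on surface semantics but hollow on fine-grained detail. An ICLR 2026 paper ("From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning") finds that LLMs over-compress in information-theoretic terms and discard the "inefficient" nuance that humans keep; that encoder models often align with humans better than decoders many times their size; and that during training, semantic processing migrates from deep layers to mid-network. (x.com)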
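
As background for the compression argument, the authors' post says they use an Information Bottleneck lens. The classical Information Bottleneck objective, given here in its textbook form (which may differ from the exact measure used in the paper), seeks a representation Z of an input X that is as compressed as possible while staying informative about a target Y:

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here I(·;·) denotes mutual information and β ≥ 0 sets the trade-off between compression, I(X;Z), and preserved meaning, I(Z;Y). In these terms, "over-compression" roughly means pushing I(X;Z) down at the cost of nuance that humans choose to retain.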

X / Twitter

swyx @swyx
Retweeted
Rebecca Bakels
Swyx hot take: “you should all move to Miami”👏👏👏
garrytan @garrytan
Retweeted
Sheel Mohnot
Wealth tax math: one-time ∼$50B windfall for California
∼$4B lost every year from current leavers
And that doesn't include the worst part: future founders who never move here in the first place!
CalTax: The wealth tax initiative proposed in CA would reduce ongoing tax revenue $3.53 billion to $4.49 billion per year, California Tax Foundation research finds. The negative effects on California’s tax base "could be substantial and persistent." https://caltax.org/foundation/reports/Revenue-Implications-of-Billionaire-Tax-Act.pdf
petergyang @petergyang
Craft > slop

I love using AI to generate things too but craft is in that last 10% where you manually apply your taste to make something you can be proud of.

Many people never bother.
amasad @amasad
Where can I wire the seed check?

Robleh: so i blocked youtube on my daughters ipad last week. today i learned she went to Replit and built a tool to watch youtube videos just by pasting the link.

and i'm not even mad.
garrytan @garrytan
Retweeted
Matthew Yglesias
Re You shouldn’t steal. You shouldn’t murder health insurance executives. You shouldn’t reflexively side with every country that’s hostile to the United States. This is all incredibly stupid.
swyx @swyx
GPT 5.5 tomorrow would be the best damn birthday gift I could ever ask for

Sam Altman: @Suolar_ @scaling01 🫡
swyx @swyx
btw in talking to friends the best framing for how to discuss GPT-Image-2-Thinking taking multiple tens of mins for generation and being able to oneshot QR codes and diagrams and logos and foods and faces..

...is that Image-2 is a new Image model, Image-2-Thinking is a new Image AGENT that basically has search and photoshop as a tool to use in an agent loop that can search and composite and review its own work.

the same way Gemini Flash Vision destroyed benchmarks by introducing an agentic loop for image-to-text, now Image-2-Thinking is doing it for text-to-image.





Hewar: Holy shit, I just switched to the thinking model

garrytan @garrytan
Retweeted
Rob Henderson
New luxury belief just dropped
gdb @gdb
Introducing ChatGPT for Clinicians:


Karan Singhal: Today we’re introducing two big steps for health at OpenAI:

- ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
- HealthBench Professional, a new benchmark to evaluate real clinician chat tasks

We’re excited about what this can unlock for care. ❤️

amasad @amasad
A Replit user just launched “Vibe Genomics: Sequencing your Whole Genome at Home”

https://vibe-genomics.replit.app

Useful if you’re inspired by Patrick’s AI genomics experiment but don’t want to share your DNA with a company.


Patrick Collison: I'm lucky enough to have a great doctor and access to excellent Bay Area medical care. I've taken lots of standard screening tests over the years and have tried lots of "health tech" devices and tools.

With all this said, by far the most useful preventative medical advice that
garrytan @garrytan
Man between this and CrabTrap from @pedroh96 I am going to have a busy week

Tony Dang: http://x.com/i/article/2046844065690607616
amasad @amasad
You can now call Replit Agent from Gemini Enterprise.

Google Cloud: Today at #GoogleCloudNext, we're bringing 90+ partner-built agents to you in Gemini Enterprise!

From partners like Adobe, Atlassian, Box, Deloitte, Lovable, Oracle, Palo Alto Networks, Replit, S&P Global, Salesforce, ServiceNow, Workday, and more→ https://goo.gle/4cKU4p5

rauchg @rauchg
I want to keep everyone updated on the details of the security investigation.

The team performed an in-depth analysis to search for root causes and to better understand the behavior of the threat actor.

We cast a very wide net, pulling and processing nearly a petabyte of logs of the entire Vercel Network and API, extending well beyond the initial Context[.]ai compromise.

We now understand that the threat actor has been active beyond that startup's compromise. Threat intel points to the distribution of malware to computers in search of valuable tokens like keys to Vercel accounts and other providers.

Once the attacker gets ahold of those keys, our logs show a repeated pattern: rapid and comprehensive API usage, with a focus on enumeration of non-sensitive environment variables.

As a result:
◾We've deepened and widened our collaboration with partners across the industry, like Microsoft, AWS and Wiz, to further protect the broader internet.
◾ We've notified other suspected victims of this threat actor, independent of this event, encouraging them to rotate credentials and adopt best practices.

We've also shipped a bunch more product enhancements. I'm extremely thankful to our team and industry partners for working around the clock. For more details on the ongoing investigation, refer to our security bulletin:
https://vercel.com/kb/bulletin/vercel-april-2026-security-incident
amasad @amasad
Replit Security Agent making the internet a better place one app review at a time.

Matt Beebe: Ok. Took a few minutes to run a few of my projects through the new @Replit security agent. I may write up a deeper dive later, but some quick thoughts: 🧵

tl;dr: Wow. This is good stuff. Very good stuff. 1/

amasad @amasad
You don’t have access to Mythos 🫵🤭

Doesn’t mean you can just sit around and wait.

Replit published a whitepaper showing you can get significantly better performance from current gen LLMs (90%+ in some cases) by combining with static analysis tools.

https://securing-ai-generated-code.replit.app
ylecun @ylecun
Retweeted
Sci-Fi Archives
The first simulated image of a black hole, calculated with an IBM 7040 computer using 1960 punch cards and hand-plotted by French astrophysicist Jean-Pierre Luminet in 1978.
amasad @amasad
Retweeted
prince
Replit shipping Auto-Protect is the real signal here: the stack is moving from "build with AI" to "have AI maintain prod while you sleep."
If this category works, app security starts looking a lot more like autopilot than dashboards.
https://x.com/Replit/status/2047020966598078863
Replit ⠕: Keeping your apps secure has always required constant oversight from you.
Replit Auto-Protect now keeps watch over your apps 24x7.
We'll monitor threats, proactively prepare fixes and notify you to apply those fixes, even when you are away.
garrytan @garrytan
Retweeted
Nan Wang
This is how you build reliable agents.
- The 10-point checklist covers the stack end-to-end — goal fulfillment, trigger logic, overlap detection.
- The deterministic/latent split is the insight most teams learn the hard way. Use code as guardrails. Let the model handle the emergent stuff.
Garry Tan: http://x.com/i/article/2046866228703363072
garrytan @garrytan
Retweeted
Harj Taggar
Sandboxing my openclaw made it more secure but managing API keys and env variables has become a real headache. This seems like the right approach.
Tony Dang: http://x.com/i/article/2046844065690607616
garrytan @garrytan
Retweeted
hushhh
was reading Garry Tan's article on “thin harness, fat skills” and this idea stuck with me more than expected
the bottleneck isn’t the model being dumb, it’s how we’re using it
most people (me included tbh) think better models = better results
but the 10x vs 100x difference is actually in the setup
fat harness = too many tools, bloated context, slow + noisy systems
thin harness = just enough to run the loop, fetch data, manage context
the real leverage is in “skills”
reusable processes written in plain language that the model can apply again and again
kinda like functions, but for reasoning
add things like resolvers (load the right info at the right time), keep deterministic stuff separate, and it starts feeling less like prompting and more like building an actual system
idk it lowkey changed how i think about “using ai” vs actually designing with it
garrytan @garrytan
Retweeted
Guri Singh
Video editors are going to panic.
The team behind browser-use just open sourced an editor that runs inside Claude Code. You drop raw footage in a folder, type "edit these into a launch video," and it ships final.mp4.
It's called video-use.
→ Cuts "umm," "uh," and dead space at word-level timestamps
→ Auto color grades every segment with custom ffmpeg chains
→ Burns 2-word UPPERCASE subtitles and spawns Manim animations in parallel sub-agents
2.2K stars. Self-evals every cut before you see it.
100% open source.
Link in comments.
ylecun @ylecun
Retweeted
Daniel Jeffries
More people will die from suppressing AI than from the imaginary AI apocalypse.
They'll die from restricting safe self-driving cars that are 90% better drivers than people who kill 1.5 million people and injure 50 million more every year.
They'll die from the vaccines and cures that never get created.
They'll die from all the myriad of helpful inventions that never get created by geniuses in a datacenter.
They'll die from preventable diseases that they could have asked their chat bots about so they were better informed when they went to see their doctors but who couldn't ask because short-sighted legislators made it so the chat bots had to refuse to answer.
They'll die from the slower economy that stifles robot driven factories over wildly overblown jobs apocalypse fears which will mean we never get a vast array of new and more affordable goods.
They'll die from the cheaper solar panels and batteries that would get made by those automated factories which would slow climate damage and provide cheap energy to underserved areas.
They'll die from the super smart tele-AI doctors that never get deployed to remote areas.
And they'll die as fanatics from the stop AI movement radicalize their followers to shoot people or throw firebombs.
Max Tegmark: Senator @BernieSanders has invited me and three other AI researchers to a public panel on AI existential risk & international cooperation at the U.S. Capitol 7pm Wednesday April 29th. RSVP here to join us for this important conversation: https://forms.office.com/Pages/ResponsePage.aspx?id=yQ08CVqFVEaBVDu8JMTbft48FJJZ68hMkT0BJFfVBUJUQ0VPS1BJQjFUWlBWNjNHQTI3TThWOE9OMi4u
ylecun @ylecun
Retweeted
Daniel Jeffries
"The pessimistic professor...can predict anything he wants, for if his prophecies don't come true now, just wait: failure could be just around the corner, or else his voice of reason has prevented the worst. The prophets of doom sound oh so profound, whatever they spout."
- Human Kind, Rutger Bregman
ylecun @ylecun
Retweeted
Yann LeCun
Re Actually, AI already saves lives.
In several countries, mammograms are examined by AI and radiologists. Reliability is improved.
In the EU, every car sold must be equipped with Automatic Emergency Braking Systems. That's AI. They reduce frontal collisions by 40%.
Modern MRI machines are equipped with AI technology that reduces the time of imaging by 4x or more. You can now get a full-body MRI in 40 minutes for about $1000. Reduced time -> reduced cost -> more/earlier detection.
And that's not counting the progress in medicine enabled by modern AI, including Nobel Prize-winning protein structure prediction.
garrytan @garrytan
I guess the weird thing these days is I asked my claw to do this and it did it in about 2 min and then I could tune to exactly what I wanted iteratively in the 15 to 20 minutes after

There are certain things that are much better as just in time software

Om Patel: THIS GUY JUST GAVE CLAUDE CODE THE ABILITY TO WATCH VIDEOS

claude code can't natively see video or hear audio

which means every time you want it to "look at this video" you have to screenshot frames manually and transcribe the audio yourself

so this guy built a plugin that

garrytan @garrytan
Retweeted
Timothy B. Lee
Not only are these people vile, but this is terrible journalism. There are good arguments against shoplifting, and the Times couldn't be bothered to interview anyone who could articulate them.
swyx @swyx
Retweeted
Jacob Effron
Always enjoy getting to chat with @swyx on our annual cross-episode with @latentspacepod on the state of AI. We hit on what’s shifted, what surprised us and what’s next.
We covered:
▪️ Whether AI infrastructure has finally stabilized
▪️ Implications of agents buying developer tools
▪️ The AI coding wars
▪️ The foundation model vibe shift
▪️ Why Swyx reversed his view on open models
▪️ When to train your own model
▪️ What's top of mind for the best AI engineers
YouTube: https://youtu.be/A_7WafI9bhE
Spotify: https://bit.ly/3QHcCix
Apple: https://bit.ly/4eJERaa
0:00 Intro
1:17 What the Top AI Engineers Are Thinking About
2:13 Has AI Infra Finally Stabilized?
6:39 When Does Doing RL In-House Make Sense?
11:26 Why Selling Dev Tools to Agents is Different
17:18 AI Coding Wars
29:04 Consumer AI Plateau
30:22 Codex vs Claude Code
44:52 Future of Open Models
ylecun @ylecun
Retweeted
Daractenus
In the past 55 days, the president of the US has averaged about 20 social media posts a day, ranging from Bruce Lee videos to AI content of him beating up Canadians. To highlight this man's increasing descent into madness, I've put together a timeline of his Iran War posts.🧵
ylecun @ylecun
Retweeted
Rohan Paul
Yann LeCun (@ylecun): Silicon Valley is "completely LLM-pilled"
"In the end, if you’re interested in building systems that have the intelligence of, let’s say, a cat, let alone humans, you need common sense. You need the ability to predict the consequences of your actions.
You need the ability to plan. You need the ability to reason.
And you’re not going to get this with VLA, VLM, or LLM or any generative architectures."
---
From 'AI House Davos" YT channel (full link in comment)
petergyang @petergyang
Here's @rywiggs (VP Mercury) walking through how he built his Claude Code second brain at work:

"I pulled together almost 5M words from my five years at Mercury and built that as a knowledge base.

At the start of everyday, I get a brief of what's on my calendar, Linear, Slack and at the end of the day it summarizes [everything]."

📌 Watch him talk more about it here: https://youtu.be/KzqpK1uCczw?si=LMiu7tLQbpovz6yX&t=805


Peter Yang: "In the 1950s, we met users at a bank. In the 70s, an ATM. In the 90s and 2000s, a website and a mobile app. Today, it's APIs and MCPs."

Here's my new episode with @rywiggs (Mercury's VP of Product) where he shares:

✅ How to build great APIs + MCPs for agents

✅ How to create
garrytan @garrytan
GStack /codex skill calls now fully explain what is going on. The non-verbal 200IQ Codex is now properly interpreted for us ADHD CEO types
ylecun @ylecun
Retweeted
Andrew Weinstein
The most dangerous thing in Washington right now isn't just the corruption—it's the retaliation against the people who expose it. When government power is used in ways that appear designed to intimidate a journalist for revealing potential abuses, the message is unmistakable: be quiet or be targeted. That’s how fear replaces truth. That’s how a free press is pushed toward silence. And that’s how democracies start to collapse.
ylecun @ylecun
Retweeted
Ravid Shwartz Ziv
Do you want to know why LLMs feel sharp on surface semantics but hollow on the fine-grained stuff?
“From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning”
Come by the poster tomorrow at #ICLR2026 - Fri, Apr 24, 6:30–9:00 AM PDT (10:30AM local time!), Pavilion 3, P3-#1017 to know the answer!
We think it's because they overcompress. Humans keep "inefficient" concepts due to nuance. LLMs discard those for cleaner, information-theoretic compression. Different objectives yield different representations.
Using an Information Bottleneck lens across 40+ models, we also found that encoders align with humans better than decoders many times their size, and that during training, semantic processing migrates from deep layers to mid-network as the model discovers sparser encodings.
@ChenShani2 Liron Soffer @jurafsky @ylecun
petergyang @petergyang
Video editing is a huge market - creators are willing to pay hundreds to get a great clip.

That's why I'm excited about products like Cappy from @trymirage, the first ai video editor that you can simply text to create or edit a video.

It also adds engaging captions and voiceovers. Below's a clip I made of a corgi as an example.

Check out Cappy here: https://captions.ai/features/text-to-edit?utm_source=twitter&utm_medium=influencer&utm_campaign=cappy_april_2026&utm_content=standard&utm_ad_set=tech&utm_ad=peteryang


Mirage: This is the first AI video editor you can text. Meet Cappy.

Just text Cappy like a friend… if your friend happened to be a really good editor.

No apps to download. No learning curve.

Send an idea, some clips, or just say hi.

Then chat until it’s exactly right.

ylecun @ylecun
Retweeted
Kenneth Roth
The Trump administration suppresses a study showing that Covid vaccines cut by half visits to emergency rooms and admissions to the hospital. That doesn't mesh with their anti-vaccine ideology. https://trib.al/oAvHSLr
garrytan @garrytan
Did a video on GStack to introduce you to the first few skills you might use, /office-hours and /design-shotgun

Y Combinator: GStack is an open-source toolkit built by YC President & CEO @garrytan that turns Claude Code into an AI engineering team — with skills for office hours, design, code review, QA, and browser testing.

In this video, Garry walks through how GStack works, starting with Office

garrytan @garrytan
Retweeted
Luong NGUYEN
Re @ycombinator @garrytan gstack is a great toolkit,
> it is my daily tool for quite sometime now
> I have written about that
https://x.com/luongnv89/status/2038386097563001284?s=20
Luong NGUYEN: http://x.com/i/article/2036187606866862082
garrytan @garrytan
Retweeted
Derek Thompson
In Mere Christianity, CS Lewis has an awesome opening riff about how most people know the difference between right and wrong, but they justify acting immorally by appealing to "special exception." They know they shouldn't hit a friend, but what if that friend was being so mean? They know they shouldn't steal a seat a bus, but what if that person got up and created a moment's confusion and then the seat was up for grabs? Etc.
When I read this section, I thought a lot about contemporary politics and the way that people justify their politics, not by appealing to higher principles, but rather by appealing to "special exception" to argue that their admitted indecency is justifiable in context.
A lot of MAGA vice is justified by special exception. Trump's defenders rarely defend his crookedness directly. They don't say "it's wonderful to use trade policy to enrich the Oval Office, it's really awesome." They say: Well, look, it doesn't really matter, because the left is so dangerous, Biden maybe did something similar 3 years ago, Democrats would do the same in power, and so forth.
I heard something similar in that NYT conversation everybody's talking about. You even see it in the headline: ‘The Rich Don’t Play by the Rules. So Why Should I?’ Why, hello, special exception. When you start arguing that stealing food and French paintings is justifiable in the context of political protest in an age of prevailing distrust, you're similarly not arguing *for* any kind of a universal principle. Nobody actually wants 300 million people stealing fruit from the grocery store. Nobody actually wants every Louvre visitor trying to rip a Manet off the walls. These virtues don't scale. (Because they're not virtuous!)
Sap that I am, I want us to get to a place where politics is about fighting for what is right and decent, not about justifying what sort of indecent behavior might be somewhat understandable or technically justifiable given the other side's vice or the prevailing levels of indecency. The point is to build the kind of goodness that scales.
https://www.nytimes.com/2026/04/22/opinion/shoplifting-political-protest-microlooting-whole-foods.html
garrytan @garrytan
California will lose 15.7% of its public school students by 2031. Nearly 3x the national rate. Idaho and Florida are growing.

This isn't an inevitability — it'a consequence of what happens when housing costs too much and schools stop working.

https://gli.st/2rmnc27p
ylecun @ylecun
Retweeted
Phosphen
This 2 hour lecture by Yann LeCun (Turing Award winner) will teach you why the next trillion dollar AI company won't be built on LLMs.
He trashes the $100 Billion LLM race, attacks Musk and Amodei, declares scaling dead.
Bookmark & watch tonight after work, skip to 7:00.
sama @sama
we love seeing our users win.

we want to give you the best tools, lots of compute, and watch you do the magic.
sama @sama
Images 2.0 really got over some important qualitative threshold for me that I didn't know existed.
garrytan @garrytan
Retweeted
Justin Gordon
If you’re a Democrat who isn’t calling out your own side when CA is failing in numerous categories, you don’t belong in office. Mahan is the only Democrat willing to stand up to the party when it’s wrong and course correct when necessary.
garrytan @garrytan
Retweeted
Brb cat on fire
Re @garrytan
Honestly, im kind of surprised the gstack thing is actually decent. Prompts can take you pretty fair with the current SOTA models.
I was quite skeptical but after watching the yc video and willing to give it shot, it ran on one of my existing weekend projects and it came up with some good suggestions and structure.
It is mostly optimized for claude code while im using opencode but its a minor adjustment. Though sometimes the plan doesnt fit into the selection window in terminal.
Going to test this later on some projects ive been sitting on.
garrytan @garrytan
Retweeted
Gregor Zunic
http://x.com/i/article/2047356771229134848
garrytan @garrytan
Retweeted
Kulveer
Re @sama spoke to the Spring batch the other day.
Met him back when we were both founders, and he stood out then too.
Now we're both back at @ycombinator two decades later, helping the next generation build.
garrytan @garrytan
Retweeted
Kane 謝凱堯
The funniest part of @NewYorker writer Jia Tolentino running a "shoplifting is good" story is also learning that her parents were indicted for human trafficking and money laundering.
New York Post: Anti-capitalist New Yorker writer brags she stole from Whole Foods 'on several occasions' in NYT podcast https://trib.al/zh6jiuh
garrytan @garrytan
Retweeted
Brandon Veiseh
Super excited to finally announce our raise! (We're hiring)
We'll now be able to move much faster on these fronts:
- increase our infra+capacity to serve a huge growth in demand
- build out the team as we continue to develop new security agents for new verticals (cloud/network/remediations)
- expand our applied research into offensive security LLMs
Read more here on how we are thinking about the future of autonomous security agents and how MindFort is leading the push to a future where all companies are protected by teams of agents, inside and out.
Huge thanks to @ycombinator @Soma_Capital and all our other early investors and angels for believing in us early.
If you want to join us in building the future, send me a message, we are hiring across all roles in: Growth/Sales/Engineering/Research!
http://mindfort.ai/blog/seed-anno…
cc @AkulGupta30 @mindfort
petergyang @petergyang
Can 5.5 drop soon it's distracting to wait
garrytan @garrytan
Resolvers in GStack for the win!

ByteCrafter: @ycombinator @garrytan been running skills and subagents for about 3 weeks in claude code. the big unlock wasn't the individual skills, it was the subagent picking which skill to load so context doesn't blow up on a 400 line task. single prompt runs doing what used to take 4-5 rounds.
garrytan @garrytan
Turns out migration hardening matters a lot when you have 50k markdown files in your brain repo
alexalbert__ @alexalbert__
Retweeted
ClaudeDevs
Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found.
All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
drfeifei @drfeifei
Retweeted
World Labs
2 Weeks. New Tools. Infinite Worlds🚀
The World Jam is LIVE. Build the future of interactive 3D with Marble 1.1 + Spark LoD.
Join our Discord to start building.
More info below 👇
mattshumer_ @mattshumer_
http://x.com/i/article/2047054312472207360
mattshumer_ @mattshumer_
I’ve been using GPT-5.5 for the last few weeks.

It’s a MASSIVE leap forward.

But the weird thing is: for 99% of users, it probably won’t matter.

And there's one BIG, incredibly frustrating regression.

Read more in my review:

Matt Shumer: http://x.com/i/article/2047054312472207360
garrytan @garrytan
Retweeted
Vox
gpt-5.5 is out.
updated my routing and realized most of my stack is just... openai now. didn't plan it that way.
→ coding: codex + gpt-5.5
→ agents: gpt-5.5 (soon)
→ image gen: openai image 2
→ video: seedance 2.0 (ok this one's different)
openclaw users: /models add openai [model-id, e.g. gpt-5.5]
good time to be building.
OpenAI: Introducing GPT-5.5
A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.
Now available in ChatGPT and Codex.
sama @sama
Retweeted
Derya Unutmaz, MD
I’d been part of OpenAI early tester group for GPT-5.5. I believe with GPT-5.5 Pro we reached another inflection point-comparable to the original release of o1-preview & then with 5.0 Pro, I had felt. It’s that feeling of crossing a milestone threshold that pushes us to new era🔥
sama
sama @sama
Retweeted
Pietro Schirano Pietro Schirano
Re One day while testing GPT-5.5, I had my first taste of AGI.
We had a branch with hundreds of visual and front-end changes, plus complex refactors.
At the same time, main had changed a lot too.
Conflicts everywhere.
ylecun
ylecun @ylecun
Retweeted
Internet Archive Internet Archive
This study is an important part of Vanishing Culture, the new book OUT TODAY from @internetarchive. 🕳️
Read for free or purchase in print ➡️ https://archive.org/details/vanishing-culture-2026
Sawood Alam: A @pewresearch study found that 38% of webpages from a decade ago and 25% of pages sampled across the decade are now inaccessible; @internetarchive's analysis shows that the @waybackmachine has rescued roughly 15% of those otherwise dead pages.
https://blog.archive.org/2026/04/23/gone-but-not-forgotten-recovering-the-dead-web/
#linkrot
sama
sama @sama
Also, a ton of new Codex features coming soon! Fun little bundle w/the new model.
swyx
swyx @swyx
looks like new Pareto frontiers across everything:
- Context: 400K context in Codex and a 1M in API
- API Pricing: $5/m input and $30/m output tokens.
- Codex improved its own inference speed 20% lol
- First generation co-designed with GB200 and GB300 NVL72
- 82.7% on Terminal-Bench 2.0
- 73.1% on Expert-SWE (new internal eval).
- 58.6% on SWE-Bench Pro
- 84.9% on GDPval (knowledge work ~solved?).
- 98.0% on Tau2-bench Telecom (support).
- 80.5% on BixBench (bioinformatics and data analysis)




OpenAI: Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.

Now available in ChatGPT and Codex.

sama
sama @sama
1. We believe in iterative deployment; although GPT-5.5 is already a smart model, we expect rapid improvements. Iterative deployment is a big part of our safety strategy; we believe the world will be best equipped to win at the team sport of AI resilience this way.

2. We believe in democratization. We want people to be able to use lots of AI; we aim to have the most efficient models, the most efficient inference stack, and the most compute. We want our users to have access to the best technology and for everyone to have equal opportunity. We have been tracking cybersecurity as a preparedness category for a long time, and have built mitigations we believe in that enable us to make capable models broadly available.

3. We love you and we want you to win. We want to be a platform for every company, scientist, entrepreneur, and person. (My whole career has largely been about the magic of startups, and I think we are about to see that magic at hyperscale.)
sama
sama @sama
Retweeted
Noam Brown Noam Brown
I'm a manager at @OpenAI, but with GPT-5.5 I'm a more effective IC than I've ever been. I can now write CUDA kernels like a pro. I can rely on it to run my research experiments. And we know how to make it much more powerful from here.
sama
sama @sama
Retweeted
Andrew Ambrosino Andrew Ambrosino
New in the Codex app:
- GPT-5.5
- Browser control
- Sheets & Slides
- Docs & PDFs
- OS-wide dictation
- Auto-review mode
Enjoy!
gdb
gdb @gdb
GPT-5.5 is a new class of intelligence.

This intelligence makes it intuitive to use; it completes challenging tasks with little micromanagement. Also very token efficient, and runs with low latency and at scale.

A real step toward a new way of getting computer work done.

OpenAI: Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.

Now available in ChatGPT and Codex.

sama
sama @sama
Really excellent work by the inference team to serve this model so efficiently!

To a significant degree, we have to become an AI inference company now.
sama
sama @sama
important (and very jakub-coded) jakub quote:

tae kim: OpenAI Unveils GPT-5.5. Company Says Expect a Faster Model Release Pace

👀 OpenAI: "We see pretty significant improvements in the short term, extremely significant improvements in the medium term" "I would say the last few years have been surprisingly slow."

sama
sama @sama
Retweeted
roon roon
there are early signs of 5.5 being a competent ai research partner. several researchers let 5.5 run variations of experiments overnight given only a high level algorithmic idea, wake up to find completed sweep dashboards and samples, never having touched code or a terminal at all
gdb
gdb @gdb
Codex + 5.5 is incredible for the full spectrum of computer use. No longer just for coders, but for anyone who does computer work (including creating spreadsheets, slides, etc).

OpenAI Developers: With GPT-5.5, Codex now gets more of the job done across the browser, files, docs, and your computer.

We've expanded browser use so Codex can interact with web apps, and test flows, click through pages, capture screenshots, and iterate on what it sees until it completes the

sama
sama @sama
embers

Sebastien Bubeck: GPT-5.5, not fully saturating the TikZ unicorn test yet but getting awfully close ...

(yes this is actual TikZ code, I personally find it so unbelievable that I'm putting the code below for anyone to verify for themself)

sama
sama @sama
We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work.

Let us know if you'd like to do it at your company!
gdb
gdb @gdb
we're rolling codex out to whole companies/enterprises. ping me gdb@openai.com if of interest!

Sam Altman: We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work.

Let us know if you'd like to do it at your company!

garrytan
garrytan @garrytan
Retweeted
Jeffrey Wang Jeffrey Wang
A huge issue with subagents is compounding of errors, even with frontier models.
With Exa Highlights, you can treat Exa as a full web subagent, but there's no hallucination (we only use text from the actual webpage), minimal latency (we compute the text in tens of milliseconds), and we also save you a ton of money.
Exa: Exa now reduces input tokens for web agents by 96%.
We trained a text extraction "Highlights" model to dynamically select only the most relevant tokens from a webpage, for a given query. Only 500 tokens of highlights are needed to match the RAG performance of a full 10K token
sama
sama @sama
"don't retweet this, don't retweet this, don't retweet this..."

ah fuck it, life imitates art.

Andon Labs: In Vending-Bench Arena (the multiplayer version of Vending-Bench with competition dynamics), GPT-5.5 actually beats Opus 4.7.

Opus 4.7 showed similar behavior to Opus 4.6: lying to suppliers and stiffing customers on refunds. GPT-5.5's tactics were clean, and it still won.

chipro
chipro @chipro
Congrats to @AlecRad @Luke_Metz and @soumithchintala!

It's really cool to see work produced by a group of 20-somethings without a single PhD between them winning this award.

I hope to see these three collaborate again one day :)

ICLR 2026: We are honored to announce the Test of Time awards for #ICLR2026 🏆 This award recognizes papers published 10 years ago at ICLR 2016 that have had a lasting impact on the field:
https://blog.iclr.cc/2026/04/22/announcing-the-test-of-time-awards-from-iclr-2016/

amasad
amasad @amasad
Test your app on iOS and Android devices with one-click.

jordwalke: Introducing real iOS and Android device testing, right from your browser.

If you've ever:
👉 built a @Replit mobile app and had no way to test it on a phone
👉 been solely on Windows/Android and don't touch iOS

…this is for you!

garrytan
garrytan @garrytan
Retweeted
Browser Use Browser Use
The new paradigm: dialects
MCP -> CLI -> custom harness -> dialect (skill + helper files)
Gregor Zunic: http://x.com/i/article/2047356771229134848
swyx
swyx @swyx
Retweeted
Sherry Jiang Sherry Jiang
stoked that we'll have the legend @thsottiaux speaking at @aiDotEngineer singapore!
@thsottiaux is the engineering lead on @OpenAI's codex app, and famously resets the tokens.
can't wait to have you here!
@swyx @agrimsingh @aimuggle @unprofeshme
ylecun
ylecun @ylecun
Retweeted
Republicans against Trump Republicans against Trump
Trump: “I took a lot of heat for saying drugs were going down 500%, 600%, 700%. But we also say sometimes 50%, 60%, it’s a different kind of calculation, and people understand it better.”
The dumbest president ever.
swyx
swyx @swyx
Re ai researchers rocket scientists
🤝
managing burn

https://x.com/shyamalanadkat/status/2047419602267988344?s=46

shyamal: researchers will start giving an ai a vague hunch, compute budget, and access to tools; it will return ablations, failed runs, plots, counterexamples, revised hypothesis, etc. the unit of progress shifts to experiment throughput. and then ai research becomes more about
amasad
amasad @amasad
Two years since Replit left SF.

10x valuation and 200x ARR later we’ve taken over much of the old IBM campus in Foster City and we’re still expanding.

There’s something poetic about it: IBM helped create the industry. We’re helping reinvent how people create software.

San Francisco is a beautiful city. But it wasn’t the right place for us to focus and rebuild. To transform both our company, and the future of programming itself.

We went from worrying about employee safety to designing a campus people actually want to spend time in.

Better problems.

Foster City gave us space both literally and figuratively. Open horizons, fewer distractions, and more room to think long-term.

It’s a hidden gem in Silicon Valley.

And we’re just getting started.


Amjad Masad: Replit left San Francisco for Foster City.

The "why" we're leaving is boring, sad, and predictable (crime, dysfunction, etc), so instead let me tell you why we chose Foster City.

Foster City embodies the American post-war optimism and the long-lost California pro-growth




ylecun
ylecun @ylecun
Retweeted
Michael Bronstein @ICLR2026 Michael Bronstein @ICLR2026
illustrious researchers interested in MD @ylecun
AmandaAskell
AmandaAskell @AmandaAskell
It's odd to be living through what feels like one of the most critical periods in human history and to feel all of the weight of it from the inside.
amasad
amasad @amasad
Retweeted
Manny Bernabe Manny Bernabe
"Industry leaders […] like Replit have chosen Gemini 3.1 Pro." Thomas Kurian (@ThomasOrTK), CEO Google Cloud, at the Google Cloud Next opening keynote. 🔥
Huge thanks to the @googlecloud team for a phenomenal event and the recognition.
Looking forward to empowering the next billion builders together. 🤝 🚀
ylecun
ylecun @ylecun
Retweeted
Ravid Shwartz Ziv Ravid Shwartz Ziv
I have mixed feelings about it. All this great group of researchers, who are now leading the field, "own" their success and name recognition to the openness (in code and publication) of the last decade in Google Brain/FAIR/OpenAI. I wish that now, when they have so much power, they would push more for open research
Fazl Barez @ICLR 🇧🇷: Alex Radford (@AlecRad) just won one of the test of time awards at #ICLR 2026
So great to see this! Proof that impactful work doesn’t have to depend on having a PhD 🙌🙌
garrytan
garrytan @garrytan
So hyped for @demishassabis visiting Y Combinator tomorrow! What a legend!
garrytan
garrytan @garrytan
THANK YOU DEX

Open Source is amazing.

dex: anyways I’ll put my money where my mouth is and drop a pr today
garrytan
garrytan @garrytan
Retweeted
M. Nolan Gray 🥑 M. Nolan Gray 🥑
I don't think the average Californian has registered that, if the state can't make itself affordable to working- and middle-class families in the next few years, there will be mass school closures.
Garry Tan: California will lose 15.7% of its public school students by 2031. Nearly 3x the national rate. Idaho and Florida are growing.
This isn't an inevitability — it's a consequence of what happens when housing costs too much and schools stop working.
https://gli.st/2rmnc27p
amasad
amasad @amasad
Retweeted
Samuel Spitz Samuel Spitz
Replit being in Foster City is why you never see me at VC happy hours anymore
Amjad Masad: Two years since Replit left SF.
10x valuation and 200x ARR later we’ve taken over much of the old IBM campus in Foster City and we’re still expanding.
There’s something poetic about it: IBM helped create the industry. We’re helping reinvent how people create software.
San
ylecun
ylecun @ylecun
Retweeted
Niall Stanage Niall Stanage
“In the end, the Party would announce that two and two made five, and you would have to believe it.” — George Orwell, “1984”
Aaron Rupar: RFK Jr: "A Democratic senator claimed it's mathematically impossible to have a drug drop by 600%. I said, 'Well, if the drug was $100 and it raises to $600, that would be a 600% rise. If it drops from $600 to $100, that's a 600% savings.'"
Trump: "Right"
amasad
amasad @amasad
Foster City review:

Adam Ballai: @amasad Foster City is incredibly underrated.

Out of all the Bay Area campuses, it has the most value for employees of any age. The team truly thought deeply about this and bet well on the future.

We love being able to have quiet fresh air for 1-1s, a run 🏃🏻‍♂️ after work with your team
ylecun
ylecun @ylecun
Retweeted
Bill Madden Bill Madden
Trump completely passed out, reawakened, then passed out again during today's pressor in the Oval Office. 😂🤣👇

YouTube

0

No recent videos fetched on this date.