Daily Edition

2026-04-15

AI Builders Daily: April 15

Tracking the people in AI who actually build, not the ones who only talk.

Today's Thinking

April 15 was an in-between day in the frontier labs' release cadence, but several underlying threads tightened at the same time:

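The orchestration layer for coding agents is taking shape. swyx put it most plainly: "this is the year of subagents" is largely an optimization problem, while the inverse problem, agents that compose and boss agents that manage and query them, is a capabilities problem. Windsurf 2.0, launched the same day, puts that view into practice with its Spaces concept: manage all your agents from one place and delegate work to the cloud with Devin, so agents keep shipping after you close your laptop. Kieran Klaassen, retweeted by Dan Shipper, gave a more concrete division of labor: Brainstorm and Polish stay with humans, while plan, code, review, test, and PR are fully automated.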

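OSS is going through a trust crisis and reinventing itself at the same time. Bailey Pumfleet's declaration that "open source is dead" set off a round of pushback. Peter Steinberger laid out what OpenClaw has done over the past four months: a Docker or OpenShell sandbox, allow-lists, and per-access exec prompts, pen-tested by hundreds of security researchers. He also warned that anyone who looks at GPT-5.4-Cyber's ability to reverse engineer closed source is in for "bad news". Hugging Face CEO Clem Delangue rebutted the "OSS is a cybersecurity threat" narrative from the other direction. Amjad Masad offered a very engineering-minded answer: GitHub should show something like "🦾 $239M" next to the star count, indicating how much compute has been spent hardening the package. If trust cannot be self-evident, it has to rest on a metric.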

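GPT-5.4 Pro solves Erdős Problem #1196. Jared Duker Lichtman, who proved the Erdős Primitive Set Conjecture in his doctorate, offered an apt analogy that makes clear this was not brute force: "the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention." AI "originality" in mathematics has gone from talking point to concrete case.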


Products and Launches

Windsurf 2.0(Spaces)

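Windsurf shipped 2.0, built around Spaces: manage all your agents from one place and delegate work to the cloud with Devin, so your agents keep shipping even after you close your laptop. swyx, an advisor to cog, says he played a small part in designing the Spaces concept and frames the shift this way: "this is the year of subagents" is largely an optimization problem, while agents that compose, and boss agents that manage and query them, are a capabilities problem. This may prove to be the architectural watershed for coding agents in 2026.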

Replit File Converter

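Samuel Spitz of the Replit team introduced Replit File Converter, meant to replace sketchy, ad-riddled file conversion websites. Amjad Masad's comment was on point: he "built so many one-off conversion apps" and now "it's just a skill." Utility SaaS is being atomized by agents.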

Shader Lab(basement.studio)

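basement.studio launched Shader Lab, "like photoshop but for shaders": design layered shader compositions, export high-quality assets or shaders, and plug in an OSS package. It was built with Claude Code, threejs, nextjs, and Vercel. Guillermo Rauch's comment gets to the heart of it: every team is now empowered to build its own "design factory". The next stop for design tooling is not a flashier editor but letting every team generate its own tools.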

TurboTax in ChatGPT

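Ahead of the 4/15 tax deadline, Intuit upgraded the TurboTax app in ChatGPT: it produces a personalized tax checklist and lets you upload documents to help maximize your refund when you file with TurboTax. Greg Brockman shared it with a nudge to try the app. The ChatGPT App Store is absorbing long-tail utility use cases.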

New course: Spec-Driven Development (DeepLearning.AI × JetBrains)

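Andrew Ng announced a short course built in partnership with JetBrains and taught by Paul Everitt: "Vibe coding is fast, but often produces code that doesn't match what you asked for." The course teaches how to write a detailed spec that defines the mission, tech stack, and roadmap, then plan, implement, and validate features in iterative loops with the spec as the coding agent's guide. A foundational skill of the agent era is formally entering the curriculum.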


Opinions and Takes

Amjad Masad (Replit CEO)

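  • OSS packages should carry a "hardening spend" metric: If finding security flaws becomes fully automated with frontier models à la Mythos, GitHub should add a metric alongside stars showing how much compute has been spent securing and hardening each package. Amjad's example: "📦 linus/linux ⭐️ 200k 🦾 $239M. Only way OSS can be trusted."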

Peter Steinberger (independent developer)

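  • OpenClaw's security overhaul: four months, thousands of work hours, hundreds of pen testers. Responding to the "open source is dead" wave, Peter laid out what OpenClaw has built in the past four months: an optional sandbox (Docker or OpenShell), allow-lists, and per-access exec allow/deny prompts. He also argued that closed source is no refuge: "If you look at GPT 5.4-Cyber and its ability for closed source reverse engineering, I have bad news for you."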
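  • "Once again, I'm amazed by scammers": Fake accounts in the Krill community tried to get users to install ScreenConnect. The warning Peter quoted, from Shadow: "the only real Krill is the one in our server and it will never DM you." Community-level social engineering has become routine.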
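
The allow-list and per-access exec allow/deny prompts described in the OpenClaw item above could be combined roughly as follows. This is a minimal sketch, not OpenClaw's actual implementation: `decide_exec`, its parameters, and the `ask` callback are hypothetical names, and it assumes the command is later executed as an argument list without a shell.

```python
import shlex
from typing import Callable, Iterable


def decide_exec(
    command: str,
    allow_list: Iterable[str],
    ask: Callable[[str], bool],
) -> str:
    """Return "allow" or "deny" for a single command execution request.

    Hypothetical policy: programs on the allow-list run without a prompt;
    everything else is decided per access by the `ask` callback, which
    stands in for an interactive allow/deny prompt.
    """
    try:
        argv = shlex.split(command)
    except ValueError:
        # Malformed input such as an unterminated quote is rejected outright.
        return "deny"
    if not argv:
        return "deny"
    # Only the program name is checked here. This is safe only if the command
    # is run as an argv list (no shell), so ";" or "&&" cannot chain commands.
    if argv[0] in set(allow_list):
        return "allow"
    return "allow" if ask(command) else "deny"
```

For example, `decide_exec("git status", {"git"}, ask)` is allowed without calling `ask`, while `decide_exec("curl http://example.com", {"git"}, ask)` is allowed only if `ask` returns `True`. A real gate would likely also inspect arguments and working directory; that is left out here.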

Dan Shipper (founder of Every / Spiral)

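  • "Old thinking builds tools, new thinking builds personalities": "old startup thinking: make a tool to solve a problem. new startup thinking: make a personality that can solve problems." The product form is shifting from tool to persona, and this was the cleanest line of the day.
  • The cure for problems AI creates is still AI: Quoting Shaw's post, which trashes the quality of vibe-coded code and offers an 8-subagent cleanup prompt, Dan commented: "when AI causes problems in any part of a process, the solution is always AI at the other end of the process. many such cases."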

Guillermo Rauch (Vercel CEO)

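  • "The software factory is the product": Borrowing Elon Musk's old line about Tesla, "The factory is the product", and applying it to software: "The software factory is the product: http://open-agents.dev". Each company's own agent pipeline is its moat.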

swyx (co-founder of AI Engineer)

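  • The counterexample to the "I can vibe code this in a weekend" myth is Slack notifications: Quoting Nikunj Kothari's observation about the complexity of Slack's notification system, swyx added a jab: "the famous slack chart is slack propaganda and everyone who cites it is legally obligated to also link to @sophiebits." Details set the ceiling on a product, and the limits of vibe coding are coming into sharper focus.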

Peter Yang (Roblox PM)

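  • Claude's stability is becoming a problem: "Feels like there's an outage for Claude every other day - I wonder if this is related to the pace of shipping, just scaling compute, or something else?" The reliability debt of frontier models is starting to be itemized.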

Yann LeCun (Chief AI Scientist, Meta): retweets

Yann posted nothing original on AI today, but amplified two positions on open source:

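  • Hugging Face CEO Clem Delangue rebuts "OSS is a cybersecurity threat": First the claim was that open-source AI will destroy the world; now it is that open source is a cybersecurity threat because of AI. "Both narratives are far too simplistic. The truth is that the exact same risks exist in closed-source systems, often even more so."
  • Nathan Lambert's nine beliefs about open models: The first is one of the most important open-side calls of the year: it is "surprising that the top closed models did not show a growing capability margin over open models." The full list is a systematic distillation of the economics, capabilities, distribution, and policy factors affecting open models.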

Technical Updates

Greg Brockman (OpenAI President)

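  • GPT-5.4 Pro solves Erdős Problem #1196: Leeham announced the result, and Jared Duker Lichtman, who proved the related Erdős Primitive Set Conjecture, gave the background: "This problem will always be in my heart: I worked on it for 4 years." Greg relayed Lichtman's analogy: "The closest analogy I would give would be that the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention." AI's "original contribution to mathematics" has gone from talking point to concrete case.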

John Carmack (independent researcher)

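  • Near-lossless LLM compression of the Internet Archive: "The Hutter Prize is for perfect compression, but only one GB. There would be different trades at the PB level, and it gets much more interesting when it doesn't have to be bit-accurate." Carmack is asking an engineering question: can LLM training serve as a nearly lossless compressor for an internet-scale corpus? This one deserves serious thought.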
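
The link between prediction and compression behind Carmack's question can be made concrete: an ideal arithmetic coder spends about -log2 p bits on a symbol the model assigned probability p, so a better predictor yields a smaller lossless encoding. The sketch below is only an illustration under stated assumptions: it uses a toy adaptive byte-frequency model where an LLM's next-token distribution would go, `ideal_code_length_bits` is a hypothetical name, and it measures the ideal code length rather than producing an actual compressed stream.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal arithmetic coder would need under a toy adaptive model.

    The model predicts each byte from the counts of bytes seen so far, with
    add-one (Laplace) smoothing over the 256 possible byte values. An LLM
    would replace this model in the scenario discussed above.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for byte in data:
        # Probability the model assigned to this byte before seeing it.
        p = (counts[byte] + 1) / (seen + 256)
        bits += -math.log2(p)
        # Update the model only after "encoding", so a decoder could mirror it.
        counts[byte] += 1
        seen += 1
    return bits
```

Repetitive input such as `b"ab" * 500` costs far fewer than 8 bits per byte under this model, while the very first byte always costs 8 bits because nothing has been learned yet. Relaxing bit-accuracy, as the quote suggests, would mean accepting reconstructions that differ from the source, which this lossless accounting does not cover.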

Peter Steinberger: retweeting Shopify Engineering

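  • pi-autoresearch results at Shopify: Shopify Engineering reports that since open-sourcing pi-autoresearch, its teams have been running it on everything. Results so far: unit tests 300x faster, React component mounting 20% faster, CI build time reduced by 65%, and pnpm run made faster. "Autoresearch never stops trying things you'd never have time to try." Real numbers on open source plus agents optimizing codebases at scale have finally arrived.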

Jeremy Howard: retweeting Enrico Shippole (TeraflopAI)

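  • SEC EDGAR open-sourced in full on Hugging Face: TeraflopAI, together with @johngfriedman and @daftengine, released all major SEC EDGAR filings for free on Hugging Face. The stated motivation: "Given the increasingly closed-source nature of the U.S. AI ecosystem, it is now more important than ever to push for the proliferation of open model and dataset releases." A key piece of making full financial regulatory data usable for AI is now in place.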

X / Twitter

51
Garry Tan
Garry Tan @garrytan
GStack is not just my batmobile, I made it so it could be yours too

Siddharth: again just a reminder @garrytan is batman. and gstack is the batmobile
Garry Tan
Garry Tan @garrytan
NO RAGRETS

MyName: @garrytan “Dress like a lobster in all things.”— 1 Tan 4:20

swyx
swyx @swyx
finally: @simonlast + @sarahmsachs on Latent Space!

Notion has rebuilt Notion AI 5 times. This is the first time Simon has told the entire story.

I've been trying to do this interview for ~3 years. We run @latentspacepod on Notion since inception, as does every other top tech company. Notion is one of the top ~3? knowledge work tools in the world, crossing 100M users in 2024 and now shipping the AI productivity suite that @ivanhzhao wants to be "steel and steam for organizations" — the backbone of a new Industrial Revolution of Infinite Minds that will change the world.

Latent.Space: 🆕 The Full Story of Notion AI

https://latent.space/p/notion

We're so excited to chat with @simonlast and @sarahmsachs about Notion's "Token Town" - the crack team of AI Engineers and Model Behavior Engineers entrusted with building AI for Silicon Valley's most beloved knowledge work
steipete
steipete @steipete
Retweeted
Paul Solt Paul Solt
OpenAI shipped GPT-5.4-Cyber. A model built to find and fix software exploits.
More capable than Mythos… and available today.
1. Binary scanning. Agents can find exploits in compiled apps… no source code required. That’s a new attack surface.
2. Prompt Refusals are lower. Verified defenders get a more permissive model than the public version.
3. Access is tiered by identity. Individuals verify at http://chatgpt.com/cyber. Enterprises go through a rep.
4. Codex Security has fixed 3,000+ critical vulnerabilities automatically.
5. They’re scaling to thousands of verified defenders.
The binary scanning unlock is scary. Stuff like this hasn’t been mainstream before.
Agents finding exploits without ever seeing your source code.
OpenAI: We’re expanding Trusted Access for Cyber with additional tiers for authenticated cybersecurity defenders.
Customers in the highest tiers can request access to GPT-5.4-Cyber, a version of GPT-5.4 fine-tuned for cybersecurity use cases, enabling more advanced defensive workflows.
ylecun
ylecun @ylecun
Retweeted
Neil deGrasse Tyson Neil deGrasse Tyson
If current cuts to the Federal Science Budget proposed by the White House are approved by Congress, that will far-and-away be the largest cut to science since the United States began funding it.
Republicans & Democrats alike know there’s no surer path to Making America Not-Great
swyx
swyx @swyx
Re i didnt realize this lazy no effort post did so well… lol… chalk another W up to @josephofiowa’s Maps Theory of Twitter
gdb
gdb @gdb
GPT-5.4 Pro for making beautiful contributions to mathematics:

Leeham: GPT-5.4 Pro solves Erdős Problem #1196!

Very pleased with this result; definitely my favourite thus far! This problem has been thought about for some time which makes this reasonably impressive and meaningful (see Lichtman's comments below).

Formalisation is underway!


ylecun
ylecun @ylecun
Retweeted
Gianl1974 Gianl1974
Hey Republicans;
He pardoned 1,600 violent criminals. You said nothing.
He bulldozed the East Wing. You’ve said nothing.
He’s interfered with the release of the Epstein files. You’ve said nothing.
He took over the Kennedy Center, even renaming it after himself. You’ve said nothing.
He accepted a $400 million airplane as a personal gift. You’ve said nothing.
He’s threatened Canada, Cuba, Denmark, Greenland, Venezuela, Colombia, Brazil. You’ve said nothing.
He’s tariffed just about everyone but Russia, causing inflation and instability worldwide. You’ve said nothing.
He attacked a nation during mediated negotiations. You said nothing.
His ill-conceived war killed 175 little girls in its first days. You’ve said nothing.
He’s alienated and insulted more countries than I can keep track of. You’ve said nothing.
His ICE Army is terrorizing and murdering U.S. citizens. You've said nothing.
He has committed murder on the high seas. You've said nothing.
He's co-opted the Justice Department and directed it to prosecute his political enemies. You've said nothing.
You've not only said nothing to all of these egregious acts, and many more, but you have also enabled them.
And it’s only been a year.
Hey, Republican Congressmen, you took an oath, remember?
Not to him. To the Constitution.
It’s time to do your fucking jobs!
amasad
amasad @amasad
I built so many one-off conversion apps — now it’s just a skill.

Samuel Spitz: Introducing Replit File Converter

Skip the sketchy ad-riddled file conversion websites.

Now, you can just use Replit instead.

gdb
gdb @gdb
try the TurboTax app in ChatGPT:

Intuit: ⏰ Beat the 4/15 tax deadline. TurboTax in @ChatGPTapp just got an upgrade in time for tax day - get a personalized tax checklist and upload docs to help you maximize your refund when you file with @TurboTax. 💸🤖 https://bit.ly/3OBPWzA
swyx
swyx @swyx
btw the famous slack chart is slack propaganda and everyone who cites it is legally obligated to also link to @sophiebits


Nikunj Kothari: Every time I see a tweet saying “I can vibe code this in a weekend” - I think of the slack notification system..

It takes time, persistence and effort to get the details right.

Sure, a lot of simple workflows will get vibe coded away. And maybe you can put this in Claude Code

steipete
steipete @steipete
Retweeted
Eleanor Berger Eleanor Berger
The OpenAI $100 pro plan is frustrating. I got it with the intention of mostly using it for agentic non-coding stuff (I use my GitHub Copilot Pro+ for coding), but because the quota is so generous I don't manage to utilise much of it. And that makes me feel like a loser. 😢
danshipper
danshipper @danshipper
Retweeted
Monologue Monologue
Coming next week!
Naveen Naidu: One of the best apps I saw on Apple Watch (I’m biased)
steipete
steipete @steipete
Retweeted
Lex Lex
KYC on a subscription service is a death sentence for consumer products. Most view it as a severe overreach.
Anthropic making unexplainable decisions rn lmao
Would you hand over your gov id to use Claude code?
玩个锤子: Heads up
Signing up for Claude Max may now require KYC
Another round of lockdown...
steipete
steipete @steipete
Retweeted
Onur Solmaz Onur Solmaz
You need to understand one fact about OpenClaw
People are biased and incentivized to spread disinformation about OpenClaw. That is because OpenClaw IS NOT PUMPING ANYONE’S BAGS, unlike most other projects
Literally every other for-profit agent product is incentivized to trash OpenClaw, BECAUSE OpenClaw is a neutral third party across the industry and geopolitical scene. They MAKE MONEY when OpenClaw loses
OpenClaw does not worry about making money for some investors. Its founder @steipete is a successful exited founder. He is motivated by having fun and democratizing AI, literally. That is why he is suddenly so loved by everyone. He cares about PEOPLE, not MONEY
“OpenClaw is bloated”
-> Since beginning of March, OpenClaw is thinning its core and putting functionality in plugins behind a plugin SDK. Having numerous plugins to choose from does not mean bloat. This was already copied by others and is still a work in progress
“OpenClaw is not secure”
-> OpenClaw has the most eyeballs and immediately addresses any security advisories as soon as they come. It is the most secure agent, by sheer pressure
“OpenClaw is bought by OpenAI”
-> Then why is my bank account so empty bro??? All maintainers are literally unpaid and working DOUBLE beside their dayjobs to ship features to you. Do you think VC money can buy that kind of commitment?
Once you understand these facts, you’ll like OpenClaw even more. Because OpenClaw is your AI, People’s AI
And you can join us too. OpenClaw is the easiest-to-join project in AI right now. You just need to start using it, and start making good contributions. If you are competent, you can become a maintainer, and join the rest of the team making history!
jeremyphoward
jeremyphoward @jeremyphoward
Retweeted
Matt Pocock Matt Pocock
Been waiting a month for Anthropic to answer a simple usage question about Claude Code subscriptions
Have I been ghosted
Matt Pocock: Can I get some questions answered by someone at Anthropic?
1. Can you use an OAuth token generated from a subscription to power the Claude Agent SDK strictly for using Claude Code in a local dev loop?
All I want is a more reliable API for parallelizing multiple Claude Code's.
steipete
steipete @steipete
Retweeted
Nimrod Gutman Nimrod Gutman
Codex installed new firmware on my super old HP Color LaserJet MFP M277dw (after scanning stopped working)🥹
I love this new world.
steipete
steipete @steipete
once again, I’m amazed by scammers.

Shadow: Fake Krill accounts are going around, one of them was trying to get people to install ScreenConnect
Don't fall for it, the only real Krill is the one in our server and it will never DM you :)

danshipper
danshipper @danshipper
old startup thinking: make a tool to solve a problem

new startup thinking: make a personality that can solve problems
danshipper
danshipper @danshipper
when AI causes problems in any part of a process, the solution is always AI at the other end of the process

many such cases

Shaw (spirit/acc): The quality of your vibecoded slop is horrible. I've seen it. Absolute dogshit.

Fortunately, there is a fix.

Use this prompt:

I want to clean up my codebase and improve code quality. This is a complex task, so we'll need 8 subagents. Make a sub agent for each of the following:
steipete
steipete @steipete
Retweeted
Mario Zechner Mario Zechner
Re hilarious. @threepointone Forky McForkyface strikes again. on an organizational level.
petergyang
petergyang @petergyang
I recently came back from a 2-week trip to China and it was eye-opening to see how the world's second largest economy operates.

I think every product builder should visit at least once to understand:

→ Chinese AI work culture
→ Electric vehicles, $2 delivery, and more
→ How people in China live and work

📌 Here are 15 observations from my visit: https://creatoreconomy.so/p/15-observations-on-work-and-life
rauchg
rauchg @rauchg
The software factory is the product: http://open-agents.dev

Elon Musk: @skorusARK The factory is the product
jeremyphoward
jeremyphoward @jeremyphoward
Retweeted
Vincent D. Warmerdam Vincent D. Warmerdam
It really took me a while to "get it" when it comes to nbdev. But I gotta hand it to @jeremyphoward this way of working makes too much sense once you're used to it.
As of today, I am working on tools that make this kind of work possible in @marimo_io.
https://youtu.be/ZLg27UmAJbw
steipete
steipete @steipete
If you look at GPT 5.4-Cyber and its ability for closed source reverse engineering, I have bad news for you.

I do very much feel the pain though, there's hundreds of teams that try to poke holes into @openclaw. Our response has been of rapid iteration and code hardening. Which did introduce occasional regression (and yes you all been yelling at me), but I see as the only way forward.

I would be very careful of other open source projects/harnesses that ignore this work and do not publish their advisories. https://github.com/openclaw/openclaw/security/advisories

Bailey Pumfleet: Open source is dead.

That’s not a statement we ever thought we’d make.

@calcom was built on open source. It shaped our product, our community, and our growth. But the world has changed faster than our principles could keep up.

AI has fundamentally altered the security

jeremyphoward
jeremyphoward @jeremyphoward
Retweeted
Enrico Shippole Enrico Shippole
We @TeraflopAI have worked together with @johngfriedman and @daftengine to open-source all major filings from SEC EDGAR completely for free on @huggingface. It is now more important than ever to push for open dataset releases.
TeraflopAI: Given the increasingly closed-source nature of the U.S. AI ecosystem, it is now more important than ever to push for the proliferation of open model and dataset releases. Datamule (@johngfriedman), @TeraflopAI, and @daftengine collaborated to release 43 Billion Tokens of SEC
petergyang
petergyang @petergyang
Feels like there's an outage for Claude every other day - I wonder if this is related to the pace of shipping, just scaling compute, or something else?
gdb
gdb @gdb
More on GPT-5.4 Pro’s latest mathematical contribution:

“The closest analogy I would give would be that the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention.”

Jared Duker Lichtman: In my doctorate, I proved the Erdős Primitive Set Conjecture, showing that the primes themselves are maximal among all primitive sets.

This problem will always be in my heart: I worked on it for 4 years (even when my mentors recommended against it!) and loved every minute of it.
amasad
amasad @amasad
If finding security flaws is fully automated with frontier models à la Mythos, then GitHub should have a metric, like stars, showing how much compute is spent securing/hardening an open-source package. Example:

📦 linus/linux
⭐️ 200k 🦾 $239M

Only way OSS can be trusted.
amasad
amasad @amasad
“Where do you run inference?”

“allbirds”

“The shoes?”

“Yea”
garrytan
garrytan @garrytan
Retweeted
Mike Solana Mike Solana
“sir, can you please address the cost of housing”
“sorry best I can do is wage a civil war on behalf of honduran fentanyl dealers”
Tom Steyer: "Wow" is right. ICE is a criminal organization. As governor, I'll prosecute them like one.
rauchg
rauchg @rauchg
This is what the future of design looks like. Not just this specific tool¹, but the fact that every team in the world is now empowered to build their own 'design factory'.

Shader Lab was built with Claude Code, @threejs, @nextjs, and @vercel. To the exact needs, vision, and specification of @basementstudio.

Every time we work on a project with them, we get a glimpse of an arsenal of internal tools they've deployed. Some built specifically for a project, some more general purpose.

It's now easier to generate software re-assembling powerful building blocks, than searching and procuring the right SaaS for the job.

¹ though it's a banger

basement.studio: gm, today we're launching Shader Lab, like photoshop but for shaders

• design slick layered shader compositions
• export high-quality assets or shaders
• OSS package to plug & play

↳ https://eng.basement.studio/tools/shader-lab

ylecun
ylecun @ylecun
Retweeted
clem 🤗 clem 🤗
Weird how some people always target open-source in AI!

First it was:
“Open-source AI will destroy the world” (spoiler: it didn't and it won't)
Now:
“Open-source is a cybersecurity threat because of AI”
Both narratives are far too simplistic.
The truth is that the exact same risks exist in closed-source systems, often even more so. For example, in practice, APIs can create much bigger data and security vulnerabilities than open systems you can inspect, self-host, and secure yourself.
And as with software more broadly, open-source often ends up more secure because it benefits from far more scrutiny than private internal systems.
The reality is not “open vs closed.”
The reality is that AI is raising cybersecurity stakes across the board, and we need to tackle that seriously together.
danshipper
danshipper @danshipper
everything is coding agent
AndrewYNg
AndrewYNg @AndrewYNg
New course: Spec-Driven Development with Coding Agents, built in partnership with @jetbrains, and taught by @paulweveritt.

Vibe coding is fast, but often produces code that doesn't match what you asked for. This short course teaches you spec-driven development: write a detailed spec defining what to build, and work with your coding agent to implement it. Many of the best developers already build this way.

A spec lets you control large code changes with a few words, preserve context across agent sessions, and stay in control as your project grows in complexity.

Skills you'll gain:
- Write a detailed specification to define your mission, tech stack, and roadmap, giving your agent the context it needs from the start
- Plan, implement, and validate features in iterative loops using a spec as your agent's guide
- Apply the same repeatable workflow to both new and legacy codebases
- Package your workflow into a portable agent skill that works across agents and IDEs

Join and write specs that keep your coding agent on track!
https://www.deeplearning.ai/short-courses/spec-driven-development
danshipper
danshipper @danshipper
Retweeted
Kieran Klaassen Kieran Klaassen
The mistake isn't automating too much; it's not knowing *when* to think.
Building Cora, I found two moments where humans belong in the loop: **Brainstorm** (what to build) and **Polish** (is it actually good?). Everything else – plan, code, review, test, PR – is automated.
`/ce-polish` is the new step I'm adding to Compound Engineering. Check out the branch, run the app, annotate what feels off. Sub-agents fix it while you use it.
Made a mermaid diagram to make it visual – distinct colors for human vs. automated phases. That clarity changed how I think about the whole flow.
Talking through this at the Compound Engineering camp Friday. Join @trevin @danshipper and me👇
amasad
amasad @amasad
Retweeted
Bilal Bilal
People don't get it.
I'm able to get Gemma-4-31B running at 15 tokens per second at a nice walking pace and up to 50 tokens when running now. This is the future of local healthy sustainable AI.
Amjad Masad: “Where do you run inference?”
“allbirds”
“The shoes?”
“Yea”
amasad
amasad @amasad
Retweeted
Temporal Temporal
Replit runs millions of AI agents. The challenge is keeping them reliable once they’re in the wild.
Thread 🧵
steipete
steipete @steipete
Retweeted
Shopify Engineering Shopify Engineering
Since we open-sourced pi-autoresearch, @Shopify teams have been running it on everything.
Results so far:
Unit tests: 300x faster
React component mounting: 20% faster
CI build time: 65% reduction
Made pnpm run faster
Autoresearch never stops trying things you'd never have time to try.
Repo: https://github.com/davebcn87/pi-autoresearch
steipete
steipete @steipete
That was the case in December. 4 months and thousands of work hours later, we have a great security concept; you can go all yolo, use a sandbox (Docker or OpenShell), there are allow-lists and per-access exec allow/deny prompts.

There’s hundreds of security researchers that pen-tested it.

Max Wolter: @steipete @openclaw I don't think OpenClaw is a reference. It literally doesn't have a proper security model. Nothing on OpenClaw is secure by design.
ylecun
ylecun @ylecun
Retweeted
Nathan Lambert Nathan Lambert
I spent some time trying to distill all the complex factors impacting open models -- economics, capabilities, distribution, policy, etc. -- into a clear list of beliefs. Here they are in full.
1. It’s surprising that the top closed models did not show a growing capability margin over open models, based on compute differences for training and research, especially in the second half of 2025 and through today.
drfeifei
drfeifei @drfeifei
Retweeted
Wenlong Huang Wenlong Huang
I recently gave some talks on PointWorld. In this latest version, I discussed: Why world models? Why 3D? Why it matters amidst scaling data in robotics? Why it’s a missing side of the coin for “The Bitter Lesson”?
(It’s more than just a better backbone for training policies)
https://www.youtube.com/watch?v=0vfgm8LshmY
The AI Talks: The recording video is here: https://youtu.be/0vfgm8LshmY
danshipper
danshipper @danshipper
banger

Kevin Kwok: Allbirds is not pivoting. They have always made the commodity hardware startups run in
amasad
amasad @amasad
Retweeted
Replit ⠕ Replit ⠕
Where code becomes culture.
Coming soon.
Join the waitlist → http://vibecon.ai
swyx
swyx @swyx
Retweeted
Reiner Pope Reiner Pope
I chatted with @ysmulki about MatX, chip design and where silicon designed for LLMs is headed
(8:17) Tightly coupling SRAM and HBM on one chip
(14:03) More MoE FLOPS, smaller KV cache load
(16:08) Numerics: from 32-bit to 4-bit
(19:02) Targeting both training and inference
(22:14) Chip timelines
(27:15) Logic and memory scarcity
(29:42) Compute costs
(32:07) Latency: from 20ms to 1ms as the new table stakes
(40:50) Programming the chip
(43:00) Starting MatX
(47:11) Codesign without seeing the models
(51:57) Interconnect design
(55:44) Performance modeling philosophy
(1:07:02) Prefill vs. decode
(1:13:47) What's next
ylecun
ylecun @ylecun
Tired of winning

James E. Clyburn: Trump has abandoned our farmers with his senseless war in Iran. The impact is worst in the South.

Diesel prices are up 46% since the war began. Nitrogen fertilizer is up 30%.

In the South, 81% of farms—including 84% of small farms—didn’t pre-book fertilizer, meaning they’re now

ylecun
ylecun @ylecun
Retweeted
Nirit Weiss-Blatt, PhD Nirit Weiss-Blatt, PhD
"Young, anxious followers, looking for purpose, can be radicalized by apocalyptic AI rhetoric […] The real question is how long the people fueling AI panic expect to avoid responsibility for where that radicalization leads, especially for the most vulnerable."
swyx
swyx @swyx
I've commented that "this is the year of subagents", but that is largely an optimization problem.

the inverse problem - having agents that compose and boss agents that manage/query them - is a capabilities one.

as an advisor to cog, proud to have played a small part in designing the new Spaces concept 3 months ago and today's launch is a start of even more to come. congrats to the team!


Windsurf: Introducing Windsurf 2.0.

Manage all your agents from one place and delegate work to the cloud with Devin - so your agents keep shipping even after you close your laptop.

ID_AA_Carmack
ID_AA_Carmack @ID_AA_Carmack
It is generally frowned upon to have LLMs precisely regurgitate part of their training set, but it is an interesting question how you could use LLM training to nearly losslessly compress a huge corpus like the entirety of the Internet Archive.

The Hutter Prize is for perfect compression, but only one GB. There would be different trades at the PB level, and it gets much more interesting when it doesn’t have to be bit-accurate.
steipete
steipete @steipete
Retweeted
Armin Ronacher ⇌ Armin Ronacher ⇌
Why do people call Gas Town and Beads malware? I can’t put my finger on it. https://github.com/gastownhall/gastown/issues/3649
swyx
swyx @swyx
Retweeted
Astasia Myers Astasia Myers
Best part are retrieval and search details:
>Agent queries differ from human queries & what good results means changes too
>Parallel queries and ranking are both tools to the same outcome
>Top-K precision > positional ranking
>Embedding model choice matters less than you think
Latent.Space: 🆕 The Full Story of Notion AI
https://latent.space/p/notion
We're so excited to chat with @simonlast and @sarahmsachs about Notion's "Token Town" - the crack team of AI Engineers and Model Behavior Engineers entrusted with building AI for Silicon Valley's most beloved knowledge work
