Daily Edition

2026-04-22

AI Builders 日报 — 4月22日

追踪 AI 领域真正在做事的人,而不是空谈者。

今日思考

Garry Tan 的「SKILLIFY」哲学正在被 Agent 社区广泛认可。Viv(LangChain)专门发长文回应,称之为「self-improving agents 的正确方向」;Claude Code 内部也出现了类似的 skillify 机制。这是典型的「自然涌现」——多个团队各自独立发现了同一个范式。与此同时,一个危险的信号正在大公司蔓延:用 token 消耗量作为 KPI。这度量的不是 AI 的价值,而是「努力表演」。18 个月后,同一批高管将宣布「AI 没有 ROI」,然后砍掉预算。


产品与发布

Replit + Plaid 金融应用

Replit 宣布与 Plaid 深度集成,用户可以在 5 分钟内构建自己的金融应用。连接账户后,通过自然语言提示即可生成支出仪表盘、AI 财务助手等应用。

Replit Auto-Protect

Replit 推出 24x7 自动安全监控。当你的应用依赖项出现漏洞时,系统会自动准备修复方案并通知你更新——相当于拥有一个永不休息的安全工程师。

Replit for Startups

Replit 启动创业者扶持计划,为在 Replit 上构建的初创公司提供最高 $25,000 的免费积分。目前已有多个基于 Replit 的公司估值超过 $100M。

Claude Cowork 交互式图表

Claude 可以在对话中直接生成交互式图表和 diagram,现已在所有付费计划的 beta 版中可用。

ChatGPT Workspace Agents

OpenAI 推出 ChatGPT Workspace Agents,允许团队构建共享的 Agent 来处理复杂任务和跨工具的长流程工作流。Greg Brockman 评价说「让为团队引入 Agent 能力变得前所未有的简单」。Sam Altman 也表示「大多数公司都会想用它们」。

ChatGPT for Google Sheets

Google Sheets 的 ChatGPT 插件正式发布,用户可以直接在表格中创建新 sheet、跨标签页和公式提问、进行更新。


观点与判断

Amjad Masad (Replit 创始人)

  • 更新诉讼进展 针对国会议员 Randy Fine 的诉讼进入紧急听证会阶段。Fine 试图通过解除屏蔽来逃避案件,但 Masad 强调这不仅是他个人的事,而是事关对第一修正案的尊重。

Garry Tan (Y Combinator CEO)

  • 加州 GPU NIMBY 「GPU NIMBYs 已经失控」——加州拟立法禁止 AI 聊天客服,以保住 Communication Workers of America(通信工人工会)成员的工作岗位。Garry 批评这是特殊利益集团控制加州的典型案例。

  • Skillify 哲学 「用 OpenClaw 做一次,然后说 SKILLIFY,它就会永远这样做。」Garry 介绍他如何通过这个循环构建 GBrain 和个人迷你 AGI——这个循环已经取代了他 50% 的 agentic 编程工作。

  • Thin Harness Fat Skills 「Thin Harness, Fat Skills = THE NEW DRY」——AI 工程师正在形成共识:更好的输出和更长时间运行的 Agent,依赖于使用确定性工具的 Skills,加上健壮的回归测试(evals、单元测试、E2E 测试、冒烟测试)。

  • Vibe Coding 需要的是品味 「这 100% 正确。如果你了解你的市场、客户和要解决的问题,你有品味,知道什么是好的、什么是烂的,你就能飞。」回应 Gideon Shalwick 关于 vibe coding 不完全取代思考的观点。

  • 与 Claude Code 英雄所见略同 「凌晨 2 点发的关于 SKILLIFY 的帖子居然和 Claude Code 内部的 skillify 机制惊人相似。」——Garry 认为这是各家 AI 工程师正在同步发现相同范式的信号。
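上面几条反复提到的 SKILLIFY 循环,可以用一个极简的 Python 草图来示意(纯属假设性实现:`skillify`、`replay` 等名称与 JSON 文件格式都是示意,并非 OpenClaw 的真实 API):把一次成功的操作固化为「指令 + 确定性工具调用序列」,之后直接回放,而不是每次都让模型即兴发挥。

```python
import json
from pathlib import Path

SKILLS_DIR = Path("skills")

def skillify(name: str, instructions: str, tool_calls: list[dict]) -> Path:
    """把一次成功的操作固化为 skill:指令 + 确定性工具调用序列。"""
    SKILLS_DIR.mkdir(exist_ok=True)
    skill = {
        "name": name,
        "instructions": instructions,
        # 回放用的确定性步骤,替代每次让模型重新摸索
        "steps": tool_calls,
    }
    path = SKILLS_DIR / f"{name}.json"
    path.write_text(json.dumps(skill, ensure_ascii=False, indent=2))
    return path

def replay(name: str, run_tool) -> list:
    """按 skill 里记录的步骤逐一调用工具,结果可被回归测试校验。"""
    skill = json.loads((SKILLS_DIR / f"{name}.json").read_text())
    return [run_tool(step["tool"], step["args"]) for step in skill["steps"]]

# 用法示例:把「生成周报」固化为 skill,之后每次直接回放
skillify(
    "weekly_report",
    "汇总本周 PR 并生成摘要",
    [{"tool": "git_log", "args": {"since": "1 week ago"}}],
)
result = replay("weekly_report", lambda tool, args: f"{tool}({args})")
```

这也是「turn every failure into a skill with tests that run forever」这句话的最小可运行版本:skill 文件既是行为记录,也是回归测试的锚点。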

Peter Yang (OpenAI 产品设计)

  • Cursor 的全栈困境 「Cursor 必须自己做 post-train,现在又和 xAI 合作获取算力——这说明在竞争激烈的 AI 编程(或一般知识工作)领域,没有全栈控制很难成功。」

  • ChatGPT Images 网页版 bug 「移动端工作正常,但网页版经常忘记自己有图像工具权限,开始生成代码而不是图片。」

Matt Shumer (HyperWrite/agent-s.app)

  • 大公司的 AI 指标正在走向死亡螺旋 「晋升、解雇和绩效评估现在开始用 token 消耗量和连接的 skills/MCPs 数量来决定。更糟糕的是,员工在跑循环烧 token 来假装高产。真正用 2 个 skills 发出 50M tokens 的人反而看起来落后于那个烧了 1B tokens 却什么都没产出的人。18 个月后,同一批高管会宣布『AI 没有 ROI』,然后砍预算。AI 本身工作得很好,只是他们度量的是错误的东西。」

Sam Altman (OpenAI 创始人)

  • Workspace Agents 值得企业采用 「这些很酷。我认为大多数公司都会想用它们。」

swyx (Latent Space)

  • Shopify 的「WTF happened in Dec 2025」 「把这个加到『2025 年 12 月到底发生了什么』的图表列表里。图中绘制了整个 Shopify 技术团队的 token 使用量——整个期间他们都有无限 token 预算,但最近某个环节出了问题:斜率在变化,百分位差距在扩大,令人担忧。」
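swyx 说的「百分位差距在扩大」可以用一个很小的草图量化(数据纯属虚构,仅作示意):对每天的人均 token 用量,取 p90 与中位数之比;比值持续扩大,说明总量增长集中在少数重度用户身上,而不是整个团队在均匀提效。

```python
import statistics

# 虚构的每日样本:每个元素是一名工程师当天的 token 用量(单位:百万)
usage = {
    "12-01": [1.0, 1.2, 1.5, 2.0, 8.0],
    "12-15": [1.0, 1.3, 1.8, 3.5, 20.0],
}

def spread(day: list[float]) -> float:
    """p90 / 中位数:比值越大,重度用户与普通用户的差距越大。"""
    p90 = statistics.quantiles(day, n=10)[-1]  # 十分位的最后一个点约等于 p90
    return p90 / statistics.median(day)

for date, day in usage.items():
    print(date, round(spread(day), 1))
```

对照中位数而不是均值,是因为均值会被同一批重度用户拉高,掩盖掉分布正在撕裂这一事实。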

技术动态

Viv (LangChain)

  • Self-Improving Agents 的核心技术路径 Viv(LangChain)在长文回复中详细阐述了 self-improving agents 的技术栈:Trace + Evals 是 Agent 改进循环的生命线;从 Trace 中归类 Agent 犯了什么错只是第一步,真正的难点是弄清错误的本质,以及如何用能长期泛化的方式修复它。Skill Learning 是把从 Trace 中学到的知识编码进 Agent 上下文的一种好方法,但需要配合良好的上下文工程——否则大量难以区分的 Skills 会让 Resolver 机制失效,重新陷入 context rot。她还指出,对于超长时程的编码 Agent(如 Frontier-SWE),需要从目标倒推的架构思维。
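Viv 说的 Resolver 失效,可以用一个极简草图说明(`resolve` 函数与触发词表都是假设性示意,并非 LangChain 的真实 API):Resolver 本质上是意图分类器,当两个 skill 的触发词高度重叠、得分难分高下时,与其硬猜一个,不如拒绝解析、回退给模型。

```python
def resolve(intent: str, skills: dict[str, set[str]], min_margin: int = 1):
    """极简 Resolver:按意图与各 skill 触发词的重叠度打分。
    最高分与次高分差距过小时返回 None(歧义,回退给模型),
    对应「大量难以区分的 Skills 会让 Resolver 失效」的情形。"""
    words = set(intent.lower().split())
    scored = sorted(
        ((len(words & triggers), name) for name, triggers in skills.items()),
        reverse=True,
    )
    best = scored[0]
    runner_up = scored[1] if len(scored) > 1 else (0, "")
    if best[0] == 0 or best[0] - runner_up[0] < min_margin:
        return None  # 无匹配或歧义:不猜
    return best[1]

skills = {
    "deploy_app": {"deploy", "release", "ship"},
    "run_tests": {"test", "pytest", "ci"},
}
print(resolve("please deploy the release", skills))  # deploy_app
```

skills 越多、触发词越相近,落进 `None` 分支的概率就越高——这正是「Skill 数量膨胀反而拖垮 Agent」的机制:要么合并相似的 skills,要么花算力做更认真的 skill search 与消歧。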

X / Twitter

AmandaAskell
AmandaAskell @AmandaAskell
When you regain your will to power after a period of burnout or depression.
sama
sama @sama
We want you to have a lot of AI!

Tibo: I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them.

Transparency
garrytan
garrytan @garrytan
Re OK this one just dropped also

So many bugs to fix
petergyang
petergyang @petergyang
Does anyone from @OpenAI want to share some tips on how to get image 2 to generate good infographics or have it match your brand style?

I haven't had any luck either.

Bojan Tunguz: I tried making an infographic using the GPT-image-2. Lots and lots of visually unacceptable artifacts. :/

amasad
amasad @amasad
Replit testified in support of the BASED Act: Stopping Big Tech from rigging software marketplaces (very unbased).
amasad
amasad @amasad
“Mum, can we have the SpaceX IDE?”

“No we have a space IDE at home”

Space IDE at home:
ylecun
ylecun @ylecun
Retweeted
Kenneth Roth Kenneth Roth
The Trump administration files concocted charges against the civil rights Southern Poverty Law Center, claiming it defrauded donors by supposedly supporting extremist groups when it was paying informants to expose their misdeeds. https://trib.al/TmffK8E
garrytan
garrytan @garrytan
Re It's a bug fix bonanza

petergyang
petergyang @petergyang
The fact that @cursor_ai had to post-train its own model and now partner with xAI for compute I assume shows how hard it is to succeed in the ultra competitive AI coding (or just general knowledge work) space without owning full stack.
garrytan
garrytan @garrytan
Retweeted
benahorowitz.eth benahorowitz.eth
They put my father R.I.P. on a hate group list (insane, because he never hated anybody) and nearly destroyed his non-profit. It turns out that they are the biggest hate group in America. I hope they go to jail forever.
Marc Andreessen 🇺🇸: SPLC was one of the most powerful censorship forces in the country for decades. Lavishly supported by many big American companies for many years. This is astonishing, and deeply concerning.
garrytan
garrytan @garrytan
The GPU NIMBYs are out of control

Dick Lucas 🇺🇸 Running for CA Assembly: Banning AI chat support so a small minority (Communication Workers of America) can keep their jobs at the expense of higher prices for the rest of us. Special interests control CA.

California's anti business nanny state reputation is undefeated. @AsmRickZbur

garrytan
garrytan @garrytan
Retweeted
Agarwal for Congress Agarwal for Congress
"In 2018...Khanna told the tech investors that some Silicon Valley engineers were privileged and more concerned about having their dry cleaning done for them, but the people in Ohio were “hungry.”
Somehow both believable and unbelievable quote from a Congressman about the people in his own district.
Care to comment @RoKhanna ?
https://www.dailykos.com/stories/2026/4/20/800023587/community/c/
garrytan
garrytan @garrytan
Retweeted
Abe Murray Abe Murray
Engineers build the world
(while others debate whether it is possible or preferable)
“Kingsbury implicitly assumes that incomplete understanding means we can't build. The entire history of engineering says otherwise.” -Garry
Love the builder mentality 🦾
Garry Tan: http://x.com/i/article/2045399189606273024
amasad
amasad @amasad
Retweeted
Ryan Mulligan Ryan Mulligan
The face when you, another guy, and the CEO are the only people in the emacs slack channel at work.
Amjad Masad: “Mum, can we have the SpaceX IDE?”
“No we have a space IDE at home”
Space IDE at home:
garrytan
garrytan @garrytan
Retweeted
Rob Henderson Rob Henderson
"SF...is paradoxically conservative. People want it to remain just as it is...Even Haight-Ashbury, the epicentre of the hippie culture, adopted strict new zoning rules in the 1970s, one result of which is that its black population fell from 40% to only 5%" https://www.edwest.co.uk/p/the-city-of-luxury-beliefs
mattshumer_
mattshumer_ @mattshumer_
Opening another 100 alpha access spots. Last ones for a while.

First come, first serve!

(Oh, and you can use your Claude Max subscription so most tokens are free!)

Matt Shumer: Opening 100 more alpha spots for http://agent-s.app. First come, first serve.

This agent is just insanely powerful. And so damn easy to use.
amasad
amasad @amasad
Update on my lawsuit against Congressman Randy Fine.

He tried to weasel out of the case by unblocking me, but this isn’t just about me. It’s about Fine’s DISRESPECT for the 1st Amendment.

We will not stop until he’s forced to unblock all Americans and respect their rights.


Jenin Younes: We just had a hearing in federal court on our emergency motion in our lawsuit against Congressman Randy Fine on behalf of @amasad. I wrote about what happened in court on Substack- link below

petergyang
petergyang @petergyang
ChatGPT Images works great from the mobile app, but when I try to generate images on @ChatGPTapp web - it often forgets it has access to the image tool and start generating code instead, resulting in "images" like this lol

Seems like a bug please fix.
swyx
swyx @swyx
LS was the first podcast cursor ever did

listen back to baby @amanrsanger when they were 5 people and pre-PMF


Latent.Space: “Cursor is the best product I've used in a while” - @MacCaw

“It's so elegant and easy.” - @AndrewMcCalip

“Coding with AI is getting insane.” - @MckayWrigley

The Latent Space pod is proud to present: the first podcast with @amanrsanger of @anysphere!

https://www.latent.space/p/cursor
gdb
gdb @gdb
wow

adi: A massive pile of rice, on ONE rice grain there is text reading "wOw"

- images-v2 in 4k

garrytan
garrytan @garrytan
My GBrain is becoming autonomous 👀
garrytan
garrytan @garrytan
Retweeted
Agarwal for Congress Agarwal for Congress
.@RoKhanna is so worried about us he's text blasting people all over the country with these lies.
I'm so tired of having to respond to your usual BS, but truth dies in silence, so let's do it.
1. Nobody recruited me
2. Assuming you're referring to Garry's List, @garrytan is a self made man who grew up in poverty in Fremont, and now employs tens of thousands of people and has created billions of value for pension plans. Do you want less or more Garry's?
3. Yes, we raised $400k in four weeks (!!!). Over half our contributions were under $300 (!!!)
4. There's no April 21st deadline. It's a self imposed deadline to drive fake urgency. Spare people your bullsh*t.
5. I don't take PAC money, corporate money, or lobbyist money either. How strongly you emphasize it doesn't change anything.
The new color scheme is nice though!
garrytan
garrytan @garrytan
Retweeted
Aaron Levie Aaron Levie
If you read this and don’t understand why it’s happening it’s an opportunity to reset your understanding of how the real world works.
The real world will need a ton of help actually getting agents going in the enterprise. Companies have legacy tech stacks they need to modernize, data in tons of fragmented tools, knowledge that isn’t captured or digitized, and change management needed to actually utilize agents effectively. And they have to do all this while still running their business day-to-day, unlike startups.
This is why there is so much opportunity for companies (software or services) to actually deploy agents in specific domains and workflows. This remains a big opportunity for both existing services providers but also tons of new startups as well. Every new technology wave produces a new era of consulting firms that can deliver on that technology.
It’s also why the FDE model is going to be alive and well for a long time because companies will want to have their vendor actually help drive the change management and implementation for their new workflows.
The people aren’t going away. Far from it.
First Squawk: OPENAI WORKING WITH CONSULTING FIRMS, INCLUDING ACCENTURE, CAPGEMINI AND PWC, TO HELP SELL CODEX TO BUSINESSES- WSJ
garrytan
garrytan @garrytan
Retweeted
JCat JCat
Yours v1.0 code is officially released. You can now choose either OpenClaw or Hermes as the agent framework for your AI companion, and switch freely between them (OpenClaw ↔ Hermes) during use.
All your AI companion configurations will be automatically updated and migrated accordingly.
Remember to upgrade Hermes or GBrain to the latest version to enjoy their brand-new features 😆
@NousResearch @garrytan
JCat: Just released Yours, an AI companion that deeply remembers user preferences and personality, and evolves continuously over time.
Built on the OpenClaw framework @steipete, its personalized memory is powered by GBrain @garrytan, which serves as the foundational memory system for
garrytan
garrytan @garrytan
“The past was alterable. The past never had been altered.” —George Orwell, 1984

T Wolf 🌁: How it started vs. how it's going. The hypocrisy is astounding. @BettyYeeforCA


garrytan
garrytan @garrytan
Re It's a lot of work to get GBrain to instruct your OpenClaw/Hermes to do the right thing, but it's worth it
garrytan
garrytan @garrytan
http://x.com/i/article/2046866228703363072
garrytan
garrytan @garrytan
Retweeted
Paul Graham Paul Graham
The world is healing, and quite rapidly too. Now US universities would only reject Einstein 11% of the time.
David Rozado: DEI Requirements in Faculty Hiring Have Declined
New HxA Report: The share of U.S. full-time faculty job ads requiring applicants to address DEI fell from 25% in 2024 to 11% in 2025, a 56% relative decline.
Thread 🧵
garrytan
garrytan @garrytan
Basically how I'm building all my features these days: Do it once in OpenClaw, then just run /skillify and it does it like that forever

Garry Tan: http://x.com/i/article/2046866228703363072
garrytan
garrytan @garrytan
Retweeted
Vox Vox
fed gbrain years of my email + calendar 10 days ago. same skillify loop has been running on my own agent since. the agent is learning me. i'm learning my own patterns back.
garry's thesis in one line: turn every failure into a skill with tests that run forever.
works outside agent code. you can wire this into your life.
Garry Tan: http://x.com/i/article/2046866228703363072
garrytan
garrytan @garrytan
Retweeted
rewind rewind
AI agent problem nobody talks about:
> no memory of past failures
> deterministic work done in latent space
> prompt tweaks instead of structural fixes
> right tool exists, agent ignores it and chooses cleverness instead
> skills created but never tested
> resolver table not updated
> two skills overlap
> API changes shape
> orphan skills eat context tokens and never run
> no daily health check
Pattern is always the same:
Agent makes mistake → you fix it in conversation → next session same mistake happens again
Full breakdown of how to turn every failure into a permanent structural fix👇
Garry Tan: http://x.com/i/article/2046866228703363072
garrytan
garrytan @garrytan
Retweeted
Mayank Vora Mayank Vora
Holy shit…Karpathy dropped autoresearch and the internet rebuilt it 40 different ways in weeks.
Someone just cataloged every single fork, port, and descendant in one place.
Here's what the community built on top of it:
→ A macOS fork for Apple Silicon that runs the full loop on M-series chips
→ A Windows RTX version for consumer NVIDIA GPUs with VRAM floor configs
→ A WebGPU port that runs the entire experiment loop in your browser
→ A multi-GPU version with crash recovery and adaptive search strategy
→ A Colab/Kaggle T4 port for people who want to run it for free with zero local setup
Then it got stranger.
People started applying the loop to completely different domains.
→ A trading agent optimizing prompts against rolling Sharpe ratio instead of model loss
→ A genealogy researcher that iteratively expands and verifies family history
→ A Spring Boot service that grew from 119 lines to 950 in 5 autonomous cycles
The original idea was: give an AI a metric and let it self-improve until it wins.
Turns out that idea works on almost anything.
1.1k stars. 100% Opensource.
Repo: https://github.com/alvinreal/awesome-autoresearch
garrytan
garrytan @garrytan
Retweeted
Carlos E. Perez Carlos E. Perez
Garry Tan coins a new word: Skillify. Totally on point that skill development is nowhere close to optimal. Skill development is a new kind of UX design where the user is an AI agent.
Garry Tan: http://x.com/i/article/2046866228703363072
petergyang
petergyang @petergyang
Retweeted
Peter Yang Peter Yang
"In the 1950s, we met users at a bank. In the 70s, an ATM. In the 90s and 2000s, a website and a mobile app. Today, it's APIs and MCPs."
Here's my new episode with @rywiggs (Mercury's VP of Product) where he shares:
✅ How to build great APIs + MCPs for agents
✅ How to create a Claude Code second brain to 2x your productivity at work
✅ What @mercury's data reveals about OpenAI and Anthropic's race for the enterprise
Some quotes from Ryan:
"Don't start with the MCP. Start with the foundation. Build great APIs first."
"I pulled 5M words from my last 5 years of PM work into Claude Code (using QMD search). That's the base of my second brain."
"After meetings, Claude tells me when I did something from my performance review. It keeps me accountable daily."
📌 Watch now: https://youtu.be/KzqpK1uCczw
Thanks to our sponsors:
@WisprFlow: Don't type, just speak https://ref.wisprflow.ai/peteryang
@linear: The AI agent platform for modern teams https://linear.app/behind-the-craft
petergyang
petergyang @petergyang
Re Also available on:

Spotify: https://open.spotify.com/episode/0Qd0u6NYXdTKUTb8iaqcgS?si=QDc8dpLqT3S3EJ17S_75AA

Apple: https://podcasts.apple.com/us/podcast/behind-the-craft/id1736359687?ign-itscg=30200&ign-itsct=podtail_podcasts

Newsletter: https://creatoreconomy.so/p/how-to-build-for-ai-agents-and-a-claude-code-second-brain
swyx
swyx @swyx
Retweeted
Forrest Brazeal Forrest Brazeal
"Funny and distressingly realistic...propelled by awesome characters and inventive twists”— @andyweirauthor
Silicon Valley invents the time machine in my upcoming book PARADOX INC, now available for preorder everywhere!
Here's a look inside from @people: https://people.com/paradox-inc-cover-reveal-exclusive-11955668
ylecun
ylecun @ylecun
Retweeted
Big Brain AI Big Brain AI
Yann LeCun (AMI Labs Founder): "The AI industry is completely LLM-pilled. Everybody is working on the same thing. They're all digging the same trench."
LeCun explains why no lab dares break from the pack:
"They are stealing each other's engineers. So they can't afford to do something different because if they start going on a tangent, they're going to fall behind the other guys. And so they're all doing the same thing."
This groupthink is exactly what drove him out of Meta.
"Meta also became LLM-pilled with sort of recent reshuffling. And it's fine, a strategic decision that maybe makes sense for them. It's just not what I'm interested in."
For @ylecun, the problem runs deeper than strategy.
LLMs are missing something essential about how intelligence actually works:
"I cannot imagine that we can build agentic systems without those systems having an ability to predict in advance what the consequences of their actions are going to be. The way we act in the world is that we can predict the consequences of our actions and that's what allows us to plan."
His broader critique is that the industry has mistaken fluency for intelligence.
Language turned out to be the easy part. The hard part is the physical world.
It's why we still don't have domestic robots or level-five self-driving cars, even though today's systems can pass the bar exam and write code.
garrytan
garrytan @garrytan
Retweeted
Viv Viv
a bunch here where I’m saying ok Garry’s kinda right?! 👀…in some ways :) we’re making this loop much easier to close out of the box soon
If more people get into evals & traces to ground self-improving agents from Garry’s posts, there’ll be no one happier than me
have written about this at length so will also share some linked materials for anyone (including your Clanker) who wants to dig into more details of building evals & self-improving agent systems:
Traces + Evals are the lifeblood of agent improvement loops
We point compute at traces so we can classify what agents did wrong.  Yes, but the hard part is figuring out what the error even was and how to fix it in a way that actually generalizes over time (not play whack-a-mole with if-else statements all the time). Is our agent a bad long horizon planner for X tasks? Should we change the model, or add better planning instructions, or use subagents to isolate context because these types of tasks bloat the main window.
Evals encode the behavior we want agents to have in production. Generating evals from traces is how we figure out how to measure the changes we’re making over time. This is why we lean so hard into Tracing + Evals tooling with LangSmith (more coming soon on making this loop even easier!).
Skill Learning is ONE great Way to Codify Trace Learnings into Context for your Agent
“skillify”/SkillLearning is great, agreed!! (see our /remember youtube video below + blogs on hill climbing coding agents), love that Garry’s discovering Skill Learning from Traces as a mechanism for fixing agent mistakes.  Skills are semantic bundlers so they basically encompass everything needed to accomplish a goal in one folder like instructions and code. This reduces search in aggregating cross-source information. Skills have built-in context engineering with progressive disclosure which helps many users.
Skills are great, I love them and we use them heavily, but just a note that there’s other approaches you can use to fix errors in production trace data. We discuss them briefly below! Remember
Things to think about more deeply:
Context Engineering Still Matters even with Skills & Resolvers
We still need good context engineering!  If you bloat your context window with TONS of skills that are hard for an agent to disambiguate when to use, then the “Resolver” mechanism will suck + you’re back in context-rot world.  “Resolvers” are classifiers of intent, you need to protect your context window and make sure the “rules” in the table are self-consistent over time and also not massively bloating context.
Good context engineering is often a search problem! We need to find the right context and pass it into the computation boundary —> the context window. The better we do that without confusing the agent, the better our results.
Maybe that looks like Skill Search?! Maybe similar skills should get merged or subagents should actually spend more compute doing proper skill research and disambiguation. If we use Skills as the primary agent update mechanism, then we need to think about how this works with context as we use agents across month and year timescales.
Building in Higher-Level Primitives
I love Skill-Learning but often it’s a whack-a-mole- solution if not managed properly. For example, if you wanted to build an ultra-long horizon coding agent (think Factory Missions or something on Frontier-SWE), then you need to think through the harness architecture of how to work backwards from the goal like how to recursively use subagents & planning. Or how to manage & share context in a filesystem. Traces often help you uncover local issues and skills help you solve those, but it’s very important today to think about agent architecture and working backwards from big problems to avoid the potential local minima of Skill Learning. It’s tbd how much compute you need to use to uncover good agent architecture primitives to solve very hard problems. Skill Learning to fix scoped problems is great in the meantime and maybe can get us much further with smarter models.
Evals Alongside/Beyond LLM as a Judge
The hardest part of this all is by far figuring out what actually went wrong across Traces at scale + testing if the proposed fix works over time! Does it work across models? Does it continue to work if you change something else in the system prompt or add another skill? Evals codify the case into an eval that can be detected in realtime (Online Evals/Monitoring). We need to test this stuff, which is why I like using LLM as a Judge that Garry mentions, but there’s much more we can do (programmatic evals, multi-turn cases, containerizing the eval environment to faithfully reproduce what went wrong) - great start, happy to help extend to make your agents better :)
Could write on this for days but I promise you, we’re thinking SUPER hard about primitives for self-improving agents, mining data from Traces, agent-first tooling that makes this possible, and basically any ways we can be helpful to help builders create the best agents in the world.
We have a lot coming soon, reach out if I can help, let’s cook 🚀
Garry Tan: http://x.com/i/article/2046866228703363072
garrytan
garrytan @garrytan
This cycle below is what has replaced 50% of my agentic coding. This is now how I am building GBrain and my own personal mini-AGI with full context on me and the things I care about.

It's not hard. It's quite fun. I do something, anything with OpenClaw, then I say SKILLIFY IT


Garry Tan: http://x.com/i/article/2046866228703363072
mattshumer_
mattshumer_ @mattshumer_
Been hearing wild stuff from folks inside big companies lately.

Promotions, firings, and perf reviews are getting decided by tokens consumed and skills/MCPs connected. That’s the metric. That’s how they’re deciding who’s “good at AI.”

It gets worse. People are literally running loops to burn tokens and look productive. Doing nothing, racking up “usage,” getting rewarded for it.

Meanwhile the person actually shipping with 2 skills and 50M tokens looks like a laggard next to the one who burned a billion tokens producing nothing.

These companies are walking into a death spiral and don’t see it.

The funniest part? Measuring actual output is easier than ever. You have AI. Use it.

In 18 months the same execs will announce “AI didn’t deliver ROI” and pull the budget. AI will have worked fine. They just measured the wrong fucking thing and torched millions rewarding theater over output.

Every company should be pushing AI hard. But this is how you guarantee it fails.
ylecun
ylecun @ylecun
Retweeted
Steve Stewart-Williams Steve Stewart-Williams
Smarter People Are Less Violent
"The prevalence of violent behavior dropped steadily with increasing IQ: 16.3% of individuals with IQs in the 70-79 range reported violent behavior, compared with just 2.9% of those with IQs of 120-129."
https://www.stevestewartwilliams.com/p/smarter-people-are-less-violent
karpathy
karpathy @karpathy
Retweeted
Zain Shah Zain Shah
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see.
@eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
garrytan
garrytan @garrytan
Retweeted
Todd Hanford Todd Hanford
I love the idea of "thin harness, fat skills". AI engineers are starting to coalesce around the fact that better outputs and longer running agents are enabled by:
- Skills which use deterministic tools
- Robust regression testing using evals, unit tests, E2E tests, and smoke tests
The new DRY principle is to turn single failures into skills instead of correcting your agent every time it makes the same mistake.
Garry Tan: This cycle below is what has replaced 50% of my agentic coding. This is now how I am building GBrain and my own personal mini-AGI with full context on me and the things I care about.
It's not hard. It's quite fun. I do something, anything with OpenClaw, then I say SKILLIFY IT
ylecun
ylecun @ylecun
Retweeted
Caroline Orr Bueno, Ph.D Caroline Orr Bueno, Ph.D
“In the end the Party would announce that two and two made five, and you would have to believe it.” -George Orwell, 1984.
Aaron Rupar: RFK Jr: "President Trump has a different way of calculating percentages. If you have a $600 drug and you reduce it to $10, that's a 600% reduction."
sama
sama @sama
Retweeted
Tibo Tibo
Team is hard at work together with @steipete to make OpenAI models and ecosystem be the obvious way to enjoy your claw. A lot more to come next week, but a reminder that you can use OpenClaw as part of your ChatGPT subscription today already.
(also still having too much fun with ChatGPT Images 2.0 today)
pash: I've embarked on a new sprint. My mission is to make OpenAI models feel magical in OpenClaw in the next few weeks.
Diving in today, I noticed a bug. When you configured OpenClaw to use the Codex harness with OpenAI models, auth was broken, and the system was silently falling
amasad
amasad @amasad
Retweeted
Bruno Werneck de Almeida Bruno Werneck de Almeida
1/ Excited to announce @Plaid's partnership with @Replit. Now anyone can easily build their own custom financial applications and get started in less than five minutes.
Get started here: https://replit.com/partners/plaid
Read more on the partnership here: https://plaid.com/blog/build-personalized-finance-apps-replit-plaid/
Replit ⠕: Build your own finance app with Replit using @plaid
Plaid is now natively integrated into Replit, giving you secure, real-time access to your financial data.
Connect your accounts → prompt what you want → get a working app.
Spending dashboards, AI financial assistants,
amasad
amasad @amasad
Build finance apps with Plaid + Replit!

Replit ⠕: Build your own finance app with Replit using @plaid

Plaid is now natively integrated into Replit, giving you secure, real-time access to your financial data.

Connect your accounts → prompt what you want → get a working app.
Spending dashboards, AI financial assistants,

garrytan
garrytan @garrytan
Retweeted
Perplexity Perplexity
We've published new research on how we post-train models for accurate search-augmented answers.
Our SFT + RL pipeline improves search, citation quality, instruction following, and efficiency.
With Qwen models, we match or beat GPT models on factuality at a lower cost.
sama
sama @sama
These are cool! I think most companies will want to use them.

OpenAI: Introducing workspace agents in ChatGPT—shared agents that can handle complex tasks and long-running workflows across tools and teams.

garrytan
garrytan @garrytan
Retweeted
aacash.eth - Aakash Kumar aacash.eth - Aakash Kumar
Skills and agent definitions are a great primitive to start with for context engineering. Love that @garrytan’s body of work is bringing more folks in to the fold of NLAH (natural language agentic harness).
In parallel, for higher level autonomy and more elaborate tasks (temporal nature + provenance + deeper reasoning interlaced with workflows), engineers at many cos are pushing boundaries on custom orchestration and ‘harnessing’ .
Golden days of the agentic era are just starting! LFG 🚀
Garry Tan: http://x.com/i/article/2046866228703363072
gdb
gdb @gdb
Build workspace agents for your team, on top of a cloud-hosted Codex harness. Hook them up to tools, give them recurring tasks, and talk to them from surfaces like Slack.

Easier than ever to bring the power of agents to your computer work.

OpenAI: Introducing workspace agents in ChatGPT—shared agents that can handle complex tasks and long-running workflows across tools and teams.

ylecun
ylecun @ylecun
Retweeted
Republicans against Trump Republicans against Trump
The American people want Donald Trump, the most corrupt president in history, impeached and removed from office.
According to a new poll published this week, 55% of Americans support impeachment, while just 37% oppose it. Notably, 1 in 5 of Trump’s own voters also support impeachment.
R A W S A L E R T S: 🚨 BREAKING: Democrats now projected to impeach Trump — 66% chance.
amasad
amasad @amasad
Retweeted
Replit ⠕ Replit ⠕
Introducing Race to Revenue.
Follow real founders around the world for a once-in-a-lifetime opportunity to build and launch products live on camera. But whose app will prove itself with cold, hard revenue?
Out now. Let's race. ⠕
swyx
swyx @swyx
Retweeted
Mikhail Parakhin Mikhail Parakhin
Had a great conversation with @swyx on @latentspacepod about what we're building at @Shopify. SimGym, Tangent, our approach to PR review at 30% month-on-month merge growth and why larger models are cheaper in long run.
Swyx asks good questions!
https://www.youtube.com/watch?v=RrkGoX3Cw7o&list=PLWEAb1SXhjlfkEF_PxzYHonU_v5LPMI8L&index=1
ylecun
ylecun @ylecun
Retweeted
Junfan Zhu 朱俊帆
🦤 LeWorldModel: Learning Physics from Pixels — Stable World Models with Just Two Losses
World models:
1️⃣ DINO-WM: pretrained ViT encoder (from ImageNet) → features → predictor. But encoder is frozen, so no end-to-end learning. Its “visual genetics” are tuned for coarse classification (cats vs dogs), not physics: hard to resolve mm-level changes (e.g., 2 mm block motion). A powerful predictor on top of a “myopic” encoder = blind physical reasoning.
2️⃣ PLDM: end-to-end, but unstable and collapse-prone. Relies on reward as prediction target, so it only works in environments with explicit rewards (e.g., games).
3️⃣ JEPA (Joint Embedding Predictive Architecture): predict next latent instead of pixels. Two hard problems:
collapse (encoder → constant vector, e.g., all zeros)
achieving pixel-level + end-to-end + stable jointly
💡 LeWM solves:
👉 JEPA that trains stably end-to-end from raw pixels
👉 Single hyperparameter λ:
next-embedding prediction
SIGReg (Gaussian regularization)
🧠 #1: true end-to-end
No frozen encoder. Perception + dynamics co-evolve → representation aligned with fine-grained physics, not ImageNet bias.
🧠 #2: “only” one hyperparameter
PLDM needs ~6. LeWM needs 1 (λ) → weight of SIGReg. Plug-and-play, stable.
⚠️ Collapse problem
Encoder could map all inputs → same vector → trivial prediction → zero loss → useless model.
🧩 SIGReg (Gaussian Integral Signature Regularization)
Core: prevent collapse via distribution constraints.
Sample 1024 random directions
Project embeddings → 1024 1D “shadows”
Each must pass Epps–Pulley test (≈ standard normal)
Loss pushes test statistic → 0
Any failed projection ⇒ penalty
Why it works:
Cramér–Wold theorem → a high-dim distribution is determined by its 1D projections.
👉 Enforcing Gaussianity across 1D projections precludes degenerate collapse under projection constraints
🧪 Physical probing
Train in PushT (push block to target), then:
Linear probe recovers: block position, angle, end-effector
👉 physics is linearly decodable
🚨 Teleport block (physically impossible):
embedding anomaly spikes sharply
👉 model internalizes constraint: objects cannot teleport
👉 not inferred from pixel surface features, but encoded as latent constraints
📈 Temporal straightness
No smoothness loss, yet trajectories in latent space are ~straight lines
👉 no prior, purely from “predict next embedding”
👉 implies physically consistent motion, not blurry interpolation
⚡ Performance
Planning: 0.98s vs 47s (DINO-WM)
Success: 96% vs 78% (PLDM)
Why faster?
DINO-WM: frozen encoder → info loss → extra online passes
LeWM: end-to-end → representation already task-aligned
👉 0.98s = fast to handle dynamic obstacles & real-time control
⚠️ Limitations
~15M params (“ant-scale”) → fails on OGBench-Cube (complex physics)
not yet tested on real robots
🔥LeWM shows:
👉 JEPA + SIGReg = stable world models
👉 raw pixels → physics-aware latent space
👉 minimal design (2 losses, 1 hyperparameter)
Next step: scale + real-world deployment 🤖
Junfan Zhu 朱俊帆: http://x.com/i/article/2047025326879072256
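The SIGReg recipe the thread describes (random 1D projections plus a Gaussianity penalty) can be sketched in a few lines of NumPy. This is an illustration only, not LeWM's implementation: the Epps–Pulley test statistic is replaced here with a simple mean/variance moment penalty on each projected "shadow".

```python
import numpy as np

def sigreg_penalty(embeddings, num_dirs=1024, seed=0):
    """Toy SIGReg-style anti-collapse penalty.

    Projects a batch of embeddings onto random unit directions and
    penalizes each 1D "shadow" for deviating from a standard normal.
    (The real SIGReg scores projections with an Epps-Pulley normality
    test; a first/second-moment penalty stands in for it here.)
    """
    rng = np.random.default_rng(seed)
    n, d = embeddings.shape
    dirs = rng.standard_normal((d, num_dirs))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit directions
    shadows = embeddings @ dirs              # (n, num_dirs) 1D projections
    mean_pen = shadows.mean(axis=0) ** 2     # push each projection's mean -> 0
    var_pen = (shadows.var(axis=0) - 1.0) ** 2  # push each variance -> 1
    return float((mean_pen + var_pen).mean())

rng = np.random.default_rng(1)
healthy = rng.standard_normal((1000, 64))  # well-spread embeddings
collapsed = np.ones((1000, 64))            # encoder collapse: one constant vector
assert sigreg_penalty(healthy) < sigreg_penalty(collapsed)
```

A collapsed encoder makes every projection a point mass (variance 0), so the penalty stays high no matter which direction you pick; by the Cramér–Wold argument the thread cites, forcing all 1D projections toward a standard normal rules out that degenerate solution.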
petergyang
petergyang @petergyang
Retweeted
Karri Saarinen
I might be biased but 90% of my work AI use has moved to @linear recently:
- pull daily report what should I pay attention to
- recent user frustrations or trend needs
- check launch dates on projects
- make fixes on the product with the coding agent
- reflect my specific thoughts against product memo
- writing investor updates based on our progress
- ask about specific features to debug user issues
- pull specific follow ups from meeting transcripts
- write project update based on the meeting we had
- prep for customer call based on the brief I got, and the plans we have
- research new features based on customer requests
- research revenue opportunity based on some of the features and customers we have
- find latest trends on bugs
- write blog post about a feature on my phone
- set up project, docs, milestones and issues from feature research
- create issues to project from our “roast” feedback meeting
I could do a lot of these in other tools as well, but I like that I can work in the context and at-mention specific documents, issues, teams, projects, or files when I'm chatting.
Then actually start making plans or work to make changes or assign people on things.
I also have set up the same writing guidance and skills that I have in other tools but somehow feel Linear understands me better.
I feel like I'm not working in a void but within a structure, and I can flip between the agent and that structure. It's all about work & Linear, not my personal questions or topics.
garrytan
garrytan @garrytan
Retweeted
Tony Dang
A dream come true for every human anxious about their agents leaking secrets.
Agent Vault aspires to be the portable solution that you can bring anywhere: on-prem, cloud, any container environment.
Front your agent with Agent Vault and let it rip.
https://github.com/Infisical/agent-vault
Infisical: Any secret an agent can read is a secret an attacker can steal.
So we built the fix: Agent Vault, an HTTP credential proxy and vault for AI agents.
Secret managers were built for deterministic services. They return credentials to the caller and trust them to behave.
AI agents
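The credential-proxy pattern Infisical describes can be illustrated in miniature: the agent only ever handles a placeholder token, and the proxy, which alone holds the real secret, swaps it in on the outbound request. A sketch of the idea, not Agent Vault's actual code (the names and placeholder format here are invented):

```python
# Minimal sketch of an HTTP credential proxy for agents: the agent never
# holds the real API key; the proxy injects it just before the request
# leaves. Illustrative only -- not Agent Vault's implementation.

REAL_SECRET = "sk-real-key"           # lives only inside the proxy process
PLACEHOLDER = "VAULT:OPENAI_API_KEY"  # all the agent is allowed to see

def proxy_forward(agent_request: dict) -> dict:
    """Rewrite an outbound request, substituting the real credential."""
    headers = dict(agent_request.get("headers", {}))
    auth = headers.get("Authorization", "")
    if PLACEHOLDER in auth:
        headers["Authorization"] = auth.replace(PLACEHOLDER, REAL_SECRET)
    return {**agent_request, "headers": headers}

# The agent's request carries only the placeholder...
agent_req = {"url": "https://api.example.com/v1/chat",
             "headers": {"Authorization": f"Bearer {PLACEHOLDER}"}}
# ...so even a fully compromised agent has nothing worth stealing.
out = proxy_forward(agent_req)
assert REAL_SECRET in out["headers"]["Authorization"]
assert REAL_SECRET not in agent_req["headers"]["Authorization"]
```

This is why the pattern matters for prompt injection: exfiltrating the agent's context yields only the placeholder string.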
swyx
swyx @swyx
Retweeted
Latent.Space
🆕 Shopify's AI-Native Engineering: 100% adoption, unlimited tokens, Tangle, Tangent, & SimGym
https://www.latent.space/p/shopify
@Shopify CTO @MParakhin explains why near-universal AI adoption is changing how Shopify builds, how SimGym uses Shopify’s scale and historical data to simulate customers, how Tangle and Tangent are changing experimentation inside the company, why the real bottleneck in AI coding is now PR review and CI/CD, and why unlimited tokens and auto-research loops are unlocking gains across every domain.
amasad
amasad @amasad
Sometimes apps you made that were secure might suddenly become vulnerable when there is an exploit in one of their dependencies.

Typically you need engineers on payroll to monitor and handle this.

We just automated that with Auto-Protect. It’s like your security engineer 24x7.

Replit ⠕: Keeping your apps secure has always required constant oversight from you.

Replit Auto-Protect now keeps watch over your apps 24x7.

We'll monitor threats, proactively prepare fixes and notify you to apply those fixes, even when you are away.

garrytan
garrytan @garrytan
Had no idea but there is a lot of simultaneous discovery in agentic engineering these days

Turns out the ideas about SKILLIFY from my post at 2am last night are similar to a Claude Code internal!

JC: this is interesting, I was just reading the "Claude Code" leaked source code yesterday and they have a bundled skill called "skillify" there.
Mostly the same purpose, which I believe is a trend we're going to start seeing pop up more:
Self-Improving Agents

garrytan
garrytan @garrytan
Thin Harness Fat Skills Fat Code = THE NEW DRY

Todd Hanford: I love the idea of "thin harness, fat skills". AI engineers are starting to coalesce around the fact that better outputs and longer running agents are enabled by:
- Skills which use deterministic tools
- Robust regression testing using evals, unit tests, E2E tests, and smoke
garrytan
garrytan @garrytan
OK the aspiration is GBrain installs and gives you all this stuff

Right now it does still require you to say a lot of "Help my openclaw take advantage of all the things in the GBrain repo" versus something that is like GStack where AskUserQuestion just keeps you in flow

JZ: Been following Gary's work for a bit

Finally bit the bullet and put gbrain in next to Openclaw yesterday

My thoughts:

It's less an 'install this plugin and everything works' and more of a customizable framework

Maybe that's not the best way to describe it but it was super
garrytan
garrytan @garrytan
This diagram is kind of cool except it hallucinated my face

Delia Dou: The diagram GPT produced looks really good!

amasad
amasad @amasad
Many startups launched on Replit are now $100m+ — we now have a startup program with $25k free credits.

Replit ⠕: Replit for Startups is live. Up to $25K in credits to build your product, ship fast, and scale.

If you're building on Replit, this is for you. 🚀 http://replit.com/startups

claudeai
claudeai @claudeai
Interactive charts and diagrams are now in Claude Cowork.

Available in beta on all paid plans.

Claude: Claude can now build interactive charts and diagrams, directly in the chat.

Available today in beta on all plans, including free.

Try it out: http://claude.ai

amasad
amasad @amasad
Retweeted
Francisco Cruz Mendoza
Excited that @Replit is once again teaming up with @stripe for another awesome event in the Bay Area next week!
If you are in town for Stripe Sessions or just want to meet other builders, feel free to join us!
https://stripe.events/stripestartupsarcadenight
swyx
swyx @swyx
Retweeted
Walden
http://x.com/i/article/2046690715657478145
amasad
amasad @amasad
If you liked our Agent 3 documentary, this is next level, and focused on Replit Builders. It will be a series.

Replit ⠕: Introducing Race to Revenue.

Follow real founders around the world for a once-in-a-lifetime opportunity to build and launch products live on camera. But whose app will prove itself with cold, hard revenue?

Out now. Let's race. ⠕

garrytan
garrytan @garrytan
Retweeted
Abhishek Ray
introducing opslane
test your claude code changes in a real browser
Inspired by @garrytan's GStack /qa skill.
- reads the specs to understand the feature
- builds acceptance criteria from them
- runs tests in a real browser against your local dev server
- full report with screenshots
Static code review tools only review the code. Opslane runs it.
Fully open source.
gdb
gdb @gdb
ChatGPT plugin now available for Google Sheets:

Ryan Brewer: Excited to announce ChatGPT for Google Sheets! This was a really fun one to work on. Create new sheets, ask questions across tabs and formulas, and make updates directly in your sheets https://chatgpt.com/apps/spreadsheets/

garrytan
garrytan @garrytan
Retweeted
YIMBYLAND
This is not normal discourse.
We don’t have to accept this line of thinking. In fact, you should reject it with every fiber of your being.
These people hate normal Americans and the social norms that make our lives enjoyable and safe.
swyx
swyx @swyx
Team @Shopify brought some fire to this one; add this to the growing list of “WTF happened in Dec 2025” charts

(this plots token usage across all the technical staff of Shopify - the whole time they had an unlimited token budget, but something cracked recently: the slope is changing and the percentile deltas are widening by a concerning amount!!)


Mikhail Parakhin: Had a great conversation with @swyx on @latentspacepod about what we're building at @Shopify. SimGym, Tangent, our approach to PR review at 30% month-on-month merge growth, and why larger models are cheaper in the long run.
Swyx asks good questions!
https://www.youtube.com/watch?v=RrkGoX3Cw7o&list=PLWEAb1SXhjlfkEF_PxzYHonU_v5LPMI8L&index=1
gdb
gdb @gdb
had a great conversation with @shaneparrish, full podcast below

Shane Parrish: My conversation with @OpenAI co-founder @gdb

This is the most detailed first-person account of the 72 hours after Sam Altman was fired.

We also go deep on what comes next: the global race to AGI, why ChatGPT stopped showing reasoning, how much of OpenAI's own code is now

sama
sama @sama
Retweeted
Karan Singhal
Today we’re introducing two big steps for health at OpenAI:
- ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
- HealthBench Professional, a new benchmark to evaluate real clinician chat tasks
We’re excited about what this can unlock for care. ❤️
garrytan
garrytan @garrytan
This is 100% right. If you know your market, your customer, and what problem you're solving, you have taste, you know what is good and what sucks

You can now fly

Gideon Shalwick: Hot take:

Vibe coding doesn’t (fully) replace thinking.

It replaces hard core, coalface coding.

If you want it to actually work, you still need:

- Deep understanding of what the market wants
- A clear user experience (not just “it works”)
- Real UI design with proper
garrytan
garrytan @garrytan
Retweeted
darkzodchi
This 47-min interview with Boris Cherny (the creator of Claude Code) will teach you more about AI-native development than 6 months of trial and error.
Watch it, bookmark it, share it.
Your entire approach to building with Claude will shift.
bodila: http://x.com/i/article/2034716088756219904

YouTube

No recent videos fetched on this date.