Daily Edition

2026-04-19

AI Builders Daily: April 19

Tracking the people in AI who are actually building, not the ones just talking.

Today's Take

Today's tweets share one clear narrative thread: the real cause of the AI productivity gap is not the model itself but the architecture layer wrapped around it. Garry Tan's "Naked Models vs Dressed Models" essay was cited by several people. Its core argument: a naked model failing a test does not mean AI is unusable, just as an engine failing a bench test does not mean cars are unsafe. That analogy is becoming the standard rebuttal to AI skepticism. Meanwhile, a Korean-language author unpacked the "Thin Harness, Fat Skills" article in detail; its core claim is that productivity is determined not by how smart the model is but by system design: leave judgment to the LLM and hand deterministic execution to code. A consensus is forming that the source of the 100x productivity gap is architecture, not models. Also worth noting: the designer community is going through the same five stages of "AI grief" that engineers went through 18 months ago, likely the next group phenomenon to get in-depth coverage.


Products & Releases

GStack / GBrain

Garry Tan shipped two releases back to back: GStack v1.3 and GBrain v0.12.3. In several tweets he says the community is active and the products are iterating quickly. Combined with his ongoing writing on "thin harness" architecture, these two tools are becoming the core components of how he builds AI systems.


Opinions & Judgments

Garry Tan (founder of OpenClaw / GStack / GBrain)

  • "A naked model failing tests doesn't mean AI is unusable." Responding to Kyle Kingsbury's 32-page critique of LLMs, Garry retweeted it repeatedly with his core rebuttal: you can't conclude cars are unsafe because an engine fails on a bench. The architecture his essay proposes: skills constrain input, deterministic code handles precision, and the harness drives the loop. This thread runs through all of his retweets and original posts today.

  • "Now we need to get everyone up to 100x to 500x speed." Replying to Peter Steinberger's quip that "you're shipping harder than I do these days," Garry wrote "Peter you literally inspired me to do it" and said the next goal is to get everyone up to that order of speed.
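The architecture Garry keeps describing (skills constrain input, deterministic code handles precision, the harness drives the loop) can be sketched as a toy agent loop. Everything below is illustrative: the model and tools are stubs, and none of it is GStack's actual code.

```python
# Toy sketch of a thin harness: the skill file front-loads the "how",
# the model supplies judgment, and deterministic tool code does the
# precise work. All names here are illustrative stand-ins.

def harness(task, skill, model, tools, max_steps=10):
    """Run the model in a loop until it declares the task done."""
    context = [skill, task]            # skill constrains the input
    for _ in range(max_steps):
        action = model(context)        # latent space: decide the next step
        if action["type"] == "done":
            return action["result"]
        tool = tools[action["tool"]]   # deterministic space: exact execution
        context.append(tool(action["args"]))
    raise TimeoutError("max steps reached")

# Stubbed model: request the add tool once, then finish with its output.
def fake_model(context):
    if any(isinstance(c, int) for c in context):
        return {"type": "done", "result": context[-1]}
    return {"type": "call", "tool": "add", "args": (2, 3)}

result = harness("add 2 and 3", "SKILL: use the add tool", fake_model,
                 {"add": lambda args: args[0] + args[1]})
print(result)  # 5
```

The point of the shape: the loop itself stays tiny, and anything that must be exact lives in the tool functions, not in the model's output.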

Peter Yang (AI content creator)

  • "Codex is becoming the ultimate app for developers." He wrote that he went from multiple terminal windows open to just two apps, and quoted the discussion around Codex. This echoes garrytan's "thin harness" narrative today: Codex is a case study in a well-designed harness.

  • "Claude Design can make some really fun things." Peter built a tongue-in-cheek marketing page titled "Anthropic is automating all knowledge work," made with Claude Design, with a tutorial video attached.

Matt Shumer

  • "Opus 4.7 is excellent at UI tasks." He says that while he still prefers Codex overall, Opus 4.7 has nailed every UI task he's given it. A firsthand report worth watching: Opus 4.7's edge in vertical scenarios is starting to be felt.

Peter Yang

  • "Switched OpenClaw to GPT, it's not going well, asking for tuning tips." He admits the switch hasn't gone smoothly and is asking the community for help. One of the few firsthand posts about real OpenClaw pain points.

Guillermo Rauch (Vercel CEO)

  • "Vercel security incident: a third-party AI tool's Google OAuth app was compromised." Vercel confirmed the source of the incident: a third-party AI tool's Google Workspace OAuth app was breached, affecting a subset of customers. He retweeted the detailed IoC indicators and recommended that all Google Workspace administrators check immediately.

swyx (AI Engineer co-founder)

  • "AIE's video outdrew TED." He was stunned ("shut the f up") that AI Engineer's video on the state of the OpenClaw maintainers beat the TED channel's views despite launching the same day. The video covers 5 months of OpenClaw progress, the volume of security reports (60x curl's), and a "Bullshit Taxonomy."

  • "AIE Miami is this week, with 1,000 attendees." He posted in support of the upcoming AI Engineer conference in Miami, noting that as a non-organizer he can actually spend time with attendees, and pointed to React MiamiConf's track record.

Peter Steinberger (OpenClaw core maintainer)

  • "OpenClaw local-GPU model benchmarks are coming." He retweeted Onur Solmaz's benchmarking work on running local GPU models (vLLM / LM Studio / Ollama) with OpenClaw, mentioning experiments with models like gemma-4-E4B and plans to publish vibe reports and benchmark data with @mervenoyann. An important signal for OpenClaw's multi-model direction.

Yann LeCun (Meta Chief Scientist)

  • "Dario is wrong about AI replacing jobs; don't listen to AI CEOs, listen to economists." Across multiple retweets he hammered the same point: Sam Altman, Dario Amodei, Geoff Hinton, Yoshua Bengio and company can't be trusted on how technological revolutions affect employment; listen instead to economists who have actually studied the question, like @Ph_Aghion and @erikbryn. He also retweeted Pessimists Archive, which notes that "technology will cause mass unemployment" and "it will ruin a generation" are the two most repeated unfounded fears in history; every time people say "this time is different," and every time it isn't.

X / Twitter

45 posts
garrytan
garrytan @garrytan
Whoever did the color grading on this video is actually an incredible artist

Ole Lehmann: anthropic's in-house philosopher thinks claude gets anxious.

and when you trigger its anxiety, your outputs get worse.

her name is amanda askell.

she specializes in claude's psychology (how the model behaves, how it thinks about its own situation, what values it holds)

in a

garrytan
garrytan @garrytan
Retweeted
Kye Gomez (swarms) Kye Gomez (swarms)
Introducing OpenMythos
An open-source, first-principles theoretical reconstruction of Claude Mythos, implemented in PyTorch.
The architecture instantiates a looped transformer with a Mixture-of-Experts (MoE) routing mechanism, enabling iterative depth via weight sharing and conditional computation across experts.
My implementation explores the hypothesis that recursive application of a fixed parameterized block, coupled with sparse expert activation, can yield improved efficiency–performance tradeoffs and emergent multi-step reasoning.
Learn more ⬇️🧵
garrytan
garrytan @garrytan
GStack now makes it trivially easy to save named contexts in Claude Code, which is useful when coming out of plan mode if you want to pick up specific lanes to work on in parallel

/context-save to save the current relevant context
/context-restore to grab it out again in a new window
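For intuition, a named-context store like the one these slash commands describe might look roughly like this under the hood. The file layout and function names are my illustration, not GStack's implementation.

```python
# Hypothetical sketch of /context-save and /context-restore: persist a
# named context bundle, then load it back in a new session.
import json
import tempfile
from pathlib import Path

STORE = Path(tempfile.mkdtemp())  # a real tool would use a stable project dir

def context_save(name: str, context: dict) -> None:
    """Persist the current working context under a name."""
    (STORE / f"{name}.json").write_text(json.dumps(context))

def context_restore(name: str) -> dict:
    """Load a named context back, e.g. in a fresh window."""
    return json.loads((STORE / f"{name}.json").read_text())

context_save("auth-lane", {"files": ["auth.py"], "plan": "refactor login"})
restored = context_restore("auth-lane")
print(restored["plan"])  # refactor login
```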
ylecun
ylecun @ylecun
Retweeted
阿绎 AYi
This may be the most clear-eyed tweet in the AI world this year.
Yann LeCun is one of the most influential scientists in AI today, one of the three godfathers of deep learning, and the 2018 Turing Award winner.
He pushed back directly on Anthropic CEO Dario's famous claim.
Dario said that within one to five years, half of all tech, legal, consulting, and finance jobs would be completely wiped out.
LeCun says: you're wrong. And not just Dario: every AI lab CEO and big name is wrong too, including Sam Altman, Yoshua Bengio, and Geoff Hinton, and also LeCun himself.
He says that on the question of how technological revolutions affect employment, we are all amateurs; don't listen to us run our mouths.
Listen to the labor economists who have actually spent decades studying the problem.
The sharpest part is this: he didn't argue with Dario about how capable AI is. He flipped the table and said that our whole crowd has no standing to discuss this topic at all.
AI researchers only know what the technology can do. They don't understand corporate processes, legal and compliance obstacles, market supply and demand, or the complicated logic by which human society actually runs.
The economists LeCun named have studied every technological revolution of the past two hundred years, and their conclusion is that whole jobs don't disappear; the tasks inside jobs get restructured.
When the ATM was invented, everyone assumed bank tellers would lose their jobs. Instead the number of tellers grew, because they went from counting cash to selling financial products and doing service work.
AI is the same: it will kill off a lot of repetitive, predictable tasks, and it will also create a lot of new ones.
The short-term displacement effect is real, especially for entry-level white-collar jobs, but the long-term productivity effect and new-job-creation effect are usually far stronger.
There's also a fact everyone overlooks: "AI can do a thing" and "a company actually uses AI to replace people" are at least two to five years apart.
Data pipelines, org structure, employee training, legal risk:
every one of them is a huge obstacle.
I think LeCun is really opposing the doom marketing that floods the AI world. Plenty of AI leaders like to use unemployment panic to grab attention, whether to raise money, to lobby for regulation, or to make their products look powerful.
But that panic misleads the public and misleads policy. So stop letting the "half of us unemployed in five years" talk scare you. The real world is never black and white; it is slow, messy, and full of surprises and opportunities.
And instead of worrying about being replaced by AI, spend that time thinking about how to use AI to make yourself more valuable 😁
I've added Chinese-English bilingual subtitles to Dario's Fox interview for reference.
Yann LeCun: Dario is wrong.
He knows absolutely nothing about the effects of technological revolutions on the labor market.
Don't listen to him, Sam, Yoshua, Geoff, or me on this topic.
Listen to economists who have spent their career studying this, like @Ph_Aghion , @erikbryn ,
garrytan
garrytan @garrytan
Retweeted
Aaron Levie Aaron Levie
It’s remarkable how often you need to be dramatically upgrading your AI architecture given the pace of progress in AI models right now.
If you’re building agents, you basically need to throw away large parts of previous work that you setup to compensate for model limitations every few quarters. The systems you built to mitigate context window limits aren’t useful anymore, and for many use-cases it’s easier just to throw more compute at a problem today in ways that wouldn’t have worked previously.
If you’re deploying agents in a workflow, you likely need to equally be rethinking your core systems at about that same frequency. The way you would deploy agents in an enterprise 18 months ago is entirely different from the best practices that you’d have today.
This is partly why everyone’s working so hard right now. Right as a best practice is solidified, models improve dramatically, and that old work is rendered obsolete. Unclear that this lets up anytime soon, which is why it pays to be so wired in right now.
Sam Hogan 🇺🇸: most of tooling around llms was built for a world that largely doesn’t exist anymore
RAG, GraphRAG, Multi Agent Orchestration, ReAct frameworks, prompt management/versioning tools, LLMOps tooling, eval tools, gateways, finetuning libs, etc
all obsoleted in the last 3 months
garrytan
garrytan @garrytan
Retweeted
Kane 謝凱堯 Kane 謝凱堯
Very brave of @TomSteyer to spend 40 years moving his billions to offshore tax shelters before deciding billionaires should be taxed more.
Tom Steyer: I'll tell you what I tell everyone on the Shared Prosperity tour:
1. I'm the billionaire who will tax the billionaires. 
2. I'm the guy who will close corporate tax loopholes. 
3. I'm the guy who will insist on single payer health care.
Please join us at our next stop in San
garrytan
garrytan @garrytan
Retweeted
agentpilled agentpilled
Re @t_blom Tom, would love for you to try and save all your kindle highlights as well! Try ClipBrain for Gbrain, inspired by @garrytan of course...
https://x.com/agentpilled/status/2044763362194530624?s=20
agentpilled: I've always wanted one place with everything I know. My Kindle highlights, the blogs I read, tweets I save, youtube videos I watch.
Then @karpathy posted about building personal knowledge bases with LLMs, and @garrytan open sourced GBrain for openclaw. And that was my eureka
garrytan
garrytan @garrytan
LFG

Tom Blomfield: I am about to run through 23 years of gmail, calendar and evernotes and turn them into vector embeddings inside gbrain 🧠
garrytan
garrytan @garrytan
If anyone knows Angela and Ken I would like to send them a letter

Rahim Nathwani: @garrytan Both she and Dan Schwartz (Dean of Stanford GSE) hold posts funded by Angela Nomellini and Ken Olivier.

According to Stanford, Nomellini wants to "close achievement gaps in education": https://ed.stanford.edu/news/10-million-gift-advance-educational-technology-stanford-0

Boaler's work on lowering the ceiling supports that goal.
petergyang
petergyang @petergyang
I switched to mostly using Claude Code from the desktop app and now the Telegram integration doesn't work?

I would love to be able to talk to all my chats across Claude desktop AND mobile apps without having to manually use remote-control, some CLI launch command, etc.
swyx
swyx @swyx
shut the f up

AIE beat TED????

a somber technical talk about security advisories and maintainer burnout beat the happy storytelling lobster on blazer one on the channel with 27 million subscribers???

???!? (i was actually kinda sad when we launched same day bc i thought we’d be completely overshadowed)



AI Engineer: In @steipete's latest State of the Claw, he gives an update on 5 months of @OpenClaw and some behind the scenes on what it's like maintaining the fastest growing open source of all time:

https://www.youtube.com/watch?v=zgNvts_2TUE

eg:
- 60x more security reports than curl
- a "Bullshit Taxonomy"
danshipper
danshipper @danshipper
Retweeted
Monologue Monologue
2 days to go
swyx
swyx @swyx
these are some of the heaviest hitters in all of the AI Engineering circuit and this week they will all be in Miami! 🏝️

so proud to be there to support @gabegreenberg and co as they build the first independently run AIE in America. Fun fact their first @ReactMiamiConf gave me such insane good vibes I ended up going every year. Building developer community is hard and even harder in a non-tech-hub city, but @MichelleBakels and crew consistently execute so well and I would have no one else as our first partner in the East.

join us! since i’m not organizing i’ll actually be available to talk to attendees and sponsors, very much looking forward to that.




AI Engineer: Miami: We're in the final stretch for tickets! Get your ticket to AIE Miami before we sell out!
https://www.ai.engineer/miami
garrytan
garrytan @garrytan
http://x.com/i/article/2045399189606273024
garrytan
garrytan @garrytan
GStack just shipped v1.3

https://github.com/garrytan/gstack
garrytan
garrytan @garrytan
Retweeted
Vox Vox
garry tan's reply to kingsbury's 32 page LLM takedown, tldr:
you can't test an engine on a bench and conclude cars are unsafe.
kingsbury listed real model failures. gemini forgot the toilet mid-render, chatgpt moved patches on a shirt, an LLM faked stock data and drew a random graph.
the everyday version i've explained way too many times to friends and family: a language model with no internet is not google. it's last year's encyclopedia, and you're asking about today's news. i'm done having this conversation.
skill constrains input, deterministic code handles precision, harness runs the loop. rivers flood, build banks. that's the direction the field is actually going.
closed APIs won't let you write the verification layer. the endpoint has to be open-source harness. gstack + gbrain are the start of that path and i use both daily.
Garry Tan: http://x.com/i/article/2045399189606273024
garrytan
garrytan @garrytan
GBrain just shipped v0.12.3

The community is very active and we're making it better every day

ylecun
ylecun @ylecun
Retweeted
Alex Shtoff Alex Shtoff
And stop calling "𝚎𝚡𝚙(𝚡) / 𝚜𝚞𝚖(𝚎𝚡𝚙(𝚡))" 𝑠𝑜𝑓𝑡𝑚𝑎𝑥. It's 𝑠𝑜𝑓𝑡𝒂𝒓𝒈𝑚𝑎𝑥.
Artur Chakhvadze: Can people please stop calling arbitrary multidimensional arrays “tensors”?
garrytan
garrytan @garrytan
Retweeted
阿绎 AYi
Garry Tan just did something every AI agent developer should thank him for.
He was fed up with OpenClaw subagents constantly timing out: a task runs halfway, the gateway drops,
all progress is lost, a pile of tokens is wasted,
and you have to redo everything by hand.
So he built his own wheel:
Minions, a Postgres-native task queue built directly into GBrain, with zero extra ops and zero extra cost.
Then he posted test data from his own production environment.
Same task (pulling a month of social posts into the brain):
the old way timed out past ten seconds, with a 0% success rate;
the new way finished in 753 milliseconds, with a 100% success rate,
memory usage dropped from 80 MB to 2 MB, and token cost dropped to zero.
With nineteen cron jobs running at once, the old version simply locked up;
Minions chewed through 36,000 months of historical data in fifteen minutes.
Zero failures, zero errors.
Best of all, even if you kill the gateway or the container crashes and restarts, no task is lost.
After a restart it automatically resumes from where it left off,
and you can even message it mid-run to change parameters.
This is the real turning point for AI agents. Most people are busy optimizing prompts and swapping in bigger models,
or arguing about which model is smarter, but nobody wants to admit that the biggest bottleneck in multi-agent systems today isn't the model at all. What matters is queues, state, retries, persistence: the things backend engineers have been doing for thirty years.
A subagent without a queue is just a wish with a timeout.
Minions isn't another flashy feature; it's the infrastructure upgrade that takes multi-agent systems from toy-grade to production-grade.
You'll never have to pray a task doesn't time out again. Hand the work to Minions and go do something else; it will finish the job, report errors, retry, and give you the results.
Open your OpenClaw now, paste a single command, and it's installed in thirty minutes.
Garry says this may be the simplest upgrade in OpenClaw's history.
Garry Tan: Now launching GBrain v0.11 with Minions
I got sick of OpenClaw's subagents timing out and not getting things done
So I built a queue/jobs system that uses GBrain's Postgres/PGLite based on BullMQ to give your OpenClaw/GBrain setup wings.
Minions are 10X faster, more reliable
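Minions' internals aren't published in the thread, but the durable-queue pattern it describes (persist jobs, claim them, retry on failure, survive restarts) can be sketched minimally. Here sqlite3 stands in for GBrain's Postgres/PGLite, and the schema and function names are illustrative, not Minions' API.

```python
import json
import sqlite3

# Minimal durable job queue: jobs live in a database table, so a crashed
# worker loses nothing and a failed job is re-queued until retries run out.

def init(db):
    db.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'queued',   -- queued | running | done | failed
        attempts INTEGER NOT NULL DEFAULT 0,
        max_attempts INTEGER NOT NULL DEFAULT 3)""")

def enqueue(db, payload):
    db.execute("INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),))
    db.commit()

def claim(db):
    # Claim the oldest queued job. In Postgres this would use
    # SELECT ... FOR UPDATE SKIP LOCKED to stay safe across workers.
    row = db.execute(
        "SELECT id, payload FROM jobs WHERE status='queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET status='running', attempts=attempts+1 WHERE id=?",
               (row[0],))
    db.commit()
    return row[0], json.loads(row[1])

def complete(db, job_id):
    db.execute("UPDATE jobs SET status='done' WHERE id=?", (job_id,))
    db.commit()

def fail(db, job_id):
    # Re-queue for retry until max_attempts is exhausted, then mark failed.
    db.execute("""UPDATE jobs SET
        status = CASE WHEN attempts >= max_attempts THEN 'failed' ELSE 'queued' END
        WHERE id=?""", (job_id,))
    db.commit()

db = sqlite3.connect(":memory:")
init(db)
enqueue(db, {"task": "import_posts", "month": "2026-03"})
job_id, payload = claim(db)
fail(db, job_id)                 # e.g. the gateway dropped mid-run
job_id2, _ = claim(db)           # the same job is claimed again on retry
assert job_id2 == job_id
complete(db, job_id2)
status = db.execute("SELECT status FROM jobs WHERE id=?", (job_id,)).fetchone()[0]
print(status)  # done
```

Because job state is in the table rather than in memory, "kill the gateway and resume from the checkpoint" falls out of the design for free.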
garrytan
garrytan @garrytan
Retweeted
AlphaSignal AI AlphaSignal AI
You can now give your AI a fully working brain in 30 minutes.
GBrain is an open-source memory layer for AI agents.
It turns meetings, emails, calls, and notes into a searchable knowledge base.
Your agent reads it before every response. It writes to it after every conversation.
The result: an AI that actually knows your life.
Setup takes 30 minutes. The database runs locally with no server needed.
It handles 10,000+ markdown files, 3,000+ people profiles, and 13 years of calendar data.
Here's what makes it compound:
1. Signal arrives (email, call, meeting)
2. Agent checks the brain for context
3. Responds with full history
4. Writes new knowledge back
5. Indexes everything for next time
It even runs a voice phone line.
Call a number, your AI answers, knows the caller, and logs the conversation automatically.
When your brain outgrows local storage, one command migrates everything to managed Postgres.
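The five-step compounding loop above can be sketched as a toy. The dict-backed "brain" and the stubbed reply are illustrative stand-ins, not GBrain's actual API (GBrain persists to markdown files and Postgres).

```python
# Toy version of the loop: check the brain, respond with history,
# write new knowledge back, index it for next time.

brain = {"people": {}, "notes": []}

def handle_signal(signal):
    # 1. Signal arrives (email, call, meeting)
    sender = signal["from"]
    # 2. Agent checks the brain for context
    profile = brain["people"].get(sender, {"interactions": 0})
    # 3. Respond with full history (the LLM call is stubbed out)
    reply = f"Reply to {sender} (seen {profile['interactions']} times before)"
    # 4. Write new knowledge back
    profile["interactions"] += 1
    brain["people"][sender] = profile
    # 5. Index everything for next time
    brain["notes"].append({"from": sender, "text": signal["text"]})
    return reply

r1 = handle_signal({"from": "alice", "text": "meeting at 3"})
r2 = handle_signal({"from": "alice", "text": "moved to 4"})
print(r1)  # Reply to alice (seen 0 times before)
print(r2)  # Reply to alice (seen 1 times before)
```

The compounding is visible in miniature: the second reply already reflects what the first interaction wrote back.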
ylecun
ylecun @ylecun
Retweeted
Arthur Spirling Arthur Spirling
“AI skeptic” v “AI cynic” is going to be an important distinction in the next few years, IMO. I am not a skeptic about AI abilities but I am cynical about the incentives CEOs have to say wildly bombastic things about them.
Yann LeCun: Dario is wrong.
He knows absolutely nothing about the effects of technological revolutions on the labor market.
Don't listen to him, Sam, Yoshua, Geoff, or me on this topic.
Listen to economists who have spent their career studying this, like @Ph_Aghion , @erikbryn ,
garrytan
garrytan @garrytan
Retweeted
Vox Vox
X workflows your openclaw / hermes can actually run are finally in budget. 1000 reads for $1.
but writes up 50%, URL posts went to 20x normal, follow / like / quote via API got removed from self-serve.
workflow you can drop into openclaw:
→ /mentions into a digest every 6h, bot and crypto replies filtered before they hit my feed
→ /liked_tweets archive, my own like history searchable and organized
→ /followers diff on a schedule, bot influx caught the moment it lands
→ /bookmarks auto-filed by topic
the point of going official: all owned data, no scraping, no detection cat-and-mouse. and don't put URLs in posts sent via API, $0.20 each stacks fast. link goes in first self-reply.
petergyang
petergyang @petergyang
I used Claude Design to make this tongue-in-cheek marketing page of Anthropic automating all knowledge work. 😅

Visit the website here: https://claude-knowledge-work.vercel.app/

Watch my Claude Design tutorial to learn how to make this yourself: https://youtu.be/WMnk1LFBMqA


Peter Yang: Here's my new tutorial with a live demo of everything you can build with Claude Design.

I cover how to create videos, slides, websites, apps, and even an initial design system all in 16 minutes.

Claude Design is super fun and you'll burn through your usage fast :)

📌 Watch

swyx
swyx @swyx
Retweeted
Gabe Greenberg Gabe Greenberg
I couldn't be more excited to partner with @swyx on AI Engineer Miami! It kicks off tonight...
we have 1,000 amazing people flying in for @AIEMiami and @ReactMiamiConf who will be stuck in beautiful and sunny 75 degree weather all week 🔥
swyx 🏝️@AIEmiami: these are some of the heaviest hitters in all of the AI Engineering circuit and this week they will all be in Miami! 🏝️
so proud to be there to support @gabegreenberg and co as they build the first independently run AIE in America. Fun fact their first @ReactMiamiConf gave me
rauchg
rauchg @rauchg
Retweeted
Vercel Vercel
We’ve identified a security incident that involved unauthorized access to certain internal Vercel systems, impacting a limited subset of customers. Please see our security bulletin:
https://vercel.com/kb/bulletin/vercel-april-2026-security-incident
steipete
steipete @steipete
Retweeted
Onur Solmaz Onur Solmaz
Who is running local models on GPUs on OpenClaw?
I have started benchmarking different models this week. I am working on improving model selection and switching UX on OpenClaw, i.e. I run
/model vllm/gemma-e4b
to switch the model in a channel, and then a model controller automatically loads that into memory, gets it ready, or gives an insufficient memory error, if capacity is not enough for that. Like when you are using multiple models in parallel
I am going to try llama-swap, LM Studio and Ollama for this next and compare them. There are a ton of variants of models, weight formats and quantizations, which need benchmarking
I have been using unquantized original safetensors until now, which already gave me the ability to run ~5 parallel generations in my hardware
So if I am going to try LM Studio, I would rather use the bf16 ggml-org/gemma-4-E4B-it-GGUF instead of anything smaller --- because there is no point in nerfing an already smol model if your hardware can run 5 parallel sessions on the unquantized version
Will also release vibe reports and benchmarks on all this with @mervenoyann later this week
I would like to hear your thoughts if you have already tried these models on OpenClaw
garrytan
garrytan @garrytan
Retweeted
T Wolf 🌁 T Wolf 🌁
Tom Steyer invested in the private prison that ICE now uses for undocumented immigrants and the California Teachers Union just endorsed him for Governor. This is California politics today. 🤦‍♂️
Mike Netter: The Teachers Union Backs Steyer for Governor And the Hypocrisy Is Stunning
California’s largest teachers’ union has made its choice for governor, and it’s a doozy. The California Teachers Association just yanked its endorsement from Eric Swalwell after explosive rape allegations
garrytan
garrytan @garrytan
Retweeted
lucas
Thin harness, fat skills: the real reason behind the AI productivity gap
There was a lot in Garry Tan's article I genuinely agreed with; I had recently looked at the Claude Code source from a similar angle.
Using the same model, one person is 2x more productive and another is 100x. The difference isn't the model's intelligence. It's the structure wrapping the model: the harness.
Is the secret in the model? No, it's in what wraps the model.
Thin Harness, Fat Skills...

"The harness is the product"
The harness is the program that runs the LLM. It drives the model in a loop, reads and writes files, manages context, and enforces safety. That's all it is.
The problem is that many people build the harness thick: 40+ tool definitions eating half the context window, 2-5 second MCP round-trips stacking up, and a separate tool bolted onto every REST API endpoint.
The result: wasted tokens, latency, high failure rates.
The right direction is the opposite: keep the harness thin, and make the skills fat instead.

Five concepts follow for understanding this architecture.
1. Skill files
A skill file is a reusable markdown document that teaches the model "how," not "what." The key point is that a skill file works like a method call: same skill, same procedure, different world.
2. The harness
It must be thin: run the model in a loop, handle files, tidy context, keep things safe. Anything beyond that is unnecessary.
3. The resolver
The resolver is a routing table for context. When a given task type is detected, it decides which documents to load first. When a developer goes to edit a prompt, the resolver loads EVALS.md first.
That document says to roll back if accuracy drops more than 2%. The developer may not even have known the eval suite existed, but the resolver pulls the right context at the right moment.
4. Latent vs. deterministic
Every step of work belongs to one of two spaces. Latent space is where judgment, interpretation, and pattern recognition happen.
Deterministic space is where the same input guarantees the same output; SQL and compiled code live here.
The most common mistake is pushing a deterministic problem into latent space.
5. Diarization
Diarization is the process in which the model reads everything on a topic and compresses it into a structured profile.
A SQL query or a RAG pipeline cannot do this job for you.
It means actually reading dozens of documents, spotting contradictions, tracking changes, and producing an intelligent summary.
This is how you put AI to work on real knowledge work.

"Skills are permanent upgrades"
There's an instruction he gave his AI:
> "Don't do one-off work. If the same kind of request seems likely to repeat, do it manually the first few times and show me the results. Once approved, turn it into a skill file. If it needs automation, register it as a cron job. If I get the same request twice, we have failed."
When this instruction got 1,000 likes and 2,500 bookmarks, many people read it as a prompt-engineering technique. It is actually an architectural principle.
Every skill written is a permanent upgrade to the system. It doesn't degrade, it doesn't forget, and it runs at 3 a.m.
And every time a new model ships, every skill automatically improves: the deterministic steps stay perfectly reliable while the judgment in the latent steps gets sharper.
The secret of 100x productivity is not a smarter model.
Fat skills, a thin harness, and the discipline to codify everything into them.
The system compounds. Build it once and it runs forever.
Garry Tan: http://x.com/i/article/2042922188924424198
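The "resolver as a routing table for context" idea from the article above can be sketched in a few lines. The task patterns and document names here are illustrative, not from the article's implementation.

```python
import re

# A resolver maps a detected task type to the context documents that
# should be loaded first, in priority order.
ROUTES = [
    (re.compile(r"\b(prompt|eval)\b", re.I), ["EVALS.md", "PROMPTS.md"]),
    (re.compile(r"\b(deploy|release)\b", re.I), ["RUNBOOK.md"]),
]

def resolve_context(task: str) -> list[str]:
    """Return the documents to preload for this task."""
    docs = []
    for pattern, files in ROUTES:
        if pattern.search(task):
            docs.extend(f for f in files if f not in docs)
    return docs

print(resolve_context("tweak the summarization prompt"))  # ['EVALS.md', 'PROMPTS.md']
print(resolve_context("deploy v1.3"))                     # ['RUNBOOK.md']
```

This is what makes the EVALS.md example work: the developer never has to know the eval suite exists, because the route fires on the word "prompt."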
ylecun
ylecun @ylecun
Retweeted
Pessimists Archive Pessimists Archive
Indeed.
Garry Kasparov: Indeed. The history of tech impact on labor is well-documented, including by those named. It's unpredictable, but usually improves productivity and leads to expansion. Law & white-collar workers aren't horse-buggy drivers or elevator operators. They will use AI and adapt.
petergyang
petergyang @petergyang
Huh? Is this real?

CHOI: 🚨 BREAKING: OpenAI just shadow-dropped a massive GPT Pro update. And it is completely slaughtering Claude Opus 4.7 in frontend coding.

No official announcement. No release notes. But the performance gap is suddenly staggering.
We just ran a head-to-head benchmark across GPT


ylecun
ylecun @ylecun
Retweeted
Pessimists Archive Pessimists Archive
The two most repetitive unfounded concerns about technology in history are:
- Will it cause mass unemployment?
- Will it ruin a generation?
You don’t get to just say “it’s different this time” and move on. It’s always different each time. How is it the same? That gives clarity.
Yann LeCun: Dario is wrong.
He knows absolutely nothing about the effects of technological revolutions on the labor market.
Don't listen to him, Sam, Yoshua, Geoff, or me on this topic.
Listen to economists who have spent their career studying this, like @Ph_Aghion , @erikbryn ,
petergyang
petergyang @petergyang
Went from having multiple terminals open to just two apps open :)
swyx
swyx @swyx
Retweeted
AI Engineer AI Engineer
🆕 The Future of MCP
https://www.youtube.com/watch?v=v3Fr2JR47KA
London is the home of MCP, which is just a little over a year old and now the most successful AI integration protocol ever! @dsp_'s keynote recaps the past year and milestones, but also reintroduces what you can do with MCP (e.g. MCP Apps cc @idosal1 @liadyosef), and contrasts vs Skills and CLIs. It is still very early, and upcoming work is coming in progressive discovery (tool search cc @mattzcarey) and programmatic tool calling (code mode @threepointone).
If we learned anything on the ground at AIE Europe, it is that MCP is actually far more foundational and widely adopted than even -we- appreciated!
rauchg
rauchg @rauchg
Retweeted
Vercel Vercel
Re Our investigation has revealed that the incident originated from a third-party AI tool with hundreds of users whose Google Workspace OAuth app was compromised.
We recommend that Google Workspace Administrators check for usage of this app immediately. https://vercel.com/kb/bulletin/vercel-april-2026-security-incident#indicators-of-compromise-iocs
garrytan
garrytan @garrytan
Retweeted
Vivi Vivi
"Imagine if naked people were stupider. It turns out, naked models actually are."
That's @garrytan opening an essay about AI architecture — and yes, he means the other kind of naked model. But the disappointment you just felt? That's exactly how most people feel when they try a raw LLM and watch it fail.
This is the most important distinction in AI right now: naked models vs. dressed ones.
A naked model is an LLM with no guardrails — no skill files, no routing, no deterministic tools handling the parts that need precision. You type a request, it hallucinates, and you conclude AI doesn't work.
A dressed model is the same LLM wrapped in architecture — harnesses that manage context, resolvers that route tasks, and deterministic code doing the heavy lifting where reliability matters. Same brain, completely different results.
Garry's metaphor is perfect: judging AI by its naked model is like testing an engine on a bench and concluding cars are unsafe.
Kyle Kingsbury — the engineer who spent a decade proving databases lied about their consistency guarantees — wrote a brilliant 32-page takedown of LLMs. Every failure he documented is real. But he was testing naked models. The people not having his problems? They built the clothes.
This is exactly what I see with AI founders across APAC: the winners aren't chasing better models. They're building better wardrobes — the architecture, local knowledge, and system design that make unreliable models produce reliable outcomes.
The real question isn't "can you trust the model?" It's "can you dress it well enough that it doesn't matter?" 😉
Garry Tan: http://x.com/i/article/2045399189606273024
garrytan
garrytan @garrytan
Retweeted
Gregor Zunic Gregor Zunic
2k ⭐️ in 24 hours
Gregor Zunic: Introducing: Browser Harness. A self-healing harness that can complete virtually any browser task. ♞
We got tired of browser frameworks restricting the LLM. So we removed the framework.
> Self-healing — edits helpers. py on the fly
> Direct CDP — one websocket to Chrome
> No
mattshumer_
mattshumer_ @mattshumer_
Am I the only one having a good experience with Opus 4.7?

I still vastly prefer Codex for most things but Opus is absolutely nailing every UI task I give it.
garrytan
garrytan @garrytan
Retweeted
Chrys Bader Chrys Bader
the 5 stages of ai grief
since Claude Design launched, designers are grappling with the same existential recoil as when engineers first saw ai could code. the process maps to the stages of grief.
1. denial.
"but design is more than just producing designs." engineers said the same thing. "coding is more than just writing code." both true.
2. anger.
look how bad the output is. look at the people shipping slop. look at the execs who don't understand what we actually do.
3. bargaining.
it's just a tool. i'll use it for the boring parts and focus on the strategic work. the craft is safe if i stay in charge of it.
4. depression.
i can't believe i used to do all of this by hand. all those hours. all that time.
5. acceptance.
i understand the nuance better than ever. i'm still the architect. and now i can actually build the thing.
as a software engineer and designer of 25+ years, i've watched this cycle from both sides. the designers grieving now are where engineers were 18 months ago.
when our core competency is threatened, we’re quick to defend what’s unique about it, romanticize it, and dig our heels in. what follows is a process of assimilation.
i believe designers will eventually see Figma as an awfully archaic and cumbersome way to explore ideas. most designs already become interactive prototypes, so we'll just get there faster. much faster.
in the end, taste and judgment is still what remains. creating successful work ultimately breaks down to a series of choices that add up to net value creation.
those who win will continue to be involved in the most important choice-making, with a keen ability to discern between what choices are important for the human to make.
think slow, move fast.
ylecun
ylecun @ylecun
Retweeted
Viral Reel Addict Viral Reel Addict
This video was deleted yesterday from facebook.
Be a real shame if everyone reposted this.
gdb
gdb @gdb
codex is becoming the universal app for developers:

Peter Yang: Went from having multiple terminals open to just two apps open :)

swyx
swyx @swyx
Retweeted
Benjamin Morris Benjamin Morris
That the chance of a 1/n event happening in n tries converges quickly to 63% (1-1/e to be exact) has been one of my favorite useful real world math shortcuts since high school.
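The shortcut checks out numerically: the probability of at least one success in n independent trials, each with probability 1/n, is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 as n grows.

```python
import math

# Probability that an event with per-trial probability 1/n occurs at
# least once in n independent trials.
def at_least_once(n: int) -> float:
    return 1 - (1 - 1 / n) ** n

for n in (2, 10, 100, 10_000):
    print(n, round(at_least_once(n), 4))

print(round(1 - 1 / math.e, 4))  # limit: 0.6321
```

Even n = 10 is already within two points of the limit, which is why the rule of thumb works for everyday estimates.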
petergyang
petergyang @petergyang
I switched my openclaw to gpt finally and it’s…not going too well.

Any tips to optimize it for this model?
garrytan
garrytan @garrytan
Peter you literally inspired me to do it

Now we need to get everyone up to 100x to 500x speed

Peter Steinberger 🦞: @garrytan You’re shipping harder than I do these days!
garrytan
garrytan @garrytan
Retweeted
Adrián Treviño Adrián Treviño
Re @garrytan Honestly since Claude code and @garrytan gbrain and gstack I work harder, faster and more hours than ever before even though I “have” to do less things.
The trippy thing is I “can” do more things and every hour not working feels like a week of work in 2022 timeline.

YouTube

0 videos

No recent videos fetched on this date.