How to drive the billionaires out and ruin the California tax base in one fell swoop: propose asset seizure measures
Make it make sense! Middle class taxpayers will take on all the billions in tax revenue lost
Blake Byers: How the 5% California wealth tax is a 67% wealth tax for Sergey Brin:
>Owns 3% of GOOG.
>Holds 25.3% of voting rights
>Wealth tax is assessed as the greater of his ownership or voting rights. So his tax is 5% * his 25.3% voting ownership = 1.27% of the value of GOOG.
>1.27%
Feels so good when you have a breakthrough with Codex
Re now there is a mac app for it https://github.com/darrylmorley/whatcable
I spent close to $3,000 for a Macbook Pro so that I can try running local models.
At least, that was my excuse. In reality, I am running...😅
Ethan Mollick
I was quoted a couple times in this Atlantic article, but that isn’t (the only) reason I think it is good. It lays out the reasons why we whipsawed from “AI is a bubble” to “there are not enough data centers” in less than six months. Spoiler: it’s agents. https://www.theatlantic.com/economy/2026/05/ai-bubble-revenue-anthropic/687022/
man its good to be back on twitter
there is comfort in the skills of a wasted youth
/hatch clippy
AI Engineer
People are really enjoying our full workshops showing end to end walkthroughs of real production workflows!
This is a rare double header with @braintrust's Giran Moodley and @OussamaHaff walking through the real life AI engineering behind @thetrainline, Europe's #1 most downloaded rail app with 27m MAU and £5.3B in ticket sales!
the workshop bundles several important lessons:
- break down monolithic LLM calls into specialized stages (e.g., triage, policy review, and reply generation)
- how to monitor latency, token usage, and costs effectively with end-to-end tracing of agentic flows
- using "golden sets" (a curated set of test inputs) to identify failure modes
- how to move from local development to a managed environment where prompts and scoring functions are version-controlled
- how to allow non-technical team members to collaborate and update model parameters without code changes
- how to identify production regressions, replay failures, and apply targeted fixes to improve system reliability continuously
enjoy!
Braintrust: Watch here → https://braintrustdata.link/AI-engineer-session
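A minimal sketch of the first and third lessons above (staged calls plus a golden set), not the workshop's actual code. The stage functions are rule-based stand-ins for LLM calls, and every name here (`triage`, `policy_review`, `generate_reply`, `GOLDEN_SET`, `find_failures`) is hypothetical:

```python
from typing import Callable, Dict, List, Tuple


def triage(message: str) -> str:
    """Stage 1: classify the request (stand-in for a small LLM call)."""
    return "refund" if "refund" in message.lower() else "other"


def policy_review(category: str, message: str) -> bool:
    """Stage 2: decide whether a (made-up) policy allows the request."""
    return category == "refund" and "cancelled" in message.lower()


def generate_reply(category: str, approved: bool) -> str:
    """Stage 3: write the customer-facing reply."""
    if category != "refund":
        return "Thanks for reaching out. An agent will follow up."
    return "Your refund is approved." if approved else "Sorry, this ticket is not refundable."


def run_pipeline(message: str) -> Dict[str, object]:
    """Run the three stages in order and keep every intermediate result for tracing."""
    category = triage(message)
    approved = policy_review(category, message)
    reply = generate_reply(category, approved)
    return {"category": category, "approved": approved, "reply": reply}


# Golden set: curated inputs paired with the triage label we expect.
GOLDEN_SET: List[Tuple[str, str]] = [
    ("My train was cancelled, I want a refund", "refund"),
    ("Where is platform 4?", "other"),
    ("Can I get my money back?", "refund"),  # no keyword, so the toy triage misses it
]


def find_failures(
    pipeline: Callable[[str], Dict[str, object]],
    golden: List[Tuple[str, str]],
) -> List[Tuple[str, str, object]]:
    """Return (input, expected, got) for every golden example the pipeline gets wrong."""
    failures = []
    for message, expected in golden:
        got = pipeline(message)["category"]
        if got != expected:
            failures.append((message, expected, got))
    return failures
```

Calling `find_failures(run_pipeline, GOLDEN_SET)` surfaces the paraphrased refund request as a failure mode, which is the kind of signal a golden set is meant to give before a prompt change ships.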
First broken Codex feature I've come across
Zaid Jilani
Why is Randy Fine branching out in his racism, did an Armenian buffet kick him out because he violated the three hour time limit?
ANCA: “We don’t want Armenians to be able to serve in Congress.”
The ANCA condemns this racist anti-Armenian rant by US Rep. Randy Fine (R-FL), cosponsor of a reckless Congressional resolution to ship US arms and aid to genocidal Azerbaijan
Jared Friedman
Software engineering job descriptions should really start saying whether they include /fast mode or not.
Jared Friedman
Datacenters in space are making more and more sense.
Garry Tan: Per one-gigawatt data center complex: 5,322 permanent jobs, $157M per year in state taxes, $248M per year in local taxes. During construction: $2.67B in combined investment
But nobody managed to tell Seattle this, so it's banned.
we will plan bigger parties for future releases.
a lot more people wanted to come than we expected. thank you!
gonna try to think of a really good idea for the next one.
Steven Pu
Been reading through gbrain code for a few days. I may be late to the party but this looks like a new class of AI-native apps.
What differentiates an AI-native app vs. an app that uses AI? Here's what I surmised from gbrain.
Nick Davidov
I love California, but if the CA wealth tax passes I’m likely to leave. I’m not anywhere close to a billionaire but none of the taxes our companies or family pay would go to support this lunacy. I think I won’t be alone. Bankruptcy and austerity might actually be better for California in the long term even though it will hurt a lot of people not deserving this in the short term. People who turn on their cars praying not to see a check engine light. While bureaucrats throw billions of public money around on waste, fraud, and destroying the markets.
𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰
Stanford's latest seminar is a deep dive into the evolution of world modeling in AI.
Focuses on the shift in the world model from traditional reconstruction methods toward latent space prediction.
Covers topics like:
- Introduction to JEPA & World Models
- Causal JEPA
- LOWER Model
- Practical Applications & Planning
- Future Outlook
Timur Kuran
So appalling one doesn’t know where to start. Taxpayer-funded public schools have no business meddling in politics, turning kids into activists, or wasting classroom time on causes that serve teachers—not students.
Corey A. DeAngelis, school choice evangelist: "We are organizing our school so that the kindergarten through 6th grade students are also going to be able to march."
request for chrome extension that augments all image input boxes on the web:
- lets me generate a simple word text thing (no ai) OR
- draw something with @tldraw (no ai) OR
- use either words or drawings to generate something of the required proportions
@devinai do it pls
Jesse Proudman
Had dinner tonight with 8 entrepreneurial couples. Every single one is moving.
Good bye @MayorofSeattle.
You’re the nail in the coffin of Seattle and you will be memorialized for it.
Raouf Chebri
What's new this week on Replit
- Replit turns 10 🎉 and Agent is free for everyone on May 2
- App Monitoring with Agent: real-time uptime checks and downtime alerts on every paid plan
- Build full slide decks with Agent and export to PPTX, Google Slides, or PDF
Amjad Masad
Replit, turned 10 🎂
To celebrate we’re making it totally free for 24 hours starting at 5am PT.
But our work—to make coding accessible for all—goes back to 2011.
Watch the highlights from the journey:
It’s been an honor to help millions learn & ship. Here is to the next 10!
💥Susan Dyer Reynolds🗞️
“Supervisor Jackie Fielder, who is currently on a leave of absence following a personal health crisis, was also in attendance.”
Wait. WHAT? @RafaelMandelman if she’s well enough to attend an airport protest she’s well enough to work. https://missionlocal.org/2026/05/s-f-supervisors-past-and-present-arrested-at-sfo-anti-ice-protest/
Captain Insight
Neural networks were declared scientifically dead in 1987.
A French PhD student bet his entire career on them anyway ~ and won. 🤯
>Meet Yann LeCun 🇫🇷
>Paris-born. PhD from Sorbonne in 1987.
>Joined Bell Labs in 1988. Kept building. Alone.
>In 1989, built Convolutional Neural Networks
>By the late 90s, his CNN was reading 10% of US bank checks
>The industry called it a niche trick. Ignored him for over a decade.
>Then 2012 hit. Deep learning exploded.
>His “dead” research became the blueprint for everything
> ChatGPT. Gemini. Claude. Grok. All standing on his shoulders. 🚀
>Won the 2018 Turing Award ~ computing’s Nobel Prize
>Became Chief AI Scientist at Meta “Godfather of AI.”
>Now publicly says LLMs are a dead end.
>Fights the entire industry.
>Left Meta in late 2025 to build AMI Labs in Paris
>Already valued at $3.5B before launching. World models, not LLMs.
The industry ignored him for over 20 years.
Now he’s ignoring the industry.
Absolute Legend 🐐
Maryam
2022: “Stop overreacting, they won’t overturn Roe.”
They did.
2023: “Stop overreacting, they won’t let women die rather than get an abortion.”
They did.
2024: “Stop overreacting, they won’t arrest women for miscarriages.”
They did.
2025: “Stop overreacting, they won’t turn women into incubators.”
They did.
2026: “Stop overreacting, they won’t attack mifepristone.”
They did, today.
Now: “Stop overreacting, they won’t go after birth control next.”
They will.
Yuli Kay
You can build apps, games, websites, ANYTHING FOR FREE on http://replit.com 🤯
But you have only 24 HOURS starting 5 AM PST | 1 PM UTC
Check out what I've already built with @Replit
Replit ⠕: Replit Agent is free tomorrow for everyone starting at 5am PST
Show us what you can build in 24 hours
And Replit is turning 10! A trip down memory lane on what got us here
Replit ⠕
Countdown to Free Agent for 24 Hours + Buildathon Kickoff: $100K+ in Prizes https://x.com/i/broadcasts/1rxmqomNPNwxy
Francisco Cruz Mendoza
Huge shoutout to the @Replit engineering team for sticking around all night ahead of the 24 hour Replit 10 Buildathon 🎉
See you all live at 4:30am PT!
Jennie Littleton
Hour 1 of 24 for the @replit 10 year anniversary buildathon! Huge thanks to @amasad @raymmar @MannyBernabe @Franciscocrz for the incredible opportunity. I'll be working on http://getthew.app and a few side projects. 👀 livestream link dropping later
👉What are you building?
aviel
Ok, I’ve finally processed how I experienced this, and it’s a big deal. I grew up hearing stories from my father and grandfather about the Soviet Union, but this was the first time I truly understood them, like the difference between hearing stories about having a child and actually holding your own newborn. Like an ancestral alarm. It’s devastating. Everything in me is screaming to divest from the region that I’ve poured my adult life into to survive. The feeling of loss is immeasurable, and the casual “bye” just makes it worse. The damage from the lack of empathy here will create a cycle of attacks that take decades to undo and will bloody the hands of everyone around me, there are no sidelines in my line of work. This also isn’t about taxes, the cost of reorienting my life is infinitely greater. It’s now primal and existential.
Brandi Kruse: INSANE. Seattle's Socialist Mayor responds to exodus of wealth from Washington state by saying "BYE" ... then laughing. We're doomed.
Republicans against Trump
This is insane
Three Trump judicial nominees refused, over and over, to say Joe Biden won the 2020 election. They either believe Trump’s lie that the election was stolen, or they’re too afraid of him to tell the truth.
This isn’t about their political views. It’s about recognizing reality and being part of an independent judiciary
George Ohan
Re @Replit agent just worked 25 minutes for free.
One more attempt at #georgiejobsapp
Many such cases
Just Samuel: @garrytan Been using GBrain. It’s da best experience so far. 👍🏻
Dave Gambrill
Using @WisprFlow and @Replit I made this in like 20 minutes just by literally talking about it. If you aren't playing with these tools, you are missing out. You need zero tech ability to do this.
https://dave-tech-resource-hub.replit.app/
Bhargav Gajjar
My token usage has become very efficient after using GBrain @garrytan
Peter Yang
Most people give AI one-line prompts and wonder why their app looks like slop.
My next guest, Ravi, has built a 3-layer context system that fixes this:
→ Functional: What the app does.
→ Visual: What the app looks like.
→ Data: How the data structure works.
The data layer is the most underrated and including it in your prompt lets you create much more flexible prototypes and apps.
📌 Subscribe to get our full episode tmr: https://www.youtube.com/@PeterYangYT?subscribe
Robert Pondiscio
A valuable look at how politicians are stage managed by their handlers. Keeping the camera rolling and posting it publicly, awkward silences and all, is an act of civic hygiene.
Ari Hoffman: Staffers for Seattle Socialist Mayor Katie Wilson abruptly end an interview with KOMO News Senior Reporter Chris Daniels when she can't answer basic questions
Wilson has been criticized for dodging the press & being unable to answer basic questions since she came into office
Will Manidis
I don’t think any of you have processed at any level how widespread and profound the ai water libel is
My OpenClaw is going to have a very poor performance review this quarter
Professor Hamming called it: neural nets were the solution to the programming problem. Ilya had to point to scaling for it to work.
Jesse Proudman
I started my first company from my bedroom in Tacoma when I was 13, dreaming of one day building something like the tech companies I watched flourish in Seattle. I remember being awed that someone could create a company from nothing and I knew that's what I wanted to do with my life.
After 28 years of building, it's heartbreaking to watch Seattle's leaders shift from celebrating entrepreneurs to making clear we're the problem. The tax bill is just the price tag on their contempt.
https://www.foxnews.com/media/seattle-ai-founder-looks-leave-taxes-rise-everybody-i-know-process-leaving
Pejman Pour-Moezzi
Run gstack's /office-hours right in the web, no terminal needed!
Skillet uses Anthropic's new Managed Agents to spin up a Claude Agent SDK with skills installed exactly like Claude Code.
Perfect for non-technical people that don't want to mess with terminals.
Chat with @garrytan now (no signup or API keys needed): https://skilletweb.com/office-hours/new
Patrick Wolff just got endorsed by the SF Chronicle
Common sense is winning
https://www.sfchronicle.com/opinion/editorials/article/patrick-wolff-insurance-commissioner-california-22103717.php
Garry Tan: Californians can’t get proper insurance
Why? Because it’s been wholly mismanaged by machine politicians who aren’t very smart
What do we do about it? Elect someone smart who can fix it
That’s Patrick Wolff
Chief Nerd
Sam Altman Says CEOs Who Talk About AI Taking Everyone’s Jobs Are ‘Tone Deaf’
“Someone said to me just yesterday that … GPT 5.5 in Codex can accomplish in an hour what would have taken me weeks two years ago … and I have never been busier in my life.”
Jennie Littleton
Hour 4 & 5 update of the @Replit 24 hour buildathon. Links below to the two live builds 👇
@raymmar @MannyBernabe check out the second link, it's a new buildathon timer site that I'm super excited to watch evolve 🐳
Eric Ries
The ability of an LLM to help readers make connections to events from their own lives is going to unlock a lot of interesting new forms of reading and - even better - understanding texts.
Here's a great example from @garrytan
Garry Tan: book-mirror is the flagship.
Hand it a book, get a personalized two-column analysis. Left shows the author's idea. Right maps every idea to your actual life using your own words from the brain.
Here's the example based on a yet-unreleased book by @ericries Incorruptible
Peter Yang
It's very satisfying to get Codex or Claude Code to "marie kondo" your local files and Google Drive.
I give these apps full access to my computer and gws (google workspace cli), then prompt things like:
"Tell me what apps load on computer bootup. Give me a plan to clean this up."
"Look at my downloads folder. Give me a plan to clean up and organize it."
"Help me organize my Google Drive. Let's review your plan first before doing anything."
Note that I always ask it for a plan first. These are semi-dangerous operations so try them at your own risk.
Anyway, my files and Drive now spark joy 🤣
The Homebrew computer club phase of anything is the most fun
I am savoring it
Garry Tan: One note: GBrain is not batteries included. It is experimental and has rough edges. It, like OpenClaw, is a Ferrari that lets you experience insanely cool things but you better bring your wrench!
It will not be like that forever but for now it is still in Homebrew Computer Club
It’s not gonna be void deer or boson cutter it turns out
Prem Makeig / premm.eth: @garrytan Some new job titles:
- Personal agent designer
- Second brain engineer
- Context editor
People are going to care a lot about their personal agents, and they will want help designing them.
Many such cases
snowblue: @garrytan It's been incredibly helpful, though. I'm not a coder, but openclaw has turned me into one. I'd started building something of my own that was similar when I came across GBrain. The scaffolding and principles it embodies has supercharged my efforts and I really appreciate it!
The Seattle Times
They say a gaffe is when a politician tells the truth. Seattle Mayor Katie Wilson saying "bye" to the wealthy upset about taxes is not the kind of truth Seattle needs right now, writes columnist Danny Westneat. https://www.seattletimes.com/seattle-news/politics/the-gaffes-are-becoming-a-pattern-for-seattles-new-mayor/?utm_medium=social&utm_campaign=owned_echobox_tw_m&utm_source=Twitter#Echobox=1777738154-2
TommyYipxyz
All the tasks seem to be coming in. Everything's going well. Excited to push it to deployment. Let's go at Replit. Happy birthday once more! Shout out to the support team for getting it under control. @ReplitSupport @Replit
Saganism
"If we are honest — and scientists have to be — we must admit that religion is a jumble of false assertions, with no basis in reality. The very idea of God is a product of the human imagination. It is quite understandable why primitive people, who were so much more exposed to the overpowering forces of nature than we are today, should have personified these forces in fear and trembling. But nowadays, when we understand so many natural processes, we have no need for such solutions. I can't for the life of me see how the postulate of an Almighty God helps us in any way."
— Paul Dirac, Remarks made during the Fifth Solvay International Conference
Austen Allred
Hahahaha I love this app
Martin Shkreli: Ro pretends he is a modest and humble guy. Over the course of the next few months, I will reveal much more about Ro Khanna. He's just a rich guy lying to everyone about virtually everything.
At the end of this, I predict his wife will file for divorce.
Shaun Willis
i'm buzzing. this is about the most insane productivity boost in my life that i've ever experienced. @Replit full throttle is by far the greatest thing in software right now. I can't believe there is not more people on the app right now building
Shaun Willis
Re @replit building in replit right now feels like this
gallery for codex pet sharing:
Hunter ♠️: Built Petdex, a public gallery to discover, share, and install Codex pets with one curl.
Submissions open at link below 👇
Shout out to @replit engineers and support team keeping everything together as users run armies of agents building everything they ever dreamed of 😅
Shaun Willis: @replit building in replit right now feels like this
22 ACTIVE PARALLEL AGENTS…. and 13 in draft 😭
Shaun Willis: Full steam ahead! @Replit
Browser Use
We're creating SKILL files for all websites
Contribute a domain skill to browser-harness
Saurav Panda: domain skills are the most fun PRs i've ever merged.
you don't hand-write them. the agent does the task in your browser, figures out the selectors and edge cases, and writes the skill itself. you just open the PR.
linkedin, amazon, expenses, whatever you do daily - contribute
this is great
Boaz Barak: My colleagues have been posting so many cool research results on the @OpenAI alignment blog! A few examples in 🧵
https://alignment.openai.com/
5.5 xhigh in fast mode is
really good
i think i got psyoped by twitter on medium for a bit
never thought id be watching F1 via the kids broadcast
cannot imagine being happier
codex for improving your ergonomics
jason liu: With codex I don’t need a second monitor I turned it into a standing desk
https://youtu.be/kYkIdXwW2AE?si=hV2ANEl-wPh1MSU1
Ron Alfa
Loved the vibes with @latentspacepod, was a lot of fun.
Latent.Space: 🔬 Training Transformers to solve 95% failure rate of Cancer Trials
the AI for Science pod is back with @RonAlfa, CEO of @NOETIK_ai, and Daniel Bear, VP Research at Noetik, explaining exactly how their team of top AI x Bio researchers and engineers (shoutout @owl_posting) will
i keep thinking i want the models to be cheaper/faster more than i want them to be smarter
but it seems that just being smarter is still the most important thing
Sam Altman
Re @hsu_steve mogging
This is correct
Hugo Amsellem: http://x.com/i/article/2049920112707137536
Lisan al Gaib
I think returns to intelligence are nonlinear because decisions are path-dependent
early choices in code, experiments, or strategy can compound positively or negatively over time
for example by avoiding dead ends or preserving optionality
it's why I am a big fan of very long running tasks and massive benchmarking budgets
GPT-5.5 and Mythos Preview are only marginally more intelligent than previous models and have pretty much the same performance up to 10M tokens, but after that they go absolutely ballistic
Sam Altman: i keep thinking i want the models to be cheaper/faster more than i want them to be smarter
but it seems that just being smarter is still the most important thing
“Prompt” took on an entirely new meaning but somehow many things stayed the same.
Amjad Masad: Replit, turned 10 🎂
To celebrate we’re making it totally free for 24 hours starting at 5am PT.
But our work—to make coding accessible for all—goes back to 2011.
Watch the highlights from the journey:
It’s been an honor to help millions learn & ship. Here is to the next 10!
Danielle Fong 🔆
Protesters shutting down the Berkeley Forum event hosting @jeffdean is some emblematic circular firing squad stuff, come on. Jeff has been outspoken about human rights repeatedly, but instead of even engaging, i guess the whole event was shut down. typical!! https://www.dailycal.org/news/campus/protesters-shut-down-berkeley-forum-event-hosting-google-ai-scientist/article_9dd82646-3c37-48b5-8dd4-61a5050646ce.html?utm_medium=social&utm_source=twitter&utm_campaign=user-share
Bexly
Coined this last year: @garrytan wields the OG switch
“The great inversion is almost here.
More non-technical people exist than traditionally “technical”
Understand this simple economic factor. Your only job is to facilitate this reality faster by way of niche distribution”
Hugo Amsellem: http://x.com/i/article/2049920112707137536
Vox
gbrain 0.25.1 shipped a feature i think is genuinely powerful.
feed your openclaw / hermes a book you've been reading, the agent uses the real you in your brain to map every chapter to what you're actually working on.
drop in Atomic Habits and the agent maps every chapter against your brain's actual reflections on your morning routine, writing, running streak. reads like a therapist who's been reading your notes, scribbling in the margins.
this is where long-term logging compounds. the more complete your brain, the more the book reads you.
those dozen unfinished books finally have a reason to come back out
Garry Tan: GBrain v0.25.1 now ships with the book-mirror skillpack by default. Yes, you can upload an epub and if your brain is full of knowledge about you, it'll relate each idea to something you are working on, care about, or are thinking about.
Whoever at Android forced this mess of how work profiles and personal profiles work together (really they don’t) should have been fired
It’s a textbook example of PMs making the wrong call and letting the bad decision fester for years
Noah Smith 🐇🇺🇸🇺🇦🇹🇼
I'm in favor of taxing the ultra-rich. But California's "billionaire tax" is a poorly-designed piece of slopulism.
https://www.noahpinion.blog/p/californias-billionaire-tax-is-the
Gatis
I’m on the @Replit Pro plan now, parallel tasks are very impressive, after almost 12 hours in it’s working like 🔥 @Franciscocrz
Kane 謝凱堯
Silicon Valley congressman @RoKhanna is trying to impose more taxes on Californians who made their wealth instead of getting it the Correct way like him: being handed it by family and trading on congressional insider information.
Arthur MacWaters: > be Ro Khanna
> #2 most active “trader” in Congress
> literally $600m in trades
> net worth 10s-100s of millions
> “it’s not my money it’s my wife’s”
>…
> “tax the billionaires”
> asset seizure with explicit direct path to every citizen
> already caused the largest wealth
There are no good bagels on the UWS.
If someone opens a solid shop, they’re gonna make a killing.
小盖
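Strongly recommend everyone take a look at the latest thinking from DeepMind CEO Demis.
Honestly, I think every interview with Google DeepMind CEO Demis Hassabis is worth the time. He explains things in a grounded, easy-to-follow way.
I listened to his latest podcast with YC CEO Garry Tan on my morning run.
I just finished writing up my notes, so here they are.
One aside: a lot of people ask whether notes like these are written by AI. Here is my process.
I listen to the whole podcast first, then dictate my reactions as fully as I can with voice input, have AI help organize a first draft, and finally revise it word by word myself.
If I handed the whole summary to AI, I would be ceding the thinking and understanding to it, which would do nothing for my own understanding.
OK, on to the main topic.
1
Demis is very clear: the current large-model paradigm (large-scale pretraining + RLHF + CoT) will definitely be part of the final AGI architecture. He does not think it is a dead end.
But to reach AGI, a few key problems still need solving: continual learning, long-horizon reasoning, and memory systems.
Start with the most visible symptom, the Context Window.
Today the most common trick for handling long inputs is to keep stretching the Context Window. First 8k, then 32k, then 1 million tokens. It sounds impressive, but it is essentially brute force.
The Context Window is roughly the equivalent of working memory in the human brain. How much can human working memory hold at once? Psychology has a classic number: about 7 items. You can hold roughly 7 digits of a phone number; beyond that it overflows.
And large models? They are already at 1 million tokens.
In principle, with a working memory hundreds of thousands of times larger than a human's, a model should be hundreds of thousands of times smarter. Clearly it is not.
And that is exactly where the problem lies. Stuff everything into the Context Window and it ends up holding unimportant things, wrong things, and outdated things. It looks like a lot of information, but it is really a tangled mess.
So why is a 7-item working memory enough for humans?
Because another mechanism is at work behind it. We remember things from years ago, from childhood, from a few hours ago. None of that sits in working memory; it lives in a separate system.
Specifically, that system is the hippocampus, the part of the brain responsible for integrating new knowledge into the existing knowledge base.
Research has found that during sleep, especially REM sleep, the brain replays the important moments of the day so that it can learn from them. While we sleep, new material is gently folded into the old body of knowledge.
That process of folding new material into the existing knowledge base is continual learning.
Models do not have this mechanism today. When a conversation ends, whatever was just learned is forgotten. Open it again next time and it is the same model as before, with no progress.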
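2
Next, long-horizon reasoning. The English term used was Long-term Reasoning; I render it here as long-horizon.
"Long-horizon reasoning" is an abstract phrase. Demis told a very concrete story that makes it immediately clear what he means.
He said he likes playing chess against Gemini. During a game he can see the model's thinking trace, that is, what it is actually thinking.
Then he noticed something odd.
When the model considers a move, its chain of thought states plainly that the move is a blunder. But then it fails to find a better one, so it goes back and plays the blunder anyway.
It knows the move is wrong and still plays it.
That detail says more than any benchmark number, because it exposes that the model lacks a kind of introspection over its own thinking process.
A normal person playing chess, on realizing a move is a blunder, has a reflex: pause, think again. Models do not have that pause-and-rethink ability today. They can judge right and wrong locally at each step, but they cannot adjust their overall strategy based on the state of the whole game.
That is what unsolved long-horizon reasoning looks like. The model can move forward step by step, each step looking reasonable, while the overall direction of the game is wrong. It lacks the ability to step up one level from its current line of thought and re-examine it.
In the end, what the model lacks is introspection.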
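3
Continual learning, long-horizon reasoning, and memory: those are the three AGI gaps Demis named in the podcast.
Beyond those, he kept coming back to creativity.
In 2016, AlphaGo played Lee Sedol and produced the famous Move 37 in game two. The moment it was played, top Go players around the world were stunned.
Everything humans had learned from thousands of years of playing Go said not to play there, but AlphaGo did. Afterwards everyone realized it was a stroke of genius.
Many people felt this was AI creativity arriving.
But Demis said that for him, Move 37 was only the starting point. What he really wants to see is something else: can AI invent the game of Go itself?
The difference between the two is crucial.
Move 37 found a move humans had not thought of within Go's existing rules. But the rules, the shape of the board, and the way black and white stones are played were all invented by humans. AI is very strong inside an existing framework; whether it can build a framework of its own is another matter.
Demis offered a concrete scenario.
Give AI a high-level description: design a game whose rules can be learned in five minutes, that takes several lifetimes to master, that is aesthetically pleasing, and that can be finished in an afternoon. Can AI work backward from that description to Go?
Not yet.
To make this clearer, Demis also proposed a test, which he calls the Einstein test.
Train a model on all the human knowledge available in 1901, and see whether it can derive special relativity on its own by 1905.
In 1905 Einstein wrote several physics-changing papers in a single year, later called his miracle year. That work was not stitched together from existing physics papers; it was a brand-new conceptual leap made from existing material.
That is what the Einstein test asks: can AI make that kind of leap?
Today's large models mainly do two things, pattern matching and extrapolation. One finds regularities in large amounts of data; the other extends those regularities a little further. But discovering something new requires analogical reasoning: extracting deep structure from one domain and carrying it over to an entirely new one.
Models do not have that ability yet. Or perhaps they do, but we are using them the wrong way and cannot elicit it.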
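4
Demis also shared a judgment that really surprised me: over the next 6 to 12 months, the real value is not in bigger models but in smaller ones.
I listened to this part several times. It genuinely changed what I thought I knew.
I don't know about you, but I have not paid much attention to small-model progress over the past year. After all, the industry's focus has been on making models bigger.
So where exactly is the value of small models?
The most direct answer is cost. For the same task, a small model's inference price may be a tenth of a frontier model's, or even less.
But Demis said speed actually matters more than cost.
One premise needs to be clear first. Demis is not saying speed can replace intelligence.
As he put it, once a small model reaches 90% to 95% of a frontier model's capability, which is already quite good, the remaining 5% to 10% gap matters less than the benefit of speed.
For example, engineers writing code with AI have settled into a new working rhythm. An idea comes up, you see the result within seconds, and if it does not work you change it, and change it again.
The faster that revise-and-retry loop runs, the better the result. If every call takes ten seconds, the whole workflow gets broken up.
More importantly, once it is fast enough, engineers can enter flow in that rhythm. An idea, an attempt, feedback, another idea, with no interruption to the train of thought.
Anyone who has written code knows this: the difference in output between staying in flow and constantly falling out of it is an order of magnitude.
Agents follow the same logic. An agent may call the model dozens of times to finish one task; if each call is a second slower, the whole task ends up about a minute slower. Past a certain point, the agent goes from something usable to something not worth the bother.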
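The agent arithmetic above can be made concrete with a tiny sketch. It assumes the calls run strictly one after another with nothing overlapped; the function name and the figure of 60 calls are illustrative, not from the podcast:

```python
def added_task_seconds(num_calls: int, extra_seconds_per_call: float) -> float:
    """Extra wall-clock time for a task whose model calls run sequentially."""
    return num_calls * extra_seconds_per_call


# 60 sequential calls, each one second slower, adds a full minute to the task.
one_minute = added_task_seconds(60, 1.0)
```

With parallel calls the penalty would be smaller, so this is an upper bound for a purely sequential agent loop.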
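Small models are not a cheap substitute for large models. Some things only small models can do.
Phones, glasses, and home robots, for example, need a model that can run locally. Besides fast response, running locally has another especially important benefit: privacy.
The video a home robot sees and the conversations it hears are all processed on the device and never go to the cloud. For many users that is not a bonus; it is the baseline requirement.
Cost, speed, edge deployment: that is the value of small models.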
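5
With the value of small models covered, the more critical question comes next: if capability is compressed into so few parameters, is there a ceiling?
Demis's view is that he has not seen any theoretical limit on information density so far. The intelligence ceiling for small models is still nowhere in sight.
Backing this view is DeepMind's accumulated experience with distillation. Put simply, distillation means first training a very large model, then using that model to teach a small one. Afterwards, the small model can reproduce more than 95% of the original's capability with far fewer parameters.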
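For readers who have not seen distillation written down, here is a minimal sketch of the classic soft-label loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is the textbook formulation, not a description of DeepMind's method, which the podcast notes do not specify. The function names and the default temperature are assumptions:

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # the small model's current guess
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student already matches the teacher and grows as their output distributions diverge; training the small model means minimizing it over many examples.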
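Why does DeepMind care so much about distillation? Because putting AI capability into Google's flagship products requires low latency and low cost. However strong a frontier model is, if every inference takes several seconds and costs a few cents... that path is unlikely to work.
Within 6 to 12 months of a frontier model's release, they can distill its capability into a small model that runs on edge devices. That timeline is faster than many people expect.
In many scenarios, small and large models will work together.
For example, in an end-to-end intelligent assistant, the vast majority of everyday tasks run on a local small model. What smart glasses see, what a home robot hears, the personal assistant on your phone: the model understands it right on the device, with no need to send it through the cloud.
Only when it hits a particularly complex problem that cannot be handled locally does it send a request to the frontier model in the cloud.
In other words, the small model is the workhorse at the edge and the frontier model is the backup in the cloud.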
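The edge-first, cloud-fallback split described above could be sketched roughly as follows. This is an illustration under stated assumptions, not anything Demis described in detail: the confidence score, the `threshold` value, and both toy model functions are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A model takes a prompt and returns (answer, self-reported confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Answer:
    text: str
    confidence: float
    source: str  # "local" or "cloud"


def route(prompt: str, local_model: Model, cloud_model: Model, threshold: float = 0.8) -> Answer:
    """Try the on-device model first; escalate only when it is not confident enough."""
    text, confidence = local_model(prompt)
    if confidence >= threshold:
        return Answer(text, confidence, "local")  # data never leaves the device
    # Escalation sends the prompt off-device, so a real system would need user consent here.
    text, confidence = cloud_model(prompt)
    return Answer(text, confidence, "cloud")


def toy_local(prompt: str) -> Tuple[str, float]:
    """Stand-in small model: confident only on short, simple prompts."""
    return ("local answer", 0.95 if len(prompt.split()) <= 5 else 0.3)


def toy_cloud(prompt: str) -> Tuple[str, float]:
    """Stand-in frontier model: always answers with high confidence."""
    return ("cloud answer", 0.99)
```

Here `route("turn on the lights", toy_local, toy_cloud)` stays on the device, while a long multi-step request falls through to the cloud model. The threshold is the knob that trades privacy and latency against answer quality.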
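That said, this vision asks a lot of the small model. It cannot just handle text; it also has to understand the physical world.
That is why Gemini has insisted on being multimodal from the start, handling not just text but also images, video, and audio.
Doing it that way from the beginning is much harder than text only, but glasses and robots alike need a model that can understand the world around it, not one that can only chat.
At this point the outline of the small-model path is fully clear. It stands on its own: not a cheap substitute for frontier models, but another equally important path.
Yeah, very thought-provoking.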