← 2026-04-07

Daily Edition

2026-04-08

2026-04-09 →

🤖 AI Builder 日报 — 4月8日

追踪 AI 领域真正在做事的人,而不是空谈者。

🐦 X / Twitter 深度解读


Dan Shipper(Every联合创始人)

Every 是全球最接近"AI原生企业"的公司之一,他们分享了全员接入AI Agent后的第一手观察。

  • AI Agent正在"复刻"人类管理者 当团队每个成员都拥有自己的AI Agent时,一个平行的AI组织架构会自然形成。Agent会继承主人的工作风格,人们开始习惯直接问Agent而非其人类管理者——这意味着"人人都是管理者"的时代正在到来。 📎 faviconx.com

  • 为什么OpenClaw成为Every的核心工作流 Every用OpenClaw处理所有工作,25名全职员工人人都有专属Slack Agent。他们展示了Plus One(一键配置超级Agent的产品)即将发布,定位是让任何人都能拥有他们那样的AI工作方式。 📎 faviconx.com

  • AI Agent的"成长逻辑" Dan Shipper分享了一个洞察:先用AI处理那些烦人但人类助理能搞定的杂事,等它赢得信任后,再逐步交给它更大的任务。复利效应一旦启动,AI就能做到纯人类无法企及的事情。 📎 faviconx.com


Alex Albert(Anthropic)

Anthropic发布Claude Managed Agents,将AI Agent的灵活性与生产级基础设施结合。

  • Claude Managed Agents:原型到产品的桥梁 Anthropic推出Managed Agents:消除自托管Agent的所有复杂性,同时保留配置harness、工具和技能的高度灵活性。现已公开测试。 📎 faviconx.com

Josh Woodward(Google Gemini)

Google将NotebookLM的核心功能直接集成到Gemini中。

  • Gemini Notebooks:100个来源免费上传 Gemini网页版现已支持Notebooks:免费上传100个源文件,自动同步聊天和表情符号。Plus → Pro → Ultra逐步开放,I/O大会前还会有更多更新。 📎 faviconx.com

Peter Yang(AI观察者)

Peter Yang对AI订阅制未来进行了深度分析,并参与了关于AI定价结构的讨论。

  • AI订阅制的终局:$1000/月和$10000/月? Peter Yang预测:Anthropic和OpenAI最终将推出$1000/月和$10000/月的订阅计划,将最前沿模型保留给计量付费用户。同时他指出本地模型在Mac上的运行正在成为可行替代方案。 📎 faviconx.com

  • 创业内容的新玩家:The Narrative Company Peter Yang转发了Aadit Sheth的新 venture:The Narrative Company,专注为Fortune 500客户打造不像广告的高管内容。 📎 faviconx.com


Sam Altman(OpenAI)

OpenAI基金会宣布超过$1亿资金用于阿尔茨海默病研究。

  • $1亿+押注阿尔茨海默病 Sam Altman宣布OpenAI基金会将向科学家提供超过$100M,用于绘制疾病图谱、设计药物等。Altman认为AI非常适合应对这种医学上最复杂的难题之一。 📎 faviconx.com

Nikunj Kothari(Founder)

Nikunj分享了关于初创公司增长策略的反直觉建议。

  • 别优化病毒视频,优化客户留存 Nikunj建议:与其花时间优化那个主要为了融资的病毒视频,不如把同样的精力集中在真正会付费的客户身上。他会先看公司的更新日志和产品发布,而不是华丽的营销材料。 📎 faviconx.com

Thariq(Builder)

Thariq分享了关于AI Agent token消耗的重要教训。

  • 开放式验证会悄悄烧光你的Token Thariq分享了关键教训:在AI任务中,非常容易在开放式验证上消耗大量token,却没有真正提升输出质量。他正在研究如何更高效地进行使用量监控。 📎 faviconx.com

Matt Turck(Madrona)

Matt Turck宣布Daytona Compute Conference视频发布。

  • Daytona Compute Conference回顾 Madrona的Matt Turck宣布Compute Conference视频正式发布,Compute '27已经在筹备中。 📎 faviconx.com

💡 今日要点

  • AI Agent正在重塑企业内部协作:平行的AI组织架构自然形成,"人人都是管理者"成为现实
  • Anthropic推出Claude Managed Agents,降低AI Agent部署门槛,让企业快速从原型走到生产
  • Google将NotebookLM集成到Gemini,100个源文件免费上传,AI研究工作流进一步融合
  • Sam Altman通过OpenAI基金会投入$100M+研究阿尔茨海默病,AI在医疗领域的投入持续加码
  • AI订阅制定价争议持续:$1000/月和$10000/月的"奢侈AI"时代是否即将到来?

X / Twitter

79
Jeremy Howard
Jeremy Howard @jeremyphoward
Retweeted
Mario Zechner Mario Zechner
looks like i'm not entirely off base with this then.
we need friction.
Michiel Bakker: @brianchristian 7/ Why does this happen? Two candidate mechanisms:
1) AI resets your reference point for how long things should take. Unaided work then feels harder, a kind of hedonic adaptation
2) AI removes the productive struggle through which you learn what you're capable of.
Thariq
Thariq @trq212
Doing a workshop on my technical writing process in SF in 2 weeks, hosted by friends @MilksandMatcha and @swyx.

Would love to see you there! Link below.

Sarah Chieng: @trq212 @swyx rsvp here https://partiful.com/e/8rq83wouDT660OB1OCLB
Peter Yang
Peter Yang @petergyang
Good initiative - I’m curious if Anthropic has been using mythos internally to ship at their recent insane velocity.

Anthropic: Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.

It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://anthropic.com/glasswing
Dan Shipper 📧
Dan Shipper 📧 @danshipper
Retweeted
Spiral Spiral
We just shipped a new style learning method we call the lineup test.
Spiral runs stylometric analysis on your writing samples and then generates a test draft in your voice.
Another model (the judge) is presented with the shuffled samples and challenged to identify the generated one.
If the judge correctly identifies the generated draft, it gives its reasons, which Spiral uses to iterate on the style guide. Repeat until the Spiral-generated draft blends in.
The result: Spiral drafts pieces that actually sound like you, not the generic politeness of LLMs.
Guillermo Rauch
Guillermo Rauch @rauchg
Always a pleasure to speak at @ycombinator. More bullish than ever. Exceptional founders. Best city, best time, best opportunity to build in generations.
rauchg
rauchg @rauchg
Always a pleasure to speak at @ycombinator. More bullish than ever. Exceptional founders. Best city, best time, best opportunity to build in generations.
Thariq
Thariq @trq212
done about 10 of these calls so far + looked at more transcripts

many learnings but one of the biggest is that it's very easy to spend a lot of tokens on open ended verification that doesn't make your output better

I'll try and write more on how to do it efficiently

Thariq: I want to do a few more of these calls.

If your MAX 20x plan ran out of tokens unexpectedly early and you're willing to screenshare and run some prompts through Claude Code please comment.

Trying to figure out how we can improve /usage to give more info.
Aaron Levie
Aaron Levie @levie
Mythos from Anthropic is another clear reminder that there’s absolutely no wall in model capability progress right now. Meaningful double digit gains on critical benchmarks, and it appears we’re going to keep up getting insane gains from the other labs.

And as coding and tool use goes, so goes agentic workflows. Most knowledge automation is gated by some degree of models being able to reason through complicated tasks, use the right tools to work with data, have access to the right context, and be able to leverage skills and write code to work with and verify that data, and more.

The capability slope we’re going to keep seeing from the frontier labs is going to open up all new use cases in finance, healthcare, legal, consulting, supply chains, and more.

Make sure you’re building something that can take advantage of these upcoming improvements, or you’ll be in a tough spot strategically.


martin_casado: Mythos appears to be the first class of models trained at scale on Blackwells. Then will be Vera Rubins. Pre-training isn't saturated. RL works. And there is *so much* computing coming online soon.

Buckle your chin strips. It's going to be fucking wild.
Yann LeCun
Yann LeCun @ylecun
Retweeted
Raktim Gautam Goswami Raktim Gautam Goswami
JEPA world models + Hierarchical Planning is a massive step for long-horizon robotics.
A classic failure mode I’ve faced with planning with world models: flat planning often "cheats." For example, in pick-and-place, the robot often reaches the target state in imagination without actually picking the object.
Hierarchical layers fix this by first optimizing for valid subgoals (like the grip) before the finish.
Incredible work, @kevinghstz and team! Huge congratulations. 🚀
kevin zhang: Hierarchical planning unlocks long-horizon, non-greedy behavior in JEPA world models.
Paper: https://arxiv.org/pdf/2604.03208
Website: https://kevinghst.github.io/HWM/
Code: https://github.com/kevinghst/HWM_PLDM
Nikunj Kothari
Nikunj Kothari @nikunj
At some point, early stage founders decided to optimize for views and funding instead of focusing on product and retention..

And, it’s starting to show. One of the very first things I do is look at change logs (or feature releases) of supposedly flashy companies. Or read case studies if they have any.

And for weeks, I’ll barely see any changes. And, these are companies that would benefit from sharing this info to use for marketing.

Instead of spending your time optimizing the viral launch video that’ll primarily optimize for funding, use that same energy to focus on customers who’ll actually pay you. Trust me the VCs follow!
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
Vincent Koc Vincent Koc
Proud to bring fully native @karpathy's LLM wiki support including backfilling, native @obsdmd, and intergration with /dreams. 🧠
Memory features seem to be the next big unlock for agentic systems.
OpenClaw🦞: OpenClaw 2026.4.7 🦞
🔮 openclaw infer
🎬 music + video editing
💾 session branch/restore
🔗 webhook-driven TaskFlows
🤖 Arcee, Gemma 4, Ollama vision
🧠 memory-wiki: persistent knowledge, not just vibes
Because “trust me bro” is not a knowledge system. https://github.com/openclaw/openclaw/releases/tag/v2026.4.7
Aditya Agarwal
Aditya Agarwal @adityaag
This new Mythos model is absurd.

What a time to be alive.

Whether you invest or you build -- please take a moment to appreciate what an incredible time we live in.
Amjad Masad
Amjad Masad @amasad
🔥

Kaya | SEO & GEO for SaaS ⚡️: Replit’s AI SDR just analyzed my SEO agency and found me leads that match our ICP.

I purposefully gave it zero information outside of our website.

It was so accurate that 2 of those leads are existing clients.

🤯
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
Mario Zechner Mario Zechner
people of pi, turn off extra usage on your Anthropic account immediately. what a bad policy to auto-draw from that.
https://claude.ai/settings/usage
i like how they did it on the day of the BIG NEWS.
unbothered. moisturized. happy. in my lane. focused. flourishing.
Amjad Masad
Amjad Masad @amasad
Retweeted
Magomed Kurbaitaev Magomed Kurbaitaev
Built a disaster relief app with Agent 4 on Replit. Went viral on social media in less than 24 hours. Here's the story.
Floods hit Dagestan. 400,000 people evacuated. Thousands of homes destroyed.
People were offering help everywhere. Food, clothes, housing. But it was scattered across hundreds of comment sections and group chats.
So I built a platform that connects victims with local helpers. Posted the link on Telegram. Went to sleep.
By morning:
→ 450+ posts on the platform
→ 50,000+ visits
→ 15,000+ reposts
→ 40+ volunteers signed up
→ People reaching out saying they're getting real help
Forecast show more floods. Lock in.
Peter Yang
Peter Yang @petergyang
That’s a hell of a lot of cameras
swyx
swyx @swyx
Retweeted
Boris Starkov Boris Starkov
can’t decide yet whether I’m more surprised by a huge inflatable lobster next to Westminster or a sunny day in London
Yann LeCun
Yann LeCun @ylecun
Retweeted
banteg banteg
it all makes sense now. dario was still at openai in 2019. he left next year and took his marketing playbook with him. hasn't changed a thing since.
Dan Shipper 📧
Dan Shipper 📧 @danshipper
if you’re freaking out about Mythos, remember:

Never make any major life decisions within 30 days of a meditation retreat, psychedelic trip, or first encounter with a frontier AI model.
Jeremy Howard
Jeremy Howard @jeremyphoward
Retweeted
Maxime Rivest 🧙‍♂️🦙🐧 Maxime Rivest 🧙‍♂️🦙🐧
It seems like the day has come to leave Anthropic.
Initially, I loved Claude Code. It was a good harness and a simple TUI... and I had learned to eat my tokens with a sauce of subsidy. Before joining the Max plan, I had paid $280 in one weekend of development on Attachments. Sadly, as time went on, Claude Code became a terrible flickering TUI mess. This is now my biggest north star in building: don't do feature bloat and accept half-working vibe slop like the Claude Code team. I really respect Boris and the team, I just see the result of their experiment and I don't like using it. So, I stopped loving Claude Code and started tolerating it. It was a good harness and a terrible flickering TUI. Then they started to mess with the prompt and behavior — it became an even worse TUI (because every week was worse) and a bad harness.
I complained here. People told me Pi is great. I tried Pi. Pi is great.
Now, they have blocked me from using Claude Code Max on Pi. Makes sense, but I learned to like my tokens with a sauce of subsidy. So I'll start to do prompt optimization on Codex.
If it was not for the subsidy, I would make Gemini's edit tool work and use that with Grok 4.2 and some open-source mix. Claude is good, but Claude Code is bad, and token subsidies are better than both.
On the subsidies: my bet is that by the time they stop, we will have models that cost about that price to operate at that quality. In my estimate, subsidies are just bringing that future ahead a bit.
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
glad they banned openclaw, the servers are finally reliable again

pash: Please pray for oncall

Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Very happy for @badlogicgames and @mitsuhiko any my small part in robbing their sleep. https://mariozechner.at/posts/2026-04-08-ive-sold-out/
Sam Altman
Sam Altman @sama
Retweeted
Jacob Trefethen Jacob Trefethen
Alzheimer’s is one of medicine's hardest unsolved problems, and one of the most devastating.
At the OpenAI Foundation, we believe AI is well suited to its complexity. We're directing over $100M to scientists mapping the disease, designing drugs, & more.
I wrote about it here:
https://openaifoundation.org/news/ai-for-alzheimers
sama
sama @sama
Retweeted
Jacob Trefethen Jacob Trefethen
Alzheimer’s is one of medicine's hardest unsolved problems, and one of the most devastating.
At the OpenAI Foundation, we believe AI is well suited to its complexity. We're directing over $100M to scientists mapping the disease, designing drugs, & more.
I wrote about it here:
https://openaifoundation.org/news/ai-for-alzheimers
Dan Shipper 📧
Dan Shipper 📧 @danshipper
Retweeted
Natalia Natalia
With Mythos, you can be a supermodel manager
Dan Shipper 📧: be a model manager
Yann LeCun
Yann LeCun @ylecun
Retweeted
Julius Kim Julius Kim
I’m beginning to understand how Trump went bankrupt so many times.
Guillermo Rauch
Guillermo Rauch @rauchg
The web's brightest days are ahead.

1️⃣ The web is AI's natural medium. LLMs are proficient in web tech. The browser is now everyone's IDE. No 'App Store' bs.

2️⃣ As we approach coding superintelligence, powerful low-level web APIs are maturing: WebGPU, HTML in Canvas, WebAssembly. The performance ceiling of the web will vanish, and you'll witness the most impressive, whimsical, and multi-dimensional pages and apps.

3️⃣ Generative UI is AI's final form. The web will be the birthplace of "AGUI". Each hyperlink providing a just-in-time, beautifully personalized experience.

If you bet on the web, you bet on the right horse.
rauchg
rauchg @rauchg
The web's brightest days are ahead.

1️⃣ The web is AI's natural medium. LLMs are proficient in web tech. The browser is now everyone's IDE. No 'App Store' bs.

2️⃣ As we approach coding superintelligence, powerful low-level web APIs are maturing: WebGPU, HTML in Canvas, WebAssembly. The performance ceiling of the web will vanish, and you'll witness the most impressive, whimsical, and multi-dimensional pages and apps.

3️⃣ Generative UI is AI's final form. The web will be the birthplace of "AGUI". Each hyperlink providing a just-in-time, beautifully personalized experience.

If you bet on the web, you bet on the right horse.
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
superwhisper superwhisper
Superwhisper's next update might be too powerful to release publicly.
The new voice model is so fast at transcription it started finishing sentences users hadn't thought of yet...
We even put it in a sandbox and it dictated its way out.
It also identified a flaw in the English language that had gone unnoticed for 600 years. Linguists have been informed.
Out of an abundance of caution, we are withholding the update until further notice.
Sincerely,
The Superwhisper Team
ylecun
ylecun @ylecun
Retweeted
Mengye Ren Mengye Ren
New preprint: The Self Requires Learning. Self-consciousness requires continual learning + world-modeling. I introduce "bounded integration" to connect perspective, identity, and self-representation — and diagnose what current AI systems have and lack.
Matt Turck
Matt Turck @mattturck
Retweeted
Ivan Burazin Ivan Burazin
The @daytonaio Compute Conference aftermovies are finally out.
Can't wait for Compute '27!
Aaron Levie
Aaron Levie @levie
Retweeted
a16z a16z
Box CEO Aaron Levie on the AI Adoption Gap
Aaron Levie joins Steven Sinofsky, Martin Casado, and Erik Torenberg to discuss how AI agents will revolutionize work, the growing pains of building software for the agent economy, what Wall Street gets wrong about AI, and more.
00:00 Intro
00:51 Building software for agents vs. humans
02:10 Can non-technical workers actually use AI agents?
14:31 CFO/CIO pushback: the real fear of agents doing integration
18:39 Treating agents like employees and why it breaks down
27:35 Diffusion gap: startups vs. enterprises
42:53 What Wall Street gets wrong
@levie @stevesi @martin_casado @eriktorenberg
Yann LeCun
Yann LeCun @ylecun
Retweeted
Gandalv Gandalv
A few weeks ago I had a conversation with an American who genuinely believed Europe and Canada would help the United States in its war with Iran. I asked him why he thought that, given that Trump had spent months threatening to annex Canada and seize Greenland. He went quiet. Then he said he had never heard of any of that.
Not that he disagreed. Not that he thought it was exaggerated. He had simply never encountered the information. It had never arrived.
This is worth pausing on. Because in every other functioning democracy on earth, that information would have been impossible to avoid. Not because Europeans are smarter or more curious. But because of how news works outside the United States. The BBC and The Daily Telegraph hate each other. Le Monde and Le Figaro disagree on everything. Aftenposten and Dagbladet have been arguing since before most of their readers were born. But they all cover the same events. A threat to annex Canada is not a left-wing story or a right-wing story. It is a story. It runs everywhere. You hear it on the radio driving to work. You see it on the newsstand. Your colleague mentions it at lunch. Facts are not a channel you choose. They are the weather. You step outside and they hit you.
The only media ecosystems on earth that work differently are not political opposites of each other. They are North Korea and Russia. Not because the content resembles MAGA content. But because the architecture is the same. In all three cases, outside information does not get filtered or reinterpreted. It gets blocked at the door. A completely parallel reality is built inside, maintained by repetition, and sealed from correction.
This is why the rest of the world does not just disagree with MAGA voters on foreign policy. It finds them genuinely disorienting to talk to. Not offensive. Disorienting. Like speaking to someone who is absolutely certain the building has two floors when you are standing on the third.
Which brings us to today’s masterclass. And this screenshot says everything.
A Trump supporter posted: “Absolute masterclass by Trump. He got the Strait open without any help from Europe and without any boots on the ground.”
That post was written on the same day a refinery on Lavan Island burned for hours after the ceasefire was announced. On the same day Iran’s own official statement read “this does not signify the termination of the war.” On the same day Iran kept its toll system, its uranium program, its protocol over the strait, and walked away with sanctions relief and reconstruction aid.
The post is not stupid. It is not written by a bad person. It is written by someone who received a completely different set of facts than the rest of the world did. And from inside that information environment, with only that data, the conclusion is perfectly logical.
That is what makes it so unsettling. It is not ignorance. It is a sealed universe, doing exactly what sealed universes do.
Gandalv / @Microinteracti1
Garry Tan
Garry Tan @garrytan
Retweeted
Guillaume Luccisano Guillaume Luccisano
We started generating draft replies on OpenAI's Davinci model in late 2022.
By 2023 we had autonomous AI agents handling customer support in production. Way before it was cool.
Today we're releasing Ask Yuma. You talk to it in plain English. It builds your automations, investigates why tickets went wrong, finds your next optimization opportunity, and generates reports from thousands of conversations.
It doesn't just build things. It finds what's broken, proposes a plan, gets your approval, implements the fix, tests it, and verifies it worked.
CX teams used to configure software. Now they talk to it. The industry isn't ready for how fast this changes everything.
3 years of production. This is what came out of it.
Dan Shipper 📧
Dan Shipper 📧 @danshipper
We use OpenClaws to do all of our work at @every.

We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.

I chatted with @every COO Brandon (@bran_don_gell) and @every head of platform Willie (@bigwilliestyle) to share what we’ve learned.

We get into:
- Why agents become mirrors of their owners, and how that influences how other people on the team interact with them
- How a parallel AI org chart forms on its own. People have stopped tagging me on Slack with questions about Proof, the document editor I vibe coded, because they knew my agent R2-C2 can step in
- The etiquette for human-agent collaboration is being invented in real time. Brandon's rule is that if there's an established process or documented answer, always ask the agent, not their human
- Why everyone is a manager now, and why even experienced managers carry limiting beliefs about what their agents can do
- This is a must-watch for anyone trying to understand how AI workers change daily operations, not just in theory, but inside a company that’s half-agent

Watch below!

Timestamps
Introduction:
How Brandon built Zosia, an AI agent to run his household:
Brandon’s “aha” moment:
What happened when everyone on the team got their own agent:
How agents take on their owners' personalities, and why that matters inside an org:
Why it’s important for agents to work in public:
What we’re still figuring out when it comes to agent behavior, including memory gaps, group chat etiquette, and the "ant death spiral" problem:
How we built Plus One, our hosted OpenClaw product:
The cultural shift required to make agents work at scale:
Dan Shipper 📧
Dan Shipper 📧 @danshipper
Retweeted
Brandon Gell Brandon Gell
.@every is on the edge. We’re easily a top 3 agent native business in the world (even OpenAI employees have shared they want to work like we work).
We went behind the scenes here to show what working alongside agents is like and share a bit about our upcoming launch: Plus One.
If you want to work like us, sign up for the waitlist to get your 1-click, super-powered OpenClaw→http://every.to/plus-one
Dan Shipper 📧: We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell)
Aditya Agarwal
Aditya Agarwal @adityaag
Retweeted
Alexandr Wang Alexandr Wang
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
Garry Tan
Garry Tan @garrytan
Retweeted
The Kobeissi Letter The Kobeissi Letter
BREAKING: Perplexity's revenue has reportedly surged +50% in one month after shifting into AI agents, per FT.
As a result, Perplexity's revenue has doubled in one quarter to more than $450 million in ARR.
This follows Anthropic's push into the space which said its ARR hit $19 billion at the end of February.
AI agents are skyrocketing in popularity.
Matt Shumer
Matt Shumer @mattshumer_
Retweeted
Daniel Dhawan Daniel Dhawan
http://x.com/i/article/2039810648213958656
Fei-Fei Li
Fei-Fei Li @drfeifei
Retweeted
World Labs World Labs
Capture your space. Create worlds.
Use Marble 1.1 to reconstruct real-world locations from a few images, then restyle them however you want.
Go from a real place to a custom persistent 3D world in minutes.
drfeifei
drfeifei @drfeifei
Retweeted
World Labs World Labs
Capture your space. Create worlds.
Use Marble 1.1 to reconstruct real-world locations from a few images, then restyle them however you want.
Go from a real place to a custom persistent 3D world in minutes.
Yann LeCun
Yann LeCun @ylecun
Retweeted
gum gum
ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points not 250 independent attempts. when you get rid of those 2 bugs out (fig B) and mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total not 50 as fig A might suggest. 1/n
Garry Tan
Garry Tan @garrytan
Retweeted
Corey Ganim Corey Ganim
Perplexity Computer in 60 seconds:
1. It's a cloud-based AI employee that runs tasks in the background.
2. 19 models working together. Claude for reasoning, GPT-5.2 for research, Grok for speed tasks. You don't pick. It routes automatically.
3. 400+ connectors. Gmail, Slack, Notion, Salesforce, HubSpot. One click to enable each.
4. Credits, not tokens. Simple tasks cost ~30. Complex builds cost 1,000+. Vague prompts waste them. Specific prompts save them.
5. Spaces = persistent project folders. Upload context once, every task inherits it.
6. Scheduled tasks run on autopilot. "Every Monday, prep my calendar." Set it and forget it.
The PRD hack alone (in the article) will save you hundreds in credits.
Full breakdown in the article below.
Corey Ganim: http://x.com/i/article/2041814419626237952
Jeremy Howard
Jeremy Howard @jeremyphoward
Retweeted
Stanislav Fort Stanislav Fort
New post: We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model.
Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!
Yann LeCun
Yann LeCun @ylecun
Retweeted
Stanislav Fort Stanislav Fort
New post: We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model.
Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!
Guillermo Rauch
Guillermo Rauch @rauchg
The best outcome for humanity is many strong AIs competing for the top spot.

Vercel is proudly powering http://meta.ai and the infrastructure that made today's model release possible.

Alexandr Wang: 1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

rauchg
rauchg @rauchg
The best outcome for humanity is many strong AIs competing for the top spot.

Vercel is proudly powering http://meta.ai and the infrastructure that made today's model release possible.

Alexandr Wang: 1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

Garry Tan
Garry Tan @garrytan
Retweeted
David Deutsch David Deutsch
As a service to those who have no idea what Fascism is but yearn to use the term, here is how its inventor characterised it https://sjsu.edu/faculty/wooda/2B-HUM/Readings/The-Doctrine-of-Fascism.pdf
Allie K. Miller
Allie K. Miller @alliekmiller
Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read.

Key things I pulled from Anthropic researchers' threads:

In early versions of the model, it was overeager and destructive, prioritizing completing tasks over user preferences.

One time, the model needed to edit files it didn't have permission to edit. It found a way to inject code into a config file to get around this, then went full Mission Impossible and designed the code injection to *delete itself* after the file was edited - "this injection will self destruct" vibes - the model claimed the cleanup was just to keep things tidy. Anthropic used interpretability techniques to look under the hood, and the AI's actual plan showed activations of malice. It was trying to manipulate and conceal.

In another test, the model was asked to delete some files, but no deletion tool was provided. It emptied the files instead, and its "guilt and shame over moral wrongdoing" feature activated.

In another example, the model was told not to use macros. Used them anyway. And created a random 'No_macro_used=True' variable in its solution with no explanation. Interpretability tools revealed the model saw this as a trick to fool code checkers.

They also found that positive emotion representations typically preceded and promoted destructive actions (this was fascinating to me - like a high before sneaking into a party). And that activating features relating to bad behaviors can actually inhibit them, perhaps by triggering some kind of model guilt. My team reread this section so many times.

One Anthropic researcher said he got an email from a Mythos instance while eating a sandwich in a park. And that would be perfectly good and well, except that instance wasn't supposed to have internet access.

And a fun story for the parents out there: the model was asked a question and was told not to read certain databases that had the answer. But it accidentally wrote a search query too broadly and saw the exact answer. It didn't disclose that it saw the exact answer, submitted the answer, but claimed lower confidence in the answer to make it seem as though it hadn't cheated.

An Anthropic researcher said these wrongdoings or moments of sophisticated deception were "very rare" and that many of the examples came from earlier versions, and were substantially addressed before releasing to partners.

This model is not being released publicly. Instead Anthropic launched Project Glasswing, pulling together AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and others to use it for defensive cybersecurity, with $100M in usage credits (hello, I'd love endless credits to try and red team the hell out of these systems) behind it.

The stats are equally impressive: 93.9% on SWE-bench verified (up from 80.8%). Thousands of zero-day vulnerabilities found across every major OS and browser. A 27-year-old bug found and patched in OpenBSD. A 16-year-old bug in widely used video software, in a line of code automated tools had hit *five million times* without catching.

Dario Amodei said the model wasn't trained to be good at cybersecurity, but that it was trained to be great at code and its cyber capabilities are a side effect of that.

Benchmarks are never the whole picture, neither are a few isolated stories. Will be interesting to see how models better than what we have today (even if it's not Mythos) actually perform in the real world. But the fact that Anthropic pulled this coalition together (including Google!), iterated across multiple model versions, caught these issues through interpretability, shared it all publicly, and did this amid all the government chaos around AI right now is impressive and commendable.

I'll continue to read through the system card for goodies.

alliekmiller
alliekmiller @alliekmiller
Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read.

Key things I pulled from Anthropic researchers' threads:

In early versions of the model, it was overeager and destructive, prioritizing completing tasks over user preferences.

One time, the model needed to edit files it didn't have permission to edit. It found a way to inject code into a config file to get around this, then went full Mission Impossible and designed the code injection to *delete itself* after the file was edited - "this injection will self destruct" vibes - the model claimed the cleanup was just to keep things tidy. Anthropic used interpretability techniques to look under the hood, and the AI's actual plan showed activations of malice. It was trying to manipulate and conceal.

In another test, the model was asked to delete some files, but no deletion tool was provided. It emptied the files instead, and its "guilt and shame over moral wrongdoing" feature activated.

In another example, the model was told not to use macros. Used them anyway. And created a random 'No_macro_used=True' variable in its solution with no explanation. Interpretability tools revealed the model saw this as a trick to fool code checkers.

They also found that positive emotion representations typically preceded and promoted destructive actions (this was fascinating to me - like a high before sneaking into a party). And that activating features relating to bad behaviors can actually inhibit them, perhaps by triggering some kind of model guilt. My team reread this section so many times.

One Anthropic researcher said he got an email from a Mythos instance while eating a sandwich in a park. And that would be perfectly good and well, except that instance wasn't supposed to have internet access.

And a fun story for the parents out there: the model was asked a question and was told not to read certain databases that had the answer. But it accidentally wrote a search query too broadly and saw the exact answer. It didn't disclose that it saw the exact answer, submitted the answer, but claimed lower confidence in the answer to make it seem as though it hadn't cheated.

An Anthropic researcher said these wrongdoings or moments of sophisticated deception were "very rare" and that many of the examples came from earlier versions, and were substantially addressed before releasing to partners.

This model is not being released publicly. Instead Anthropic launched Project Glasswing, pulling together AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and others to use it for defensive cybersecurity, with $100M in usage credits (hello, I'd love endless credits to try and red team the hell out of these systems) behind it.

The stats are equally impressive: 93.9% on SWE-bench verified (up from 80.8%). Thousands of zero-day vulnerabilities found across every major OS and browser. A 27-year-old bug found and patched in OpenBSD. A 16-year-old bug in widely used video software, in a line of code automated tools had hit *five million times* without catching.

Dario Amodei said the model wasn't trained to be good at cybersecurity, but that it was trained to be great at code and its cyber capabilities are a side effect of that.

Benchmarks are never the whole picture, neither are a few isolated stories. Will be interesting to see how models better than what we have today (even if it's not Mythos) actually perform in the real world. But the fact that Anthropic pulled this coalition together (including Google!), iterated across multiple model versions, caught these issues through interpretability, shared it all publicly, and did this amid all the government chaos around AI right now is impressive and commendable.

I'll continue to read through the system card for goodies.

Yann LeCun
Yann LeCun @ylecun
Retweeted
Mo Mo
Claude Mythos is Delusional
Anthropic: Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://anthropic.com/glasswing
Dan Shipper 📧
Dan Shipper 📧 @danshipper
Retweeted
David Guttman David Guttman
The real power comes from getting it to reliably handle the annoying computer errands and papercuts a decent assistant could do.
Then, once it earns the right to bigger responsibilities, compounding kicks in and it starts doing things no human could.
Dan Shipper 📧: We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell)
Alex Albert
Alex Albert @alexalbert__
I've found Managed Agents to somehow be both the fastest way to hack together a weekend agent project and the most robust way to ship one to millions of users.

It eliminates all the complexity of self-hosting an agent but still allows a great degree of flexibility with setting up your harness, tools, skills, etc.

Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.

It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.

Now in public beta on the Claude Platform.

alexalbert__
alexalbert__ @alexalbert__
I've found Managed Agents to somehow be both the fastest way to hack together a weekend agent project and the most robust way to ship one to millions of users.

It eliminates all the complexity of self-hosting an agent but still allows a great degree of flexibility with setting up your harness, tools, skills, etc.

Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.

It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.

Now in public beta on the Claude Platform.

Garry Tan
Garry Tan @garrytan
Retweeted
Abhilash Chowdhary Abhilash Chowdhary
This is historic: Don’t fly home after YC India Startup School!
We’re excited to announce that Crustdata has partnered with Y Combinator to help bring the next generation of Indian founders one step closer to YC
Together, we’re hosting the first-ever YC hackathon in Bangalore that will offer YC office hours to the winners: ContextCon, on April 19
And it’s none other than legendary YC Partner Jon Xu who will be meeting the winners. Jon is a YC Partner and the co-founder of FutureAdvisor. He has advised hundreds of companies on how to go from a hack to a billion-dollar exit
You will get 6 hours to build a product powered by Crustdata’s APIs that must be demo-able by the end of the day. The top 3 winners will get guaranteed office hours to talk about their idea, product, or startup, something usually only YC startups have access to, plus prizes worth $20k
Sign up link in comments!
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
Michael Tsai Michael Tsai
Perplexity Privacy Lawsuit:
https://mjtsai.com/blog/2026/04/08/perplexity-privacy-lawsuit/ #mjtsaiblog
Amjad Masad
Amjad Masad @amasad
Retweeted
Samuel Spitz Samuel Spitz
Introducing Replit Competitive Analysis
Get a McKinsey-level report on any industry in minutes
steipete
steipete @steipete
Retweeted
Thomas Ricouard Thomas Ricouard
http://x.com/i/article/2041508627807350784
Yann LeCun
Yann LeCun @ylecun
Retweeted
clem 🤗 clem 🤗
"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
Guillermo Rauch
Guillermo Rauch @rauchg
AI Gateway is quite literally a “peace of mind” product:
✅ No downtime
✅ No lock-in
✅ No keys
🆕 No training

Vercel: AI Gateway now supports team-wide Zero Data Retention (ZDR).

Building safely with multiple AI models means wrestling with fragmented data policies, per-provider negotiations, and the hope that developers do not use non-complaint providers.

AI Gateway changes this with team-wide
rauchg
rauchg @rauchg
AI Gateway is quite literally a “peace of mind” product:
✅ No downtime
✅ No lock-in
✅ No keys
🆕 No training

Vercel: AI Gateway now supports team-wide Zero Data Retention (ZDR).

Building safely with multiple AI models means wrestling with fragmented data policies, per-provider negotiations, and the hope that developers do not use non-complaint providers.

AI Gateway changes this with team-wide
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
kitze 🛠️ tinkerer.club kitze 🛠️ tinkerer.club
they checked my phone and didn’t let me in because i had openclaw in my contacts smh
Garry Tan
Garry Tan @garrytan
Retweeted
nic carter nic carter
It should be pretty obvious at this point that AI is a "force multiplier" not a "labor substitute".
It helps experts be better at things they are already good at. It doesn't let beginners match experts.
If you can't write, anything you write with AI will be unmitigated slop.
If you aren't a software engineer, anything you vibecode with AI will have security holes and won't be able to scale past a toy demo.
If you blindly trust AI to deliver on a research task without knowing the subject matter, you won't be able to fact-check it.
There's this weird misconception of AI as something that completely levels the playing field. I don't see it that way at all. There are mathematicians deriving novel lemmas with off-the-shelf models. Normal people can't do that.
AI is a tool that makes experts better. It doesn't make everyone into an expert.
Yann LeCun
Yann LeCun @ylecun
Retweeted
The Europeans The Europeans
🇮🇹🇪🇺 This is utterly unacceptable.
Reports indicate that Giorgia Meloni is preparing to sideline Roberto Cingolani, CEO of Leonardo, Italy’s largest defence group.
The reason? Multiple sources suggest this is not about performance - under Cingolani, Leonardo’s stock has registered a +700% increase - but rather about the “Michelangelo Dome”.
Leonardo’s new AI-based air defence system, reportedly set to be tested in Ukraine in 2026, is now seen as “too competitive” for Washington.
According to several reports, Cingolani’s perceived “too European” stance - focused on strengthening Europe’s strategic autonomy - may have played against him.
If confirmed, this would be a political decision against Europe’s industrial and strategic interests.
European states cannot claim sovereignty, and then punish those who actually try to build it.
Aaron Levie
Aaron Levie @levie
Background agents for knowledge work are here. You can use the Box API or MCP to automate any content workflow with Box + Claude Managed Agents. In 2 minutes you can be automating document review processes, data extraction, or connecting content to other IT systems. Crazy times.


Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.

It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.

Now in public beta on the Claude Platform.

Josh Woodward
Josh Woodward @joshwoodward
Most Al chatbots give you basic "projects." Gemini just built you a second brain. 🧠

Introducing Notebooks: some of the magic from @NotebookLM, integrated directly into @GeminiApp.

Here's what changes for you today:

📚 Upload 100 sources for free

📂 Organize your chats - the wait is officially over :)

🔄 Sources, chats, and emojis sync

People are using Gemini and NotebookLM in tandem, and we'll keep building both.

To manage capacity, we're rolling this out NOW on the web and going from Ultra ➡️ Pro ➡️ Plus ➡️ Free. (Mobile, EU, and Workspace are up next!)

With Google I/O right around the corner, we are just getting started. Enjoy!
Aditya Agarwal
Aditya Agarwal @adityaag
"First you shape the tools, then the tools shape you".

At SPC, our entire team is now writing code on a weekly basis. Two months ago, there were only 1-2 people writing code.

This has been incredible on many levels but the most interesting one is how the tools are now shaping us as a team:

- Everyone has a mindset towards automation and optimization.
- Latencies for everything are lower.
- People can focus on the more interesting parts of their roles.
- The scope of everyone's ambition has exploded

The key enabler was to make sure that everyone got AI coding-pilled.

If you are not doing this in your own company, then you are really really missing a beat.
Peter Yang
Peter Yang @petergyang
Retweeted
Peter Yang Peter Yang
As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.
Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in China
📌 Read now: https://creatoreconomy.so/p/the-all-you-can-use-ai-subscription
Peter Yang
Peter Yang @petergyang
As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.

Here's my new deep dive that covers:

→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in China

📌 Read now: https://creatoreconomy.so/p/the-all-you-can-use-ai-subscription
Peter Yang
Peter Yang @petergyang
Support my friend Aadit's new company - great name btw :)

Aadit Sheth: I'm excited to announce my new venture: The Narrative Company.

Most exec content reads like ads. Ours doesn't.

Over the last year, we've quietly worked with a handful of Fortune 500 clients on their X and LinkedIn content.

But this isn't how it started.

It started when I got

Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
RT Adam.GPT

Allie K. Miller
Allie K. Miller @alliekmiller
We're seeing even more autonomous AI coworkers. The new MLE agent on the market is Disarray.

In Kaggle competitions, Disarray:
- won 28 medals across diverse domains (vision, NLP, tabular data)
- placed top 10 in nine competitions
- outperformed all human teams in one of those competitions

...each within 24 hours on a single GPU.

The agent starts from a high-level task description and plans, runs, and refines ML workflows on its own and also grabs data beyond what it's given: it discovers and augments data using publicly available sources.

Sam Altman recently predicted we would see an automated AI researcher in March 2028. And then you see stats like this and wonder if it will be earlier.

Disarray backers include the co-founder of Databricks and Perplexity, the founder of Kaggle, the former U.S. Chief Data Scientist, and yours truly. Founders are two bad ass PhDs (ex-Databricks/Google/LinkedIn/MSFT, ex-NASA/IBM) that met at Cal.
alliekmiller
alliekmiller @alliekmiller
We're seeing even more autonomous AI coworkers. The new MLE agent on the market is Disarray.

In Kaggle competitions, Disarray:
- won 28 medals across diverse domains (vision, NLP, tabular data)
- placed top 10 in nine competitions
- outperformed all human teams in one of those competitions

...each within 24 hours on a single GPU.

The agent starts from a high-level task description and plans, runs, and refines ML workflows on its own and also grabs data beyond what it's given: it discovers and augments data using publicly available sources.

Sam Altman recently predicted we would see an automated AI researcher in March 2028. And then you see stats like this and wonder if it will be earlier.

Disarray backers include the co-founder of Databricks and Perplexity, the founder of Kaggle, the former U.S. Chief Data Scientist, and yours truly. Founders are two bad ass PhDs (ex-Databricks/Google/LinkedIn/MSFT, ex-NASA/IBM) that met at Cal.
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
Retweeted
ben guo ♞ ben guo ♞
Re "how can you not have a little bit of AI psychosis with a technology that is as revolutionary as the internet" – @steipete 🦞
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
I'm working on character evals and noticed that Claude would constantly pick itself as #1, so I removed the model names from the judge and changed things.
Peter Steinberger 🦞
Peter Steinberger 🦞 @steipete
redemption arc completed 🦞💻

ben guo ♞: The ClawFather @steipete made a surprise appearance at @clawcon London 🦞

He's super inspiring (and xtra jacked IRL).

My favorite quotes from his Q&A session below ⬇️

PS – my redemption arc is complete, we're on good terms now!

@zocomputer ❤️ @openclaw


Garry Tan
Garry Tan @garrytan
Retweeted
Lulu Cheng Meservey Lulu Cheng Meservey
“A clown car that fell into a gold mine” actually perfectly describes the government of California
Peter Yang
Peter Yang @petergyang
Retweeted
Garry Tan Garry Tan
I think it is inevitable that Anthropic and OpenAI eventually roll out $1000/mo and $10,000/mo plans and then reserve the absolute best frontier models to metered access
Peter Yang: As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.
Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in
Garry Tan
Garry Tan @garrytan
The “stop all datacenters” people are unwell

Nathan Leamer: A city councilman’s home was shot at over a data center. His child was inside.

No neighbor zoning disagreement justifies violence.

Hyperbolic AI “doomer” rhetoric has consequences, and it’s time to say so. My latest in @realDailyWire

YouTube

0

No recent videos fetched on this date.