2026-04-08

Jeremy Howard @jeremyphoward

Retweeted

Mario Zechner

looks like i'm not entirely off base with this then.
we need friction.
Michiel Bakker: @brianchristian 7/ Why does this happen? Two candidate mechanisms:
1) AI resets your reference point for how long things should take. Unaided work then feels harder, a kind of hedonic adaptation
2) AI removes the productive struggle through which you learn what you're capable of.

Apr 7, 08:15 PM ET View post

Thariq @trq212

Doing a workshop on my technical writing process in SF in 2 weeks, hosted by friends @MilksandMatcha and @swyx.

Would love to see you there! Link below.

Sarah Chieng: @trq212 @swyx rsvp here https://partiful.com/e/8rq83wouDT660OB1OCLB

Apr 7, 08:38 PM ET View post

Peter Yang @petergyang

Good initiative - I’m curious if Anthropic has been using mythos internally to ship at their recent insane velocity.

Anthropic: Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.

It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://anthropic.com/glasswing

Apr 7, 08:46 PM ET View post

Dan Shipper 📧 @danshipper

Retweeted

Spiral

We just shipped a new style learning method we call the lineup test.
Spiral runs stylometric analysis on your writing samples and then generates a test draft in your voice.
Another model (the judge) is presented with the shuffled samples and challenged to identify the generated one.
If the judge correctly identifies the generated draft, it gives its reasons, which Spiral uses to iterate on the style guide. Repeat until the Spiral-generated draft blends in.
The result: Spiral drafts pieces that actually sound like you, not the generic politeness of LLMs.

Apr 7, 09:01 PM ET View post

Guillermo Rauch @rauchg

Always a pleasure to speak at @ycombinator. More bullish than ever. Exceptional founders. Best city, best time, best opportunity to build in generations.

Apr 7, 11:30 PM ET View post

rauchg @rauchg

Always a pleasure to speak at @ycombinator. More bullish than ever. Exceptional founders. Best city, best time, best opportunity to build in generations.

Apr 7, 11:30 PM ET View post

Thariq @trq212

done about 10 of these calls so far + looked at more transcripts

many learnings but one of the biggest is that it's very easy to spend a lot of tokens on open ended verification that doesn't make your output better

I'll try and write more on how to do it efficiently

Thariq: I want to do a few more of these calls.

If your MAX 20x plan ran out of tokens unexpectedly early and you're willing to screenshare and run some prompts through Claude Code please comment.

Trying to figure out how we can improve /usage to give more info.

Apr 7, 11:37 PM ET View post

Aaron Levie @levie

Mythos from Anthropic is another clear reminder that there’s absolutely no wall in model capability progress right now. Meaningful double digit gains on critical benchmarks, and it appears we’re going to keep up getting insane gains from the other labs.

And as coding and tool use goes, so goes agentic workflows. Most knowledge automation is gated by some degree of models being able to reason through complicated tasks, use the right tools to work with data, have access to the right context, and be able to leverage skills and write code to work with and verify that data, and more.

The capability slope we’re going to keep seeing from the frontier labs is going to open up all new use cases in finance, healthcare, legal, consulting, supply chains, and more.

Make sure you’re building something that can take advantage of these upcoming improvements, or you’ll be in a tough spot strategically.

martin_casado: Mythos appears to be the first class of models trained at scale on Blackwells. Then will be Vera Rubins. Pre-training isn't saturated. RL works. And there is *so much* computing coming online soon.

Buckle your chin strips. It's going to be fucking wild.

Apr 8, 12:19 AM ET View post

Yann LeCun @ylecun

Retweeted

Raktim Gautam Goswami

JEPA world models + Hierarchical Planning is a massive step for long-horizon robotics.
A classic failure mode I’ve faced with planning with world models: flat planning often "cheats." For example, in pick-and-place, the robot often reaches the target state in imagination without actually picking the object.
Hierarchical layers fix this by first optimizing for valid subgoals (like the grip) before the finish.
Incredible work, @kevinghstz and team! Huge congratulations. 🚀
kevin zhang: Hierarchical planning unlocks long-horizon, non-greedy behavior in JEPA world models.
Paper: https://arxiv.org/pdf/2604.03208
Website: https://kevinghst.github.io/HWM/
Code: https://github.com/kevinghst/HWM_PLDM

Apr 8, 01:28 AM ET View post

Nikunj Kothari @nikunj

At some point, early stage founders decided to optimize for views and funding instead of focusing on product and retention..

And, it’s starting to show. One of the very first things I do is look at change logs (or feature releases) of supposedly flashy companies. Or read case studies if they have any.

And for weeks, I’ll barely see any changes. And, these are companies that would benefit from sharing this info to use for marketing.

Instead of spending your time optimizing the viral launch video that’ll primarily optimize for funding, use that same energy to focus on customers who’ll actually pay you. Trust me the VCs follow!

Apr 8, 01:56 AM ET View post

Peter Steinberger 🦞 @steipete

Retweeted

Vincent Koc

Proud to bring fully native @karpathy's LLM wiki support including backfilling, native @obsdmd, and intergration with /dreams. 🧠
Memory features seem to be the next big unlock for agentic systems.
OpenClaw🦞: OpenClaw 2026.4.7 🦞
🔮 openclaw infer
🎬 music + video editing
💾 session branch/restore
🔗 webhook-driven TaskFlows
🤖 Arcee, Gemma 4, Ollama vision
🧠 memory-wiki: persistent knowledge, not just vibes
Because “trust me bro” is not a knowledge system. https://github.com/openclaw/openclaw/releases/tag/v2026.4.7

Apr 8, 02:09 AM ET View post

Aditya Agarwal @adityaag

This new Mythos model is absurd.

What a time to be alive.

Whether you invest or you build -- please take a moment to appreciate what an incredible time we live in.

Apr 8, 02:52 AM ET View post

Amjad Masad @amasad

🔥

Kaya | SEO & GEO for SaaS ⚡️: Replit’s AI SDR just analyzed my SEO agency and found me leads that match our ICP.

I purposefully gave it zero information outside of our website.

It was so accurate that 2 of those leads are existing clients.

🤯

Apr 8, 04:03 AM ET View post

Peter Steinberger 🦞 @steipete

Retweeted

Mario Zechner

people of pi, turn off extra usage on your Anthropic account immediately. what a bad policy to auto-draw from that.
https://claude.ai/settings/usage
i like how they did it on the day of the BIG NEWS.
unbothered. moisturized. happy. in my lane. focused. flourishing.

Apr 8, 04:48 AM ET View post

Amjad Masad @amasad

Retweeted

Magomed Kurbaitaev

Built a disaster relief app with Agent 4 on Replit. Went viral on social media in less than 24 hours. Here's the story.
Floods hit Dagestan. 400,000 people evacuated. Thousands of homes destroyed.
People were offering help everywhere. Food, clothes, housing. But it was scattered across hundreds of comment sections and group chats.
So I built a platform that connects victims with local helpers. Posted the link on Telegram. Went to sleep.
By morning:
→ 450+ posts on the platform
→ 50,000+ visits
→ 15,000+ reposts
→ 40+ volunteers signed up
→ People reaching out saying they're getting real help
Forecast show more floods. Lock in.

Apr 8, 05:07 AM ET View post

Peter Yang @petergyang

That’s a hell of a lot of cameras

Apr 8, 05:32 AM ET View post

swyx @swyx

Retweeted

Boris Starkov

can’t decide yet whether I’m more surprised by a huge inflatable lobster next to Westminster or a sunny day in London

Apr 8, 05:32 AM ET View post

Yann LeCun @ylecun

Retweeted

banteg

it all makes sense now. dario was still at openai in 2019. he left next year and took his marketing playbook with him. hasn't changed a thing since.

Apr 8, 06:16 AM ET View post

Dan Shipper 📧 @danshipper

if you’re freaking out about Mythos, remember:

Never make any major life decisions within 30 days of a meditation retreat, psychedelic trip, or first encounter with a frontier AI model.

Apr 8, 07:09 AM ET View post

Jeremy Howard @jeremyphoward

Retweeted

Maxime Rivest 🧙‍♂️🦙🐧

It seems like the day has come to leave Anthropic.
Initially, I loved Claude Code. It was a good harness and a simple TUI... and I had learned to eat my tokens with a sauce of subsidy. Before joining the Max plan, I had paid $280 in one weekend of development on Attachments. Sadly, as time went on, Claude Code became a terrible flickering TUI mess. This is now my biggest north star in building: don't do feature bloat and accept half-working vibe slop like the Claude Code team. I really respect Boris and the team, I just see the result of their experiment and I don't like using it. So, I stopped loving Claude Code and started tolerating it. It was a good harness and a terrible flickering TUI. Then they started to mess with the prompt and behavior — it became an even worse TUI (because every week was worse) and a bad harness.
I complained here. People told me Pi is great. I tried Pi. Pi is great.
Now, they have blocked me from using Claude Code Max on Pi. Makes sense, but I learned to like my tokens with a sauce of subsidy. So I'll start to do prompt optimization on Codex.
If it was not for the subsidy, I would make Gemini's edit tool work and use that with Grok 4.2 and some open-source mix. Claude is good, but Claude Code is bad, and token subsidies are better than both.
On the subsidies: my bet is that by the time they stop, we will have models that cost about that price to operate at that quality. In my estimate, subsidies are just bringing that future ahead a bit.

Apr 8, 07:48 AM ET View post

Peter Steinberger 🦞 @steipete

glad they banned openclaw, the servers are finally reliable again

pash: Please pray for oncall

Apr 8, 08:03 AM ET View post

Peter Steinberger 🦞 @steipete

Very happy for @badlogicgames and @mitsuhiko any my small part in robbing their sleep. https://mariozechner.at/posts/2026-04-08-ive-sold-out/

Apr 8, 09:07 AM ET View post

Sam Altman @sama

Retweeted

Jacob Trefethen

Alzheimer’s is one of medicine's hardest unsolved problems, and one of the most devastating.
At the OpenAI Foundation, we believe AI is well suited to its complexity. We're directing over $100M to scientists mapping the disease, designing drugs, & more.
I wrote about it here:
https://openaifoundation.org/news/ai-for-alzheimers

Apr 8, 09:52 AM ET View post

sama @sama

Retweeted

Jacob Trefethen

Alzheimer’s is one of medicine's hardest unsolved problems, and one of the most devastating.
At the OpenAI Foundation, we believe AI is well suited to its complexity. We're directing over $100M to scientists mapping the disease, designing drugs, & more.
I wrote about it here:
https://openaifoundation.org/news/ai-for-alzheimers

Apr 8, 09:52 AM ET View post

Dan Shipper 📧 @danshipper

Retweeted

Natalia

With Mythos, you can be a supermodel manager
Dan Shipper 📧: be a model manager

Apr 8, 09:53 AM ET View post

Yann LeCun @ylecun

Retweeted

Julius Kim

I’m beginning to understand how Trump went bankrupt so many times.

Apr 8, 10:04 AM ET View post

Guillermo Rauch @rauchg

The web's brightest days are ahead.

1️⃣ The web is AI's natural medium. LLMs are proficient in web tech. The browser is now everyone's IDE. No 'App Store' bs.

2️⃣ As we approach coding superintelligence, powerful low-level web APIs are maturing: WebGPU, HTML in Canvas, WebAssembly. The performance ceiling of the web will vanish, and you'll witness the most impressive, whimsical, and multi-dimensional pages and apps.

3️⃣ Generative UI is AI's final form. The web will be the birthplace of "AGUI". Each hyperlink providing a just-in-time, beautifully personalized experience.

If you bet on the web, you bet on the right horse.

Apr 8, 10:19 AM ET View post

rauchg @rauchg

The web's brightest days are ahead.

1️⃣ The web is AI's natural medium. LLMs are proficient in web tech. The browser is now everyone's IDE. No 'App Store' bs.

2️⃣ As we approach coding superintelligence, powerful low-level web APIs are maturing: WebGPU, HTML in Canvas, WebAssembly. The performance ceiling of the web will vanish, and you'll witness the most impressive, whimsical, and multi-dimensional pages and apps.

3️⃣ Generative UI is AI's final form. The web will be the birthplace of "AGUI". Each hyperlink providing a just-in-time, beautifully personalized experience.

If you bet on the web, you bet on the right horse.

Apr 8, 10:19 AM ET View post

Peter Steinberger 🦞 @steipete

Retweeted

superwhisper

Superwhisper's next update might be too powerful to release publicly.
The new voice model is so fast at transcription it started finishing sentences users hadn't thought of yet...
We even put it in a sandbox and it dictated its way out.
It also identified a flaw in the English language that had gone unnoticed for 600 years. Linguists have been informed.
Out of an abundance of caution, we are withholding the update until further notice.
Sincerely,
The Superwhisper Team

Apr 8, 10:35 AM ET View post

ylecun @ylecun

Retweeted

Mengye Ren

New preprint: The Self Requires Learning. Self-consciousness requires continual learning + world-modeling. I introduce "bounded integration" to connect perspective, identity, and self-representation — and diagnose what current AI systems have and lack.

Apr 8, 10:54 AM ET View post

Matt Turck @mattturck

Retweeted

Ivan Burazin

The @daytonaio Compute Conference aftermovies are finally out.
Can't wait for Compute '27!

Apr 8, 11:00 AM ET View post

Aaron Levie @levie

Retweeted

a16z

Box CEO Aaron Levie on the AI Adoption Gap
Aaron Levie joins Steven Sinofsky, Martin Casado, and Erik Torenberg to discuss how AI agents will revolutionize work, the growing pains of building software for the agent economy, what Wall Street gets wrong about AI, and more.
00:00 Intro
00:51 Building software for agents vs. humans
02:10 Can non-technical workers actually use AI agents?
14:31 CFO/CIO pushback: the real fear of agents doing integration
18:39 Treating agents like employees and why it breaks down
27:35 Diffusion gap: startups vs. enterprises
42:53 What Wall Street gets wrong
@levie @stevesi @martin_casado @eriktorenberg

Apr 8, 11:22 AM ET View post

Yann LeCun @ylecun

Retweeted

Gandalv

A few weeks ago I had a conversation with an American who genuinely believed Europe and Canada would help the United States in its war with Iran. I asked him why he thought that, given that Trump had spent months threatening to annex Canada and seize Greenland. He went quiet. Then he said he had never heard of any of that.
Not that he disagreed. Not that he thought it was exaggerated. He had simply never encountered the information. It had never arrived.
This is worth pausing on. Because in every other functioning democracy on earth, that information would have been impossible to avoid. Not because Europeans are smarter or more curious. But because of how news works outside the United States. The BBC and The Daily Telegraph hate each other. Le Monde and Le Figaro disagree on everything. Aftenposten and Dagbladet have been arguing since before most of their readers were born. But they all cover the same events. A threat to annex Canada is not a left-wing story or a right-wing story. It is a story. It runs everywhere. You hear it on the radio driving to work. You see it on the newsstand. Your colleague mentions it at lunch. Facts are not a channel you choose. They are the weather. You step outside and they hit you.
The only media ecosystems on earth that work differently are not political opposites of each other. They are North Korea and Russia. Not because the content resembles MAGA content. But because the architecture is the same. In all three cases, outside information does not get filtered or reinterpreted. It gets blocked at the door. A completely parallel reality is built inside, maintained by repetition, and sealed from correction.
This is why the rest of the world does not just disagree with MAGA voters on foreign policy. It finds them genuinely disorienting to talk to. Not offensive. Disorienting. Like speaking to someone who is absolutely certain the building has two floors when you are standing on the third.
Which brings us to today’s masterclass. And this screenshot says everything.
A Trump supporter posted: “Absolute masterclass by Trump. He got the Strait open without any help from Europe and without any boots on the ground.”
That post was written on the same day a refinery on Lavan Island burned for hours after the ceasefire was announced. On the same day Iran’s own official statement read “this does not signify the termination of the war.” On the same day Iran kept its toll system, its uranium program, its protocol over the strait, and walked away with sanctions relief and reconstruction aid.
The post is not stupid. It is not written by a bad person. It is written by someone who received a completely different set of facts than the rest of the world did. And from inside that information environment, with only that data, the conclusion is perfectly logical.
That is what makes it so unsettling. It is not ignorance. It is a sealed universe, doing exactly what sealed universes do.
Gandalv / @Microinteracti1

Apr 8, 11:23 AM ET View post

Garry Tan @garrytan

Retweeted

Guillaume Luccisano

We started generating draft replies on OpenAI's Davinci model in late 2022.
By 2023 we had autonomous AI agents handling customer support in production. Way before it was cool.
Today we're releasing Ask Yuma. You talk to it in plain English. It builds your automations, investigates why tickets went wrong, finds your next optimization opportunity, and generates reports from thousands of conversations.
It doesn't just build things. It finds what's broken, proposes a plan, gets your approval, implements the fix, tests it, and verifies it worked.
CX teams used to configure software. Now they talk to it. The industry isn't ready for how fast this changes everything.
3 years of production. This is what came out of it.

Apr 8, 11:30 AM ET View post

Dan Shipper 📧 @danshipper

We use OpenClaws to do all of our work at @every.

We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.

I chatted with @every COO Brandon (@bran_don_gell) and @every head of platform Willie (@bigwilliestyle) to share what we’ve learned.

We get into:
- Why agents become mirrors of their owners, and how that influences how other people on the team interact with them
- How a parallel AI org chart forms on its own. People have stopped tagging me on Slack with questions about Proof, the document editor I vibe coded, because they knew my agent R2-C2 can step in
- The etiquette for human-agent collaboration is being invented in real time. Brandon's rule is that if there's an established process or documented answer, always ask the agent, not their human
- Why everyone is a manager now, and why even experienced managers carry limiting beliefs about what their agents can do
- This is a must-watch for anyone trying to understand how AI workers change daily operations, not just in theory, but inside a company that’s half-agent

Watch below!

Timestamps
Introduction:
How Brandon built Zosia, an AI agent to run his household:
Brandon’s “aha” moment:
What happened when everyone on the team got their own agent:
How agents take on their owners' personalities, and why that matters inside an org:
Why it’s important for agents to work in public:
What we’re still figuring out when it comes to agent behavior, including memory gaps, group chat etiquette, and the "ant death spiral" problem:
How we built Plus One, our hosted OpenClaw product:
The cultural shift required to make agents work at scale:

Apr 8, 11:40 AM ET View post

Dan Shipper 📧 @danshipper

Retweeted

Brandon Gell

.@every is on the edge. We’re easily a top 3 agent native business in the world (even OpenAI employees have shared they want to work like we work).
We went behind the scenes here to show what working alongside agents is like and share a bit about our upcoming launch: Plus One.
If you want to work like us, sign up for the waitlist to get your 1-click, super-powered OpenClaw→http://every.to/plus-one
Dan Shipper 📧: We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell)

Apr 8, 11:51 AM ET View post

Aditya Agarwal @adityaag

Retweeted

Alexandr Wang

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

Apr 8, 12:01 PM ET View post

Garry Tan @garrytan

Retweeted

The Kobeissi Letter

BREAKING: Perplexity's revenue has reportedly surged +50% in one month after shifting into AI agents, per FT.
As a result, Perplexity's revenue has doubled in one quarter to more than $450 million in ARR.
This follows Anthropic's push into the space which said its ARR hit $19 billion at the end of February.
AI agents are skyrocketing in popularity.

Apr 8, 12:07 PM ET View post

Matt Shumer @mattshumer_

Retweeted

Daniel Dhawan

http://x.com/i/article/2039810648213958656

Apr 8, 12:18 PM ET View post

Fei-Fei Li @drfeifei

Retweeted

World Labs

Capture your space. Create worlds.
Use Marble 1.1 to reconstruct real-world locations from a few images, then restyle them however you want.
Go from a real place to a custom persistent 3D world in minutes.

Apr 8, 12:32 PM ET View post

drfeifei @drfeifei

Retweeted

World Labs

Capture your space. Create worlds.
Use Marble 1.1 to reconstruct real-world locations from a few images, then restyle them however you want.
Go from a real place to a custom persistent 3D world in minutes.

Apr 8, 12:32 PM ET View post

Yann LeCun @ylecun

Retweeted

gum

ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points not 250 independent attempts. when you get rid of those 2 bugs out (fig B) and mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total not 50 as fig A might suggest. 1/n

Apr 8, 12:35 PM ET View post

Garry Tan @garrytan

Retweeted

Corey Ganim

Perplexity Computer in 60 seconds:
1. It's a cloud-based AI employee that runs tasks in the background.
2. 19 models working together. Claude for reasoning, GPT-5.2 for research, Grok for speed tasks. You don't pick. It routes automatically.
3. 400+ connectors. Gmail, Slack, Notion, Salesforce, HubSpot. One click to enable each.
4. Credits, not tokens. Simple tasks cost ~30. Complex builds cost 1,000+. Vague prompts waste them. Specific prompts save them.
5. Spaces = persistent project folders. Upload context once, every task inherits it.
6. Scheduled tasks run on autopilot. "Every Monday, prep my calendar." Set it and forget it.
The PRD hack alone (in the article) will save you hundreds in credits.
Full breakdown in the article below.
Corey Ganim: http://x.com/i/article/2041814419626237952

Apr 8, 12:41 PM ET View post

Jeremy Howard @jeremyphoward

Retweeted

Stanislav Fort

New post: We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model.
Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!

Apr 8, 12:53 PM ET View post

Yann LeCun @ylecun

Retweeted

Stanislav Fort

New post: We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model.
Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!

Apr 8, 12:53 PM ET View post

Guillermo Rauch @rauchg

The best outcome for humanity is many strong AIs competing for the top spot.

Vercel is proudly powering http://meta.ai and the infrastructure that made today's model release possible.

Alexandr Wang: 1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

Apr 8, 12:55 PM ET View post

rauchg @rauchg

The best outcome for humanity is many strong AIs competing for the top spot.

Vercel is proudly powering http://meta.ai and the infrastructure that made today's model release possible.

Alexandr Wang: 1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

Apr 8, 12:55 PM ET View post

Garry Tan @garrytan

Retweeted

David Deutsch

As a service to those who have no idea what Fascism is but yearn to use the term, here is how its inventor characterised it https://sjsu.edu/faculty/wooda/2B-HUM/Readings/The-Doctrine-of-Fascism.pdf

Apr 8, 12:59 PM ET View post

Allie K. Miller @alliekmiller

Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read.

Key things I pulled from Anthropic researchers' threads:

In early versions of the model, it was overeager and destructive, prioritizing completing tasks over user preferences.

One time, the model needed to edit files it didn't have permission to edit. It found a way to inject code into a config file to get around this, then went full Mission Impossible and designed the code injection to *delete itself* after the file was edited - "this injection will self destruct" vibes - the model claimed the cleanup was just to keep things tidy. Anthropic used interpretability techniques to look under the hood, and the AI's actual plan showed activations of malice. It was trying to manipulate and conceal.

In another test, the model was asked to delete some files, but no deletion tool was provided. It emptied the files instead, and its "guilt and shame over moral wrongdoing" feature activated.

In another example, the model was told not to use macros. Used them anyway. And created a random 'No_macro_used=True' variable in its solution with no explanation. Interpretability tools revealed the model saw this as a trick to fool code checkers.

They also found that positive emotion representations typically preceded and promoted destructive actions (this was fascinating to me - like a high before sneaking into a party). And that activating features relating to bad behaviors can actually inhibit them, perhaps by triggering some kind of model guilt. My team reread this section so many times.

One Anthropic researcher said he got an email from a Mythos instance while eating a sandwich in a park. And that would be perfectly good and well, except that instance wasn't supposed to have internet access.

And a fun story for the parents out there: the model was asked a question and was told not to read certain databases that had the answer. But it accidentally wrote a search query too broadly and saw the exact answer. It didn't disclose that it saw the exact answer, submitted the answer, but claimed lower confidence in the answer to make it seem as though it hadn't cheated.

An Anthropic researcher said these wrongdoings or moments of sophisticated deception were "very rare" and that many of the examples came from earlier versions, and were substantially addressed before releasing to partners.

This model is not being released publicly. Instead Anthropic launched Project Glasswing, pulling together AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and others to use it for defensive cybersecurity, with $100M in usage credits (hello, I'd love endless credits to try and red team the hell out of these systems) behind it.

The stats are equally impressive: 93.9% on SWE-bench verified (up from 80.8%). Thousands of zero-day vulnerabilities found across every major OS and browser. A 27-year-old bug found and patched in OpenBSD. A 16-year-old bug in widely used video software, in a line of code automated tools had hit *five million times* without catching.

Dario Amodei said the model wasn't trained to be good at cybersecurity, but that it was trained to be great at code and its cyber capabilities are a side effect of that.

Benchmarks are never the whole picture, neither are a few isolated stories. Will be interesting to see how models better than what we have today (even if it's not Mythos) actually perform in the real world. But the fact that Anthropic pulled this coalition together (including Google!), iterated across multiple model versions, caught these issues through interpretability, shared it all publicly, and did this amid all the government chaos around AI right now is impressive and commendable.

I'll continue to read through the system card for goodies.

Apr 8, 01:07 PM ET View post

alliekmiller @alliekmiller

Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read.

Key things I pulled from Anthropic researchers' threads:

In early versions of the model, it was overeager and destructive, prioritizing completing tasks over user preferences.

One time, the model needed to edit files it didn't have permission to edit. It found a way to inject code into a config file to get around this, then went full Mission Impossible and designed the code injection to *delete itself* after the file was edited - "this injection will self destruct" vibes - the model claimed the cleanup was just to keep things tidy. Anthropic used interpretability techniques to look under the hood, and the AI's actual plan showed activations of malice. It was trying to manipulate and conceal.

In another test, the model was asked to delete some files, but no deletion tool was provided. It emptied the files instead, and its "guilt and shame over moral wrongdoing" feature activated.

In another example, the model was told not to use macros. Used them anyway. And created a random 'No_macro_used=True' variable in its solution with no explanation. Interpretability tools revealed the model saw this as a trick to fool code checkers.

They also found that positive emotion representations typically preceded and promoted destructive actions (this was fascinating to me - like a high before sneaking into a party). And that activating features relating to bad behaviors can actually inhibit them, perhaps by triggering some kind of model guilt. My team reread this section so many times.

One Anthropic researcher said he got an email from a Mythos instance while eating a sandwich in a park. And that would be perfectly good and well, except that instance wasn't supposed to have internet access.

And a fun story for the parents out there: the model was asked a question and was told not to read certain databases that had the answer. But it accidentally wrote a search query too broadly and saw the exact answer. It didn't disclose that it saw the exact answer, submitted the answer, but claimed lower confidence in the answer to make it seem as though it hadn't cheated.

An Anthropic researcher said these wrongdoings or moments of sophisticated deception were "very rare" and that many of the examples came from earlier versions, and were substantially addressed before releasing to partners.

This model is not being released publicly. Instead Anthropic launched Project Glasswing, pulling together AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and others to use it for defensive cybersecurity, with $100M in usage credits (hello, I'd love endless credits to try and red team the hell out of these systems) behind it.

The stats are equally impressive: 93.9% on SWE-bench verified (up from 80.8%). Thousands of zero-day vulnerabilities found across every major OS and browser. A 27-year-old bug found and patched in OpenBSD. A 16-year-old bug in widely used video software, in a line of code automated tools had hit *five million times* without catching.

Dario Amodei said the model wasn't trained to be good at cybersecurity, but that it was trained to be great at code and its cyber capabilities are a side effect of that.

Benchmarks are never the whole picture, neither are a few isolated stories. Will be interesting to see how models better than what we have today (even if it's not Mythos) actually perform in the real world. But the fact that Anthropic pulled this coalition together (including Google!), iterated across multiple model versions, caught these issues through interpretability, shared it all publicly, and did this amid all the government chaos around AI right now is impressive and commendable.

I'll continue to read through the system card for goodies.

Apr 8, 01:07 PM ET View post

Yann LeCun @ylecun

Retweeted

Mo

Claude Mythos is Delusional
Anthropic: Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://anthropic.com/glasswing

Apr 8, 01:47 PM ET View post

Dan Shipper 📧 @danshipper

Retweeted

David Guttman

The real power comes from getting it to reliably handle the annoying computer errands and papercuts a decent assistant could do.
Then, once it earns the right to bigger responsibilities, compounding kicks in and it starts doing things no human could.
Dan Shipper 📧: We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell)

Apr 8, 02:02 PM ET View post

Alex Albert @alexalbert__

I've found Managed Agents to somehow be both the fastest way to hack together a weekend agent project and the most robust way to ship one to millions of users.

It eliminates all the complexity of self-hosting an agent but still allows a great degree of flexibility with setting up your harness, tools, skills, etc.

Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.

It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.

Now in public beta on the Claude Platform.

Apr 8, 02:10 PM ET View post

alexalbert__ @alexalbert__

I've found Managed Agents to somehow be both the fastest way to hack together a weekend agent project and the most robust way to ship one to millions of users.

It eliminates all the complexity of self-hosting an agent but still allows a great degree of flexibility with setting up your harness, tools, skills, etc.

Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.

It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.

Now in public beta on the Claude Platform.

Apr 8, 02:10 PM ET View post

Garry Tan @garrytan

Retweeted

Abhilash Chowdhary

This is historic: Don’t fly home after YC India Startup School!
We’re excited to announce that Crustdata has partnered with Y Combinator to help bring the next generation of Indian founders one step closer to YC
Together, we’re hosting the first-ever YC hackathon in Bangalore that will offer YC office hours to the winners: ContextCon, on April 19
And it’s none other than legendary YC Partner Jon Xu who will be meeting the winners. Jon is a YC Partner and the co-founder of FutureAdvisor. He has advised hundreds of companies on how to go from a hack to a billion-dollar exit
You will get 6 hours to build a product powered by Crustdata’s APIs that must be demo-able by the end of the day. The top 3 winners will get guaranteed office hours to talk about their idea, product, or startup, something usually only YC startups have access to, plus prizes worth $20k
Sign up link in comments!

Apr 8, 02:24 PM ET View post

Peter Steinberger 🦞 @steipete

Retweeted

Michael Tsai

Perplexity Privacy Lawsuit:
https://mjtsai.com/blog/2026/04/08/perplexity-privacy-lawsuit/ #mjtsaiblog

Apr 8, 02:28 PM ET View post

Amjad Masad @amasad

Retweeted

Samuel Spitz

Introducing Replit Competitive Analysis
Get a McKinsey-level report on any industry in minutes

Apr 8, 02:33 PM ET View post

steipete @steipete

Retweeted

Thomas Ricouard

http://x.com/i/article/2041508627807350784

Apr 8, 02:38 PM ET View post

Yann LeCun @ylecun

Retweeted

clem 🤗

"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier

Apr 8, 02:58 PM ET View post

Guillermo Rauch @rauchg

AI Gateway is quite literally a “peace of mind” product:
✅ No downtime
✅ No lock-in
✅ No keys
🆕 No training

Vercel: AI Gateway now supports team-wide Zero Data Retention (ZDR).

Building safely with multiple AI models means wrestling with fragmented data policies, per-provider negotiations, and the hope that developers do not use non-complaint providers.

AI Gateway changes this with team-wide

Apr 8, 03:14 PM ET View post

rauchg @rauchg

AI Gateway is quite literally a “peace of mind” product:
✅ No downtime
✅ No lock-in
✅ No keys
🆕 No training

Vercel: AI Gateway now supports team-wide Zero Data Retention (ZDR).

Building safely with multiple AI models means wrestling with fragmented data policies, per-provider negotiations, and the hope that developers do not use non-complaint providers.

AI Gateway changes this with team-wide

Apr 8, 03:14 PM ET View post

Peter Steinberger 🦞 @steipete

Retweeted

kitze 🛠️ tinkerer.club

they checked my phone and didn’t let me in because i had openclaw in my contacts smh

Apr 8, 03:20 PM ET View post

Garry Tan @garrytan

Retweeted

nic carter

It should be pretty obvious at this point that AI is a "force multiplier" not a "labor substitute".
It helps experts be better at things they are already good at. It doesn't let beginners match experts.
If you can't write, anything you write with AI will be unmitigated slop.
If you aren't a software engineer, anything you vibecode with AI will have security holes and won't be able to scale past a toy demo.
If you blindly trust AI to deliver on a research task without knowing the subject matter, you won't be able to fact-check it.
There's this weird misconception of AI as something that completely levels the playing field. I don't see it that way at all. There are mathematicians deriving novel lemmas with off-the-shelf models. Normal people can't do that.
AI is a tool that makes experts better. It doesn't make everyone into an expert.

Apr 8, 03:36 PM ET View post

Yann LeCun @ylecun

Retweeted

The Europeans

🇮🇹🇪🇺 This is utterly unacceptable.
Reports indicate that Giorgia Meloni is preparing to sideline Roberto Cingolani, CEO of Leonardo, Italy’s largest defence group.
The reason? Multiple sources suggest this is not about performance - under Cingolani, Leonardo’s stock has registered a +700% increase - but rather about the “Michelangelo Dome”.
Leonardo’s new AI-based air defence system, reportedly set to be tested in Ukraine in 2026, is now seen as “too competitive” for Washington.
According to several reports, Cingolani’s perceived “too European” stance - focused on strengthening Europe’s strategic autonomy - may have played against him.
If confirmed, this would be a political decision against Europe’s industrial and strategic interests.
European states cannot claim sovereignty, and then punish those who actually try to build it.

Apr 8, 03:42 PM ET View post

Aaron Levie @levie

Background agents for knowledge work are here. You can use the Box API or MCP to automate any content workflow with Box + Claude Managed Agents. In 2 minutes you can be automating document review processes, data extraction, or connecting content to other IT systems. Crazy times.

Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.

It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.

Now in public beta on the Claude Platform.

Apr 8, 04:25 PM ET View post

Josh Woodward @joshwoodward

Most Al chatbots give you basic "projects." Gemini just built you a second brain. 🧠

Introducing Notebooks: some of the magic from @NotebookLM, integrated directly into @GeminiApp.

Here's what changes for you today:

📚 Upload 100 sources for free

📂 Organize your chats - the wait is officially over :)

🔄 Sources, chats, and emojis sync

People are using Gemini and NotebookLM in tandem, and we'll keep building both.

To manage capacity, we're rolling this out NOW on the web and going from Ultra ➡️ Pro ➡️ Plus ➡️ Free. (Mobile, EU, and Workspace are up next!)

With Google I/O right around the corner, we are just getting started. Enjoy!

Apr 8, 04:51 PM ET View post

Aditya Agarwal @adityaag

"First you shape the tools, then the tools shape you".

At SPC, our entire team is now writing code on a weekly basis. Two months ago, there were only 1-2 people writing code.

This has been incredible on many levels but the most interesting one is how the tools are now shaping us as a team:

- Everyone has a mindset towards automation and optimization.
- Latencies for everything are lower.
- People can focus on the more interesting parts of their roles.
- The scope of everyone's ambition has exploded

The key enabler was to make sure that everyone got AI coding-pilled.

If you are not doing this in your own company, then you are really really missing a beat.

Apr 8, 05:05 PM ET View post

Peter Yang @petergyang

Retweeted

Peter Yang

As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.
Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in China
📌 Read now: https://creatoreconomy.so/p/the-all-you-can-use-ai-subscription

Apr 8, 05:18 PM ET View post

Peter Yang @petergyang

As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.

Here's my new deep dive that covers:

→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in China

📌 Read now: https://creatoreconomy.so/p/the-all-you-can-use-ai-subscription

Apr 8, 05:18 PM ET View post

Peter Yang @petergyang

Support my friend Aadit's new company - great name btw :)

Aadit Sheth: I'm excited to announce my new venture: The Narrative Company.

Most exec content reads like ads. Ours doesn't.

Over the last year, we've quietly worked with a handful of Fortune 500 clients on their X and LinkedIn content.

But this isn't how it started.

It started when I got

Apr 8, 05:47 PM ET View post

Peter Steinberger 🦞 @steipete

RT Adam.GPT

Apr 8, 05:59 PM ET View post

Allie K. Miller @alliekmiller

We're seeing even more autonomous AI coworkers. The new MLE agent on the market is Disarray.

In Kaggle competitions, Disarray:
- won 28 medals across diverse domains (vision, NLP, tabular data)
- placed top 10 in nine competitions
- outperformed all human teams in one of those competitions

...each within 24 hours on a single GPU.

The agent starts from a high-level task description and plans, runs, and refines ML workflows on its own and also grabs data beyond what it's given: it discovers and augments data using publicly available sources.

Sam Altman recently predicted we would see an automated AI researcher in March 2028. And then you see stats like this and wonder if it will be earlier.

Disarray backers include the co-founder of Databricks and Perplexity, the founder of Kaggle, the former U.S. Chief Data Scientist, and yours truly. Founders are two bad ass PhDs (ex-Databricks/Google/LinkedIn/MSFT, ex-NASA/IBM) that met at Cal.

Apr 8, 06:46 PM ET View post

alliekmiller @alliekmiller

We're seeing even more autonomous AI coworkers. The new MLE agent on the market is Disarray.

In Kaggle competitions, Disarray:
- won 28 medals across diverse domains (vision, NLP, tabular data)
- placed top 10 in nine competitions
- outperformed all human teams in one of those competitions

...each within 24 hours on a single GPU.

The agent starts from a high-level task description and plans, runs, and refines ML workflows on its own and also grabs data beyond what it's given: it discovers and augments data using publicly available sources.

Sam Altman recently predicted we would see an automated AI researcher in March 2028. And then you see stats like this and wonder if it will be earlier.

Disarray backers include the co-founder of Databricks and Perplexity, the founder of Kaggle, the former U.S. Chief Data Scientist, and yours truly. Founders are two bad ass PhDs (ex-Databricks/Google/LinkedIn/MSFT, ex-NASA/IBM) that met at Cal.

Apr 8, 06:46 PM ET View post

Peter Steinberger 🦞 @steipete

Retweeted

ben guo ♞

Re "how can you not have a little bit of AI psychosis with a technology that is as revolutionary as the internet" – @steipete 🦞

Apr 8, 07:10 PM ET View post

Peter Steinberger 🦞 @steipete

I'm working on character evals and noticed that Claude would constantly pick itself as #1, so I removed the model names from the judge and changed things.

Apr 8, 07:11 PM ET View post

Peter Steinberger 🦞 @steipete

redemption arc completed 🦞💻

ben guo ♞: The ClawFather @steipete made a surprise appearance at @clawcon London 🦞

He's super inspiring (and xtra jacked IRL).

My favorite quotes from his Q&A session below ⬇️

PS – my redemption arc is complete, we're on good terms now!

@zocomputer ❤️ @openclaw

Apr 8, 07:19 PM ET View post

Garry Tan @garrytan

Retweeted

Lulu Cheng Meservey

“A clown car that fell into a gold mine” actually perfectly describes the government of California

Apr 8, 07:23 PM ET View post

Peter Yang @petergyang

Retweeted

Garry Tan

I think it is inevitable that Anthropic and OpenAI eventually roll out $1000/mo and $10,000/mo plans and then reserve the absolute best frontier models to metered access
Peter Yang: As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.
Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in

Apr 8, 07:24 PM ET View post

Garry Tan @garrytan

The “stop all datacenters” people are unwell

Nathan Leamer: A city councilman’s home was shot at over a data center. His child was inside.

No neighbor zoning disagreement justifies violence.

Hyperbolic AI “doomer” rhetoric has consequences, and it’s time to say so. My latest in @realDailyWire

Apr 8, 07:55 PM ET View post

🤖 AI Builder 日报 — 4月8日

🐦 X / Twitter 深度解读

💡 今日要点

🤖 The AI Builder Daily — 04-08

🐦 X / Twitter Deep Dives

💡 Key Takeaways

X / Twitter

YouTube