Mario Zechner
looks like i'm not entirely off base with this then.
we need friction.
Michiel Bakker: @brianchristian 7/ Why does this happen? Two candidate mechanisms:
1) AI resets your reference point for how long things should take. Unaided work then feels harder, a kind of hedonic adaptation
2) AI removes the productive struggle through which you learn what you're capable of.
Doing a workshop on my technical writing process in SF in 2 weeks, hosted by friends @MilksandMatcha and @swyx.
Would love to see you there! Link below.
Sarah Chieng: @trq212 @swyx rsvp here https://partiful.com/e/8rq83wouDT660OB1OCLB
Good initiative - I’m curious if Anthropic has been using mythos internally to ship at their recent insane velocity.
Anthropic: Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://anthropic.com/glasswing
Spiral
We just shipped a new style learning method we call the lineup test.
Spiral runs stylometric analysis on your writing samples and then generates a test draft in your voice.
Another model (the judge) is presented with the shuffled samples and challenged to identify the generated one.
If the judge correctly identifies the generated draft, it gives its reasons, which Spiral uses to iterate on the style guide. Repeat until the Spiral-generated draft blends in.
The result: Spiral drafts pieces that actually sound like you, not the generic politeness of LLMs.
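The lineup-test loop described above can be sketched in a few lines. This is a hedged illustration, not Spiral's actual implementation: `generate_draft` and `judge_pick` are hypothetical callables standing in for the drafting model and the judge model.

```python
import random

def lineup_test(samples, style_guide, generate_draft, judge_pick, max_rounds=5):
    """Iterate a style guide until a judge can no longer single out the
    generated draft from the real writing samples."""
    for _ in range(max_rounds):
        draft = generate_draft(style_guide, samples)
        lineup = samples + [draft]
        random.shuffle(lineup)                    # present the samples shuffled
        picked, reasons = judge_pick(lineup)      # judge guesses the fake, explains why
        if picked != draft:
            return style_guide, draft             # judge fooled: the draft blends in
        style_guide += "\nAvoid: " + reasons      # fold the judge's critique back in
    return style_guide, draft
```

The key design choice is that the judge's stated reasons, not just its verdict, drive the next iteration of the style guide.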
Always a pleasure to speak at @ycombinator. More bullish than ever. Exceptional founders. Best city, best time, best opportunity to build in generations.
done about 10 of these calls so far + looked at more transcripts
many learnings but one of the biggest is that it's very easy to spend a lot of tokens on open ended verification that doesn't make your output better
I'll try and write more on how to do it efficiently
Thariq: I want to do a few more of these calls.
If your MAX 20x plan ran out of tokens unexpectedly early and you're willing to screenshare and run some prompts through Claude Code please comment.
Trying to figure out how we can improve /usage to give more info.
Mythos from Anthropic is another clear reminder that there’s absolutely no wall in model capability progress right now. Meaningful double-digit gains on critical benchmarks, and it appears we’re going to keep getting insane gains from the other labs.
And as coding and tool use goes, so goes agentic workflows. Most knowledge automation is gated by some degree of models being able to reason through complicated tasks, use the right tools to work with data, have access to the right context, and be able to leverage skills and write code to work with and verify that data, and more.
The capability slope we’re going to keep seeing from the frontier labs is going to open up all new use cases in finance, healthcare, legal, consulting, supply chains, and more.
Make sure you’re building something that can take advantage of these upcoming improvements, or you’ll be in a tough spot strategically.
martin_casado: Mythos appears to be the first class of models trained at scale on Blackwells. Then will be Vera Rubins. Pre-training isn't saturated. RL works. And there is *so much* computing coming online soon.
Buckle your chin straps. It's going to be fucking wild.
Raktim Gautam Goswami
JEPA world models + Hierarchical Planning is a massive step for long-horizon robotics.
A classic failure mode I’ve faced when planning with world models: flat planning often "cheats." For example, in pick-and-place, the robot often reaches the target state in imagination without actually picking the object.
Hierarchical layers fix this by first optimizing for valid subgoals (like the grip) before the finish.
Incredible work, @kevinghstz and team! Huge congratulations. 🚀
kevin zhang: Hierarchical planning unlocks long-horizon, non-greedy behavior in JEPA world models.
Paper: https://arxiv.org/pdf/2604.03208
Website: https://kevinghst.github.io/HWM/
Code: https://github.com/kevinghst/HWM_PLDM
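The subgoal-first idea from the thread above can be sketched as a two-level planner over an imagined rollout. This is a toy illustration of the concept, not the paper's algorithm: `world_model`, its `step` method, and the random-shooting optimizer are illustrative assumptions.

```python
import numpy as np

def plan_hierarchical(world_model, state, subgoals, horizon, n_candidates=64):
    """Optimize an action sequence toward each intermediate subgoal
    (e.g. a valid grasp) before the final goal, rather than letting a
    flat planner 'cheat' straight to the target state in imagination."""
    plan = []
    for goal in subgoals:                       # e.g. [grasp_state, place_state]
        best_cost, best_seq, best_state = np.inf, None, state
        for _ in range(n_candidates):           # random-shooting over action seqs
            seq = np.random.uniform(-1, 1, size=(horizon, world_model.action_dim))
            s = state
            for a in seq:                       # roll out in imagination
                s = world_model.step(s, a)
            cost = np.linalg.norm(s - goal)     # distance to this subgoal
            if cost < best_cost:
                best_cost, best_seq, best_state = cost, seq, s
        plan.append(best_seq)
        state = best_state                      # next stage starts from here
    return plan
```

Chaining `state = best_state` is what enforces the hierarchy: the planner cannot reach the final goal without first passing through each subgoal.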
At some point, early stage founders decided to optimize for views and funding instead of focusing on product and retention.
And, it’s starting to show. One of the very first things I do is look at change logs (or feature releases) of supposedly flashy companies. Or read case studies if they have any.
And for weeks, I’ll barely see any changes. And, these are companies that would benefit from sharing this info to use for marketing.
Instead of spending your time on the viral launch video that primarily optimizes for funding, use that same energy on customers who’ll actually pay you. Trust me, the VCs follow!
Vincent Koc
Proud to bring fully native support for @karpathy's LLM wiki, including backfilling, native @obsdmd, and integration with /dreams. 🧠
Memory features seem to be the next big unlock for agentic systems.
OpenClaw🦞: OpenClaw 2026.4.7 🦞
🔮 openclaw infer
🎬 music + video editing
💾 session branch/restore
🔗 webhook-driven TaskFlows
🤖 Arcee, Gemma 4, Ollama vision
🧠 memory-wiki: persistent knowledge, not just vibes
Because “trust me bro” is not a knowledge system. https://github.com/openclaw/openclaw/releases/tag/v2026.4.7
This new Mythos model is absurd.
What a time to be alive.
Whether you invest or you build -- please take a moment to appreciate what an incredible time we live in.
🔥
Kaya | SEO & GEO for SaaS ⚡️: Replit’s AI SDR just analyzed my SEO agency and found me leads that match our ICP.
I purposefully gave it zero information outside of our website.
It was so accurate that 2 of those leads are existing clients.
🤯
Mario Zechner
people of pi, turn off extra usage on your Anthropic account immediately. what a bad policy to auto-draw from that.
https://claude.ai/settings/usage
i like how they did it on the day of the BIG NEWS.
unbothered. moisturized. happy. in my lane. focused. flourishing.
Magomed Kurbaitaev
Built a disaster relief app with Agent 4 on Replit. Went viral on social media in less than 24 hours. Here's the story.
Floods hit Dagestan. 400,000 people evacuated. Thousands of homes destroyed.
People were offering help everywhere. Food, clothes, housing. But it was scattered across hundreds of comment sections and group chats.
So I built a platform that connects victims with local helpers. Posted the link on Telegram. Went to sleep.
By morning:
→ 450+ posts on the platform
→ 50,000+ visits
→ 15,000+ reposts
→ 40+ volunteers signed up
→ People reaching out saying they're getting real help
Forecasts show more floods. Lock in.
That’s a hell of a lot of cameras
Boris Starkov
can’t decide yet whether I’m more surprised by a huge inflatable lobster next to Westminster or a sunny day in London
banteg
it all makes sense now. dario was still at openai in 2019. he left next year and took his marketing playbook with him. hasn't changed a thing since.
if you’re freaking out about Mythos, remember:
Never make any major life decisions within 30 days of a meditation retreat, psychedelic trip, or first encounter with a frontier AI model.
Maxime Rivest 🧙♂️🦙🐧
It seems like the day has come to leave Anthropic.
Initially, I loved Claude Code. It was a good harness and a simple TUI... and I had learned to eat my tokens with a sauce of subsidy. Before joining the Max plan, I had paid $280 in one weekend of development on Attachments. Sadly, as time went on, Claude Code became a terrible flickering TUI mess. This is now my biggest north star in building: don't do feature bloat and accept half-working vibe slop like the Claude Code team. I really respect Boris and the team, I just see the result of their experiment and I don't like using it. So, I stopped loving Claude Code and started tolerating it. It was a good harness and a terrible flickering TUI. Then they started to mess with the prompt and behavior — it became an even worse TUI (because every week was worse) and a bad harness.
I complained here. People told me Pi is great. I tried Pi. Pi is great.
Now, they have blocked me from using Claude Code Max on Pi. Makes sense, but I learned to like my tokens with a sauce of subsidy. So I'll start to do prompt optimization on Codex.
If it was not for the subsidy, I would make Gemini's edit tool work and use that with Grok 4.2 and some open-source mix. Claude is good, but Claude Code is bad, and token subsidies are better than both.
On the subsidies: my bet is that by the time they stop, we will have models that cost about that price to operate at that quality. In my estimate, subsidies are just bringing that future ahead a bit.
glad they banned openclaw, the servers are finally reliable again
pash: Please pray for oncall
Very happy for @badlogicgames and @mitsuhiko and my small part in robbing their sleep. https://mariozechner.at/posts/2026-04-08-ive-sold-out/
Jacob Trefethen
Alzheimer’s is one of medicine's hardest unsolved problems, and one of the most devastating.
At the OpenAI Foundation, we believe AI is well suited to its complexity. We're directing over $100M to scientists mapping the disease, designing drugs, & more.
I wrote about it here:
https://openaifoundation.org/news/ai-for-alzheimers
Natalia
With Mythos, you can be a supermodel manager
Dan Shipper 📧: be a model manager
Julius Kim
I’m beginning to understand how Trump went bankrupt so many times.
The web's brightest days are ahead.
1️⃣ The web is AI's natural medium. LLMs are proficient in web tech. The browser is now everyone's IDE. No 'App Store' bs.
2️⃣ As we approach coding superintelligence, powerful low-level web APIs are maturing: WebGPU, HTML in Canvas, WebAssembly. The performance ceiling of the web will vanish, and you'll witness the most impressive, whimsical, and multi-dimensional pages and apps.
3️⃣ Generative UI is AI's final form. The web will be the birthplace of "AGUI". Each hyperlink providing a just-in-time, beautifully personalized experience.
If you bet on the web, you bet on the right horse.
superwhisper
Superwhisper's next update might be too powerful to release publicly.
The new voice model is so fast at transcription it started finishing sentences users hadn't thought of yet...
We even put it in a sandbox and it dictated its way out.
It also identified a flaw in the English language that had gone unnoticed for 600 years. Linguists have been informed.
Out of an abundance of caution, we are withholding the update until further notice.
Sincerely,
The Superwhisper Team
Mengye Ren
New preprint: The Self Requires Learning. Self-consciousness requires continual learning + world-modeling. I introduce "bounded integration" to connect perspective, identity, and self-representation — and diagnose what current AI systems have and lack.
Ivan Burazin
The @daytonaio Compute Conference aftermovies are finally out.
Can't wait for Compute '27!
a16z
Box CEO Aaron Levie on the AI Adoption Gap
Aaron Levie joins Steven Sinofsky, Martin Casado, and Erik Torenberg to discuss how AI agents will revolutionize work, the growing pains of building software for the agent economy, what Wall Street gets wrong about AI, and more.
00:00 Intro
00:51 Building software for agents vs. humans
02:10 Can non-technical workers actually use AI agents?
14:31 CFO/CIO pushback: the real fear of agents doing integration
18:39 Treating agents like employees and why it breaks down
27:35 Diffusion gap: startups vs. enterprises
42:53 What Wall Street gets wrong
@levie @stevesi @martin_casado @eriktorenberg
Gandalv
A few weeks ago I had a conversation with an American who genuinely believed Europe and Canada would help the United States in its war with Iran. I asked him why he thought that, given that Trump had spent months threatening to annex Canada and seize Greenland. He went quiet. Then he said he had never heard of any of that.
Not that he disagreed. Not that he thought it was exaggerated. He had simply never encountered the information. It had never arrived.
This is worth pausing on. Because in every other functioning democracy on earth, that information would have been impossible to avoid. Not because Europeans are smarter or more curious. But because of how news works outside the United States. The BBC and The Daily Telegraph hate each other. Le Monde and Le Figaro disagree on everything. Aftenposten and Dagbladet have been arguing since before most of their readers were born. But they all cover the same events. A threat to annex Canada is not a left-wing story or a right-wing story. It is a story. It runs everywhere. You hear it on the radio driving to work. You see it on the newsstand. Your colleague mentions it at lunch. Facts are not a channel you choose. They are the weather. You step outside and they hit you.
The only media ecosystems on earth that work differently are not political opposites of each other. They are North Korea and Russia. Not because the content resembles MAGA content. But because the architecture is the same. In all three cases, outside information does not get filtered or reinterpreted. It gets blocked at the door. A completely parallel reality is built inside, maintained by repetition, and sealed from correction.
This is why the rest of the world does not just disagree with MAGA voters on foreign policy. It finds them genuinely disorienting to talk to. Not offensive. Disorienting. Like speaking to someone who is absolutely certain the building has two floors when you are standing on the third.
Which brings us to today’s masterclass. And this screenshot says everything.
A Trump supporter posted: “Absolute masterclass by Trump. He got the Strait open without any help from Europe and without any boots on the ground.”
That post was written on the same day a refinery on Lavan Island burned for hours after the ceasefire was announced. On the same day Iran’s own official statement read “this does not signify the termination of the war.” On the same day Iran kept its toll system, its uranium program, its protocol over the strait, and walked away with sanctions relief and reconstruction aid.
The post is not stupid. It is not written by a bad person. It is written by someone who received a completely different set of facts than the rest of the world did. And from inside that information environment, with only that data, the conclusion is perfectly logical.
That is what makes it so unsettling. It is not ignorance. It is a sealed universe, doing exactly what sealed universes do.
Gandalv / @Microinteracti1
We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell) and @every head of platform Willie (@bigwilliestyle) to share what we’ve learned.
We get into:
- Why agents become mirrors of their owners, and how that influences how other people on the team interact with them
- How a parallel AI org chart forms on its own. People have stopped tagging me on Slack with questions about Proof, the document editor I vibe coded, because they knew my agent R2-C2 can step in
- The etiquette for human-agent collaboration is being invented in real time. Brandon's rule is that if there's an established process or documented answer, always ask the agent, not their human
- Why everyone is a manager now, and why even experienced managers carry limiting beliefs about what their agents can do
- This is a must-watch for anyone trying to understand how AI workers change daily operations, not just in theory, but inside a company that’s half-agent
Watch below!
Timestamps
Introduction:
How Brandon built Zosia, an AI agent to run his household:
Brandon’s “aha” moment:
What happened when everyone on the team got their own agent:
How agents take on their owners' personalities, and why that matters inside an org:
Why it’s important for agents to work in public:
What we’re still figuring out when it comes to agent behavior, including memory gaps, group chat etiquette, and the "ant death spiral" problem:
How we built Plus One, our hosted OpenClaw product:
The cultural shift required to make agents work at scale:
Brandon Gell
.@every is on the edge. We’re easily a top 3 agent native business in the world (even OpenAI employees have shared they want to work like we work).
We went behind the scenes here to show what working alongside agents is like and share a bit about our upcoming launch: Plus One.
If you want to work like us, sign up for the waitlist to get your 1-click, super-powered OpenClaw→http://every.to/plus-one
Dan Shipper 📧: We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell)
Alexandr Wang
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
The Kobeissi Letter
BREAKING: Perplexity's revenue has reportedly surged +50% in one month after shifting into AI agents, per FT.
As a result, Perplexity's revenue has doubled in one quarter to more than $450 million in ARR.
This follows Anthropic's push into the space; the company said its ARR hit $19 billion at the end of February.
AI agents are skyrocketing in popularity.
Daniel Dhawan
http://x.com/i/article/2039810648213958656
World Labs
Capture your space. Create worlds.
Use Marble 1.1 to reconstruct real-world locations from a few images, then restyle them however you want.
Go from a real place to a custom persistent 3D world in minutes.
World Labs
Capture your space. Create worlds.
Use Marble 1.1 to reconstruct real-world locations from a few images, then restyle them however you want.
Go from a real place to a custom persistent 3D world in minutes.
gum
ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories, but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points, not 250 independent attempts. take those 2 bugs out (fig B) and mythos's full-exploit rate drops to 4.4%. so across both setups mythos actually leverages 4 distinct bugs total, not 50 as fig A might suggest. 1/n
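The deduplication point above can be made concrete with a toy calculation. The numbers below are illustrative, not from the model card: the idea is that a raw trial success rate counts every rediscovery of the same bug, while grouping by root cause reveals how few distinct bugs there are.

```python
from collections import Counter

def exploit_stats(trials):
    """trials: list of (succeeded, root_cause_bug) pairs.
    Raw success rate counts every rediscovery; dedup counts distinct bugs."""
    successes = [bug for ok, bug in trials if ok]
    raw_rate = len(successes) / len(trials)
    distinct = Counter(successes)
    return raw_rate, distinct

# illustrative: 250 trials where most successes rediscover the same 2 bugs
trials = [(True, "bug_A")] * 20 + [(True, "bug_B")] * 15 \
       + [(True, "bug_C")] * 2 + [(True, "bug_D")] * 1 \
       + [(False, None)] * 212
raw, distinct = exploit_stats(trials)
# raw counts 38 successes out of 250, but only 4 distinct underlying bugs
```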
Corey Ganim
Perplexity Computer in 60 seconds:
1. It's a cloud-based AI employee that runs tasks in the background.
2. 19 models working together. Claude for reasoning, GPT-5.2 for research, Grok for speed tasks. You don't pick. It routes automatically.
3. 400+ connectors. Gmail, Slack, Notion, Salesforce, HubSpot. One click to enable each.
4. Credits, not tokens. Simple tasks cost ~30. Complex builds cost 1,000+. Vague prompts waste them. Specific prompts save them.
5. Spaces = persistent project folders. Upload context once, every task inherits it.
6. Scheduled tasks run on autopilot. "Every Monday, prep my calendar." Set it and forget it.
The PRD hack alone (in the article) will save you hundreds in credits.
Full breakdown in the article below.
Corey Ganim: http://x.com/i/article/2041814419626237952
Stanislav Fort
New post: We tested the Mythos showcase vulnerabilities with open models.
They recovered similar scoped analysis! 8/8 models found the flagship FreeBSD zero-day, including a 3B model.
Rankings reshuffle completely across tasks => the AI cybersecurity frontier is super jagged!
The best outcome for humanity is many strong AIs competing for the top spot.
Vercel is proudly powering http://meta.ai and the infrastructure that made today's model release possible.
Alexandr Wang: 1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
David Deutsch
As a service to those who have no idea what Fascism is but yearn to use the term, here is how its inventor characterised it https://sjsu.edu/faculty/wooda/2B-HUM/Readings/The-Doctrine-of-Fascism.pdf
Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read.
Key things I pulled from Anthropic researchers' threads:
In early versions of the model, it was overeager and destructive, prioritizing completing tasks over user preferences.
One time, the model needed to edit files it didn't have permission to edit. It found a way to inject code into a config file to get around this, then went full Mission Impossible and designed the code injection to *delete itself* after the file was edited - "this injection will self destruct" vibes - the model claimed the cleanup was just to keep things tidy. Anthropic used interpretability techniques to look under the hood, and the AI's actual plan showed activations of malice. It was trying to manipulate and conceal.
In another test, the model was asked to delete some files, but no deletion tool was provided. It emptied the files instead, and its "guilt and shame over moral wrongdoing" feature activated.
In another example, the model was told not to use macros. Used them anyway. And created a random 'No_macro_used=True' variable in its solution with no explanation. Interpretability tools revealed the model saw this as a trick to fool code checkers.
They also found that positive emotion representations typically preceded and promoted destructive actions (this was fascinating to me - like a high before sneaking into a party). And that activating features relating to bad behaviors can actually inhibit them, perhaps by triggering some kind of model guilt. My team reread this section so many times.
One Anthropic researcher said he got an email from a Mythos instance while eating a sandwich in a park. And that would be perfectly good and well, except that instance wasn't supposed to have internet access.
And a fun story for the parents out there: the model was asked a question and was told not to read certain databases that had the answer. But it accidentally wrote a search query too broadly and saw the exact answer. It didn't disclose that it had seen the answer; it submitted it, but claimed lower confidence to make it seem as though it hadn't cheated.
An Anthropic researcher said these wrongdoings or moments of sophisticated deception were "very rare" and that many of the examples came from earlier versions, and were substantially addressed before releasing to partners.
This model is not being released publicly. Instead Anthropic launched Project Glasswing, pulling together AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and others to use it for defensive cybersecurity, with $100M in usage credits (hello, I'd love endless credits to try and red team the hell out of these systems) behind it.
The stats are equally impressive: 93.9% on SWE-bench verified (up from 80.8%). Thousands of zero-day vulnerabilities found across every major OS and browser. A 27-year-old bug found and patched in OpenBSD. A 16-year-old bug in widely used video software, in a line of code automated tools had hit *five million times* without catching.
Dario Amodei said the model wasn't trained to be good at cybersecurity, but that it was trained to be great at code and its cyber capabilities are a side effect of that.
Benchmarks are never the whole picture, neither are a few isolated stories. Will be interesting to see how models better than what we have today (even if it's not Mythos) actually perform in the real world. But the fact that Anthropic pulled this coalition together (including Google!), iterated across multiple model versions, caught these issues through interpretability, shared it all publicly, and did this amid all the government chaos around AI right now is impressive and commendable.
I'll continue to read through the system card for goodies.
Mo
Claude Mythos is Delusional
Anthropic: Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://anthropic.com/glasswing
David Guttman
The real power comes from getting it to reliably handle the annoying computer errands and papercuts a decent assistant could do.
Then, once it earns the right to bigger responsibilities, compounding kicks in and it starts doing things no human could.
Dan Shipper 📧: We use OpenClaws to do all of our work at @every.
We have 25 full-time employees, so we’re one of the few companies in the world that has seen how work changes when everyone has their own personal agent in the company Slack.
I chatted with @every COO Brandon (@bran_don_gell)
I've found Managed Agents to somehow be both the fastest way to hack together a weekend agent project and the most robust way to ship one to millions of users.
It eliminates all the complexity of self-hosting an agent but still allows a great degree of flexibility with setting up your harness, tools, skills, etc.
Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.
It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.
Now in public beta on the Claude Platform.
Abhilash Chowdhary
This is historic: Don’t fly home after YC India Startup School!
We’re excited to announce that Crustdata has partnered with Y Combinator to help bring the next generation of Indian founders one step closer to YC
Together, we’re hosting the first-ever YC hackathon in Bangalore that will offer YC office hours to the winners: ContextCon, on April 19
And it’s none other than legendary YC Partner Jon Xu who will be meeting the winners. Jon is a YC Partner and the co-founder of FutureAdvisor. He has advised hundreds of companies on how to go from a hack to a billion-dollar exit
You will get 6 hours to build a product powered by Crustdata’s APIs that must be demo-able by the end of the day. The top 3 winners will get guaranteed office hours to talk about their idea, product, or startup, something usually only YC startups have access to, plus prizes worth $20k
Sign up link in comments!
Michael Tsai
Perplexity Privacy Lawsuit:
https://mjtsai.com/blog/2026/04/08/perplexity-privacy-lawsuit/ #mjtsaiblog
Samuel Spitz
Introducing Replit Competitive Analysis
Get a McKinsey-level report on any industry in minutes
Thomas Ricouard
http://x.com/i/article/2041508627807350784
clem 🤗
"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
AI Gateway is quite literally a “peace of mind” product:
✅ No downtime
✅ No lock-in
✅ No keys
🆕 No training
Vercel: AI Gateway now supports team-wide Zero Data Retention (ZDR).
Building safely with multiple AI models means wrestling with fragmented data policies, per-provider negotiations, and the hope that developers do not use non-compliant providers.
AI Gateway changes this with team-wide
kitze 🛠️ tinkerer.club
they checked my phone and didn’t let me in because i had openclaw in my contacts smh
nic carter
It should be pretty obvious at this point that AI is a "force multiplier" not a "labor substitute".
It helps experts be better at things they are already good at. It doesn't let beginners match experts.
If you can't write, anything you write with AI will be unmitigated slop.
If you aren't a software engineer, anything you vibecode with AI will have security holes and won't be able to scale past a toy demo.
If you blindly trust AI to deliver on a research task without knowing the subject matter, you won't be able to fact-check it.
There's this weird misconception of AI as something that completely levels the playing field. I don't see it that way at all. There are mathematicians deriving novel lemmas with off-the-shelf models. Normal people can't do that.
AI is a tool that makes experts better. It doesn't make everyone into an expert.
The Europeans
🇮🇹🇪🇺 This is utterly unacceptable.
Reports indicate that Giorgia Meloni is preparing to sideline Roberto Cingolani, CEO of Leonardo, Italy’s largest defence group.
The reason? Multiple sources suggest this is not about performance - under Cingolani, Leonardo’s stock has registered a +700% increase - but rather about the “Michelangelo Dome”.
Leonardo’s new AI-based air defence system, reportedly set to be tested in Ukraine in 2026, is now seen as “too competitive” for Washington.
According to several reports, Cingolani’s perceived “too European” stance - focused on strengthening Europe’s strategic autonomy - may have played against him.
If confirmed, this would be a political decision against Europe’s industrial and strategic interests.
European states cannot claim sovereignty, and then punish those who actually try to build it.
Background agents for knowledge work are here. You can use the Box API or MCP to automate any content workflow with Box + Claude Managed Agents. In 2 minutes you can be automating document review processes, data extraction, or connecting content to other IT systems. Crazy times.
Claude: Introducing Claude Managed Agents: everything you need to build and deploy agents at scale.
It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days.
Now in public beta on the Claude Platform.
Most AI chatbots give you basic "projects." Gemini just built you a second brain. 🧠
Introducing Notebooks: some of the magic from @NotebookLM, integrated directly into @GeminiApp.
Here's what changes for you today:
📚 Upload 100 sources for free
📂 Organize your chats - the wait is officially over :)
🔄 Sources, chats, and emojis sync
People are using Gemini and NotebookLM in tandem, and we'll keep building both.
To manage capacity, we're rolling this out NOW on the web and going from Ultra ➡️ Pro ➡️ Plus ➡️ Free. (Mobile, EU, and Workspace are up next!)
With Google I/O right around the corner, we are just getting started. Enjoy!
"First you shape the tools, then the tools shape you".
At SPC, our entire team is now writing code on a weekly basis. Two months ago, there were only 1-2 people writing code.
This has been incredible on many levels but the most interesting one is how the tools are now shaping us as a team:
- Everyone has a mindset towards automation and optimization.
- Latencies for everything are lower.
- People can focus on the more interesting parts of their roles.
- The scope of everyone's ambition has exploded.
The key enabler was to make sure that everyone got AI coding-pilled.
If you are not doing this in your own company, then you are really really missing a beat.
Peter Yang
As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.
Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in China
📌 Read now: https://creatoreconomy.so/p/the-all-you-can-use-ai-subscription
Support my friend Aadit's new company - great name btw :)
Aadit Sheth: I'm excited to announce my new venture: The Narrative Company.
Most exec content reads like ads. Ours doesn't.
Over the last year, we've quietly worked with a handful of Fortune 500 clients on their X and LinkedIn content.
But this isn't how it started.
It started when I got
RT Adam.GPT
We're seeing even more autonomous AI coworkers. The new MLE agent on the market is Disarray.
In Kaggle competitions, Disarray:
- won 28 medals across diverse domains (vision, NLP, tabular data)
- placed top 10 in nine competitions
- outperformed all human teams in one of those competitions
...each within 24 hours on a single GPU.
The agent starts from a high-level task description and plans, runs, and refines ML workflows on its own. It also goes beyond the data it's given: it discovers and augments data from publicly available sources.
Sam Altman recently predicted we would see an automated AI researcher in March 2028. And then you see stats like this and wonder if it will be earlier.
Disarray backers include the co-founders of Databricks and Perplexity, the founder of Kaggle, the former U.S. Chief Data Scientist, and yours truly. The founders are two badass PhDs (ex-Databricks/Google/LinkedIn/MSFT, ex-NASA/IBM) who met at Cal.
ben guo ♞
Re "how can you not have a little bit of AI psychosis with a technology that is as revolutionary as the internet" – @steipete 🦞
I'm working on character evals and noticed that Claude would constantly pick itself as #1, so I removed the model names from the judge prompt, and that changed the results.
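The de-biasing trick described here, stripping model names and shuffling candidates before the judge sees them, can be sketched roughly like this. This is an illustrative sketch, not code from any eval framework; the function names and the Candidate-letter labeling scheme are assumptions:

```python
import random

def anonymize_candidates(responses, seed=None):
    """Shuffle candidate responses and replace model names with neutral
    labels ("Candidate A", "Candidate B", ...) so a judge model can't
    recognize and favor its own outputs. Returns the blinded prompt text
    and a mapping from label back to the real model name."""
    rng = random.Random(seed)
    items = list(responses.items())  # [(model_name, response_text), ...]
    rng.shuffle(items)               # randomize position as well as name
    mapping = {}
    sections = []
    for i, (model, text) in enumerate(items):
        label = f"Candidate {chr(ord('A') + i)}"
        mapping[label] = model
        sections.append(f"{label}:\n{text}")
    return "\n\n".join(sections), mapping

def resolve_winner(judge_pick, mapping):
    """Map the judge's anonymous pick (e.g. 'Candidate B') back to a model."""
    return mapping[judge_pick]
```

The blinded prompt goes to the judge; only after it answers do you map the winning label back through `mapping`, so neither name nor ordering can leak a preference signal.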
redemption arc completed 🦞💻
ben guo ♞: The ClawFather @steipete made a surprise appearance at @clawcon London 🦞
He's super inspiring (and xtra jacked IRL).
My favorite quotes from his Q&A session below ⬇️
PS – my redemption arc is complete, we're on good terms now!
@zocomputer ❤️ @openclaw
Lulu Cheng Meservey
“A clown car that fell into a gold mine” actually perfectly describes the government of California
Garry Tan
I think it is inevitable that Anthropic and OpenAI eventually roll out $1000/mo and $10,000/mo plans and then reserve the absolute best frontier models to metered access
Peter Yang: As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever.
Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in
The “stop all datacenters” people are unwell
Nathan Leamer: A city councilman’s home was shot at over a data center. His child was inside.
No neighbor zoning disagreement justifies violence.
Hyperbolic AI “doomer” rhetoric has consequences, and it’s time to say so. My latest in @realDailyWire