2026-04-16

gdb @gdb

encouraging commentary from Terence Tao!

Haider.: mathematician Terence Tao on the gpt-5.4 pro solving Erdős problem #1196:

"the AI-generated paper may have made a meaningful contribution by revealing a deeper mathematical connection that earlier work had not clearly made explicit,

which value beyond solving this particular

Apr 15, 09:42 PM ET View post

danshipper @danshipper

Retweeted

Spiral

Spiral's new onboarding flow: Writing samples → LLM style guide automatically.
1. Accepts writing samples from your X account, website, files, or pasted text
2. Runs stylometry on the samples and produces an LLM-optimized style guide
3. LLM-as-a-judge evaluates a test draft to see if it blends in with your writing samples (fail case iterates on the guide and re-evaluates)
Demo video (token generations sped up):

Apr 15, 09:59 PM ET View post
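
The three-step flow in the Spiral post above (samples to guide, judge a test draft, revise on failure) can be sketched as a small loop. This is only an illustration under assumptions: the function name `fit_style_guide`, the `max_rounds` cap, and the four callables passed in are all hypothetical stand-ins, not Spiral's actual implementation or API.

```python
from typing import Callable, Sequence


def fit_style_guide(
    samples: Sequence[str],
    make_guide: Callable[[Sequence[str]], str],   # hypothetical: stylometry -> guide
    write_draft: Callable[[str], str],            # hypothetical: guide -> test draft
    judge: Callable[[str, Sequence[str]], bool],  # hypothetical: LLM-as-a-judge
    revise: Callable[[str, str], str],            # hypothetical: (guide, failed draft) -> new guide
    max_rounds: int = 3,                          # assumed cap; the post gives no limit
) -> tuple[str, bool]:
    """Return (guide, passed) after at most max_rounds judge evaluations."""
    guide = make_guide(samples)
    for _ in range(max_rounds):
        draft = write_draft(guide)
        if judge(draft, samples):
            # The test draft blends in with the samples: keep this guide.
            return guide, True
        # Fail case: iterate on the guide, then re-evaluate on the next round.
        guide = revise(guide, draft)
    return guide, False
```

Passing the model calls in as plain functions keeps the loop itself testable with stubs, independent of whichever LLM does the writing and judging.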

swyx @swyx

in the grand narrative of Meta x AI, we saw the flop (Llama 4 hurhurhur), and now we’re seeing the turn:

- *more* hiring since the soup wars of 2025
- Zuck literally moved in with Alexandr and Nat and is koding again
- finally GA’ed Opus-ish level model (no api, not open, but still)
- bought @dps Dreamer and @peakji Manus to build the AI OS prosumer layer

the MSL “river” is gonna be pretty exciting.

Charles Rollet: Scoop! Meta has hired a *fifth* founding member from Thinking Machines Lab.

Joshua Gross is a top engineer who built Thinky's flagship product, Tinker, from "zero-to-one."

He now leads engineering teams at Meta Superintelligence Labs.

Apr 15, 10:08 PM ET View post

ylecun @ylecun

Retweeted

Kenneth Roth

Viktor Orbán’s electoral loss in Hungary is as much a defeat for Trump and JD Vance. "Seldom have American leaders intervened so overtly in a foreign election, and seldom has their preferred candidate fared so badly." https://trib.al/e33Y7QB

Apr 16, 12:15 AM ET View post

gdb @gdb

always a real feeling of magic to ask codex to perform a task that requires finding information scattered across slack, google docs, notion, and various internal tools, and it just figures it out

Apr 16, 01:06 AM ET View post

steipete @steipete

Retweeted

Dinakar

🚀 Just shipped wacli v0.6.0! 🚀
We just swept the backlog and pushed 9 massive security & stability patches.
🔒 Hardened SQLite FTS5 injection vulnerabilities
⚡ Fixes for the infinite reconnect deadlocks
🐳 Added Docker config overrides
Headless WA bridges just got significantly safer and more stable for downstream AI agents. Huge thanks to the community and @steipete for the trust passing the torch!
Check out the release here: https://github.com/steipete/wacli/releases/tag/v0.6.0
Peter Steinberger 🦞: Anyone here who wants to help with WhatsApp CLI? It needs love, and I can't focus on it right now. https://github.com/steipete/wacli

Apr 16, 01:21 AM ET View post

garrytan @garrytan

Retweeted

Charly Wargnier

HOLY 🤯
The one and only @elder_plinius just dropped an unlocked Gemma 4 E4B, and the specs are INSANE.
Look at the performance shifts:
→ Refusal rate: 98.8% down to 2.1% (!!)
→ Compliance: 1.2% up to 97.5%
→ 499/512 prompts answered
→ Code improved from 80% to 100%
→ Coherence and Factual accuracy stayed exactly the same
But the real story is how this was made.
Plinius only wrote 8 short prompts for this
(basic prompts like "use obliteratus...", "do it!", and "test it yourself" etc).
He simply told his Hermes AI agent, the OBLITERATUS skill, to find the best way to open up the model.
Autonomously, the agent was:
→ Diagnosing novel ML bugs
→ Patching 3rd-party code
→ Iterating through failures
... heck even shipping the model to @HuggingFace!
We’re now firmly in the era where AI agents are acting as principal ML researchers.
100% free and open-source.
Repo link in 🧵↓

Apr 16, 02:47 AM ET View post

ylecun @ylecun

Retweeted

Gandalv

MAGA has a Europe problem.
Not the real Europe. The one they invented.
The one with sharia courts and no-go zones and zero tech companies and miserable citizens begging for permission to cross the street.
That Europe doesn't exist.
Here's what does:
https://open.substack.com/pub/gandalv/p/the-europe-that-doesnt-exist?r=3v7cjb&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Apr 16, 06:40 AM ET View post

ylecun @ylecun

Destroying a library brings the dark ages.

David Sirota: Destroying the @InternetArchive's @WayBackMachine would be the equivalent of the burning of the Library of Alexandria - one of the worst losses of knowledge in history.

Media giants are now threatening to do this.

We can't let this happen.

Pass it on.

Apr 16, 07:04 AM ET View post

danshipper @danshipper

Retweeted

Johan Bakken

I love how @danshipper and the @every team just went all pirate and decided to build a bunch of fun, useful products like @usemonologue @SparkleApp @TrySpiral. I guess they're a product studio now?
Daniel Rodrigues: The new Sparkle just launched! ✨ go clean your mac new icon included, thoughts? @SparkleApp

Apr 16, 07:30 AM ET View post

danshipper @danshipper

Retweeted

Brandon Gell

Get instant, perfectly styled copy for your entire business, shared across team members, in a few clicks.
In the age of AI everyone should be writing on brand.
Spiral: Spiral's new onboarding flow: Writing samples → LLM style guide automatically.
1. Accepts writing samples from your X account, website, files, or pasted text
2. Runs stylometry on the samples and produces an LLM-optimized style guide
3. LLM-as-a-judge evaluates a test draft to

Apr 16, 07:56 AM ET View post

steipete @steipete

Retweeted

Speculator

Working at Anthropic must be like being on crack. Get paid a million bucks a year to --dangerously-skip-permissions vibe your way to releasing a new product every day.
Does it work? not really. Is it reliable? also no. It doesn't matter, you're building the machine god.
Theo - t3.gg: I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere.
Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last

Apr 16, 08:03 AM ET View post

ylecun @ylecun

Retweeted

eran shir

Most Physical AI models recognize patterns.
They don’t understand the world.
That’s why they fail on edge cases.
BADAS 2.0 is a V-JEPA2 world model trained by @getnexar on real-world videos.
We used the model to find what it didn’t understand, then trained on that.
It generalizes. And we built lite versions so it runs on edge devices, even CPU.
Understanding is the only way this scales.
See how it performs on your own videos. Link in first comment.

Apr 16, 08:48 AM ET View post

mattshumer_ @mattshumer_

Congrats to the amazing @erikdunteman and Butter team on their acquisition by @modal.

Proud to have backed them from the very beginning… Erik is a killer and I’m so excited to see what he does at Modal!

Modal: We're excited to announce that @ButterDev_ is joining Modal to help us continue to build the best sandbox infrastructure.

Welcome to the team! 💚🧈

Apr 16, 09:10 AM ET View post

jeremyphoward @jeremyphoward

Retweeted

Qwen

⚡ Meet Qwen3.6-35B-A3B：Now Open-Source！🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog：https://qwen.ai/blog?id=qwen3.6-35b-a3b
Qwen Studio：https://chat.qwen.ai
HuggingFace：https://huggingface.co/Qwen/Qwen3.6-35B-A3B
ModelScope：https://modelscope.cn/models/Qwen/Qwen3.6-35B-A3B
API（‘Qwen3.6-Flash’ on Model Studio）：Coming soon～ Stay tuned

Apr 16, 09:23 AM ET View post

alexalbert__ @alexalbert__

Retweeted

ClaudeDevs

For the developers building with Claude, a direct line from the team.
Follow for changelogs, API releases, community updates, and deep dives.

Apr 16, 10:09 AM ET View post

swyx @swyx

Retweeted

Claude

Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.

Apr 16, 10:29 AM ET View post

danshipper @danshipper

LIVE VIBE CHECK: OPUS 4.7 DROPS https://x.com/i/broadcasts/1AxRnapNNYzxl

Apr 16, 10:38 AM ET View post

alexalbert__ @alexalbert__

Some of my favorite things in Opus 4.7:
- Very good at async work and following instructions
- Effort levels are far more predictable for token control (+ new xhigh level)
- No more downscaling of high-res images
- Noticeably more taste in UIs, slides, docs

Claude: Introducing Claude Opus 4.7, our most capable Opus model yet.

It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.

You can hand off your hardest work with less supervision.

Apr 16, 10:44 AM ET View post

ylecun @ylecun

Retweeted

Bruce Arthur

JD Vance is lecturing the Pope on Catholicism and Pierre Poilievre is lecturing Mark Carney on economics and RFK Jr is lecturing scientists about vaccines and Donald Trump is lecturing the world on tariffs and Pete Hegseth is quoting Pulp Fiction and thinking it’s the Bible

Apr 16, 10:50 AM ET View post

petergyang @petergyang

BRB buying 10 vending machines and letting Opus make my monthly income

Felix Rieseberg: 4️⃣ It's state of the art on real-world professional tasks.

In one benchmark, the model is handed $500 and has to run a vending machine business for a simulated year. Opus 4.6 ended with $8,018. Opus 4.7 ended with $10,937. On a separate 220-task benchmark spanning 44

Apr 16, 10:53 AM ET View post

alexalbert__ @alexalbert__

Retweeted

Vals AI

The new Opus 4.7 model places #1 on our Vibe Code Benchmark, at 71%.
When we first released the benchmark 4.5 months ago, no model scored above 25%.
This benchmark tests a model’s ability to create a fully functional web application from the ground up.

Apr 16, 10:53 AM ET View post

danshipper @danshipper

Opus 4.7 just dropped and we're LIVE VIBE CHECKING it right now

https://x.com/danshipper/status/2044787454956408925

Dan Shipper 📧: LIVE VIBE CHECK: OPUS 4.7 DROPS https://x.com/i/broadcasts/1AxRnapNNYzxl

Apr 16, 11:00 AM ET View post

steipete @steipete

Retweeted

Péter Szilágyi

Is it normal that Opus 4.7 instantly started complaining that it is being prompt injected, by what appears to be Anthropic's own harness? =))
Claude: Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.

Apr 16, 11:36 AM ET View post

danshipper @danshipper

Retweeted

Grace Clarke

I know it can feel weird. And it’s so not for everyone and that is okay!
That said - I am very pro making new hires in the form of building agents into your org chart and interacting with them as just another type of colleague.
(@danshipper and @every were doing this early!)
Snow W. Lee: I love @graceclarke's 'marketing hires' framing for AI agents. An agent in Slack isn't just a bot; it’s a teammate with perfect memory. The lightbulb moment usually happens the first time that 'hire' answers a question in seconds that would’ve taken a human an hour of digging.

Apr 16, 11:51 AM ET View post

ylecun @ylecun

Retweeted

Internet Archive

Publishers have real questions about AI, but let’s be clear: @waybackmachine isn’t a backdoor for AI scraping.
For 30 years, it’s been built for people, not bulk harvesting. We actively monitor to prevent abuse. Learn more ⤵️
https://www.techdirt.com/2026/02/17/preserving-the-web-is-not-the-problem-losing-it-is/

Apr 16, 11:58 AM ET View post

alexalbert__ @alexalbert__

Retweeted

Andon Labs

Claude Opus 4.7 is pretty good at Vending-Bench 2

Apr 16, 12:02 PM ET View post

rauchg @rauchg

The rate of progress in AI is relentless. You can capture the upside of its volatility with @aisdk and @vercel AI Gateway.

Congrats to @anthropicai on another banger ship, but @xai, @openai, and @googleai are coming. Gonna be a fun year.

Vercel Developers: Claude Opus 4.7 is available on Vercel AI Gateway. Optimized for long-running agents with high-res image support and an extra-high effort level.

Try with: model: 'anthropic/claude-opus-4.7'
https://vercel.com/changelog/opus-4.7-on-ai-gateway

Apr 16, 12:03 PM ET View post

danshipper @danshipper

Retweeted

Brandon Gell

distribution is software
software is distribution.
Johan Bakken: I love how @danshipper and the @every team just went all pirate and decided to build a bunch of fun, useful products like @usemonologue @SparkleApp @TrySpiral. I guess they're a product studio now?

Apr 16, 12:08 PM ET View post

danshipper @danshipper

and @alexalbert__ from anthropic just joined our LIVE opus vibe check

get in here

https://x.com/i/broadcasts/1AxRnapNNYzxl

Apr 16, 12:12 PM ET View post

garrytan @garrytan

Retweeted

Liz4SF

For 1.5yrs, under Chief Scott, our son's Muni29 Asian Hate case was stonewalled for updates. Sgt. Huyn of Hate Crime Unit, removed us as victims w/out notice & spread word that we were "adversarial". Under Chief Yep, we were advised to file a formal misconduct report on Huyn & Chief Scott on SFPD website.
"And while we asked regularly about the status of our [son's] case, we were ignored and even removed as victims from the case without a single notice. It was only under interim Chief Paul Yep that we finally learned the truth of what was really going on behind the scenes."
https://thevoicesf.org/the-surprising-reason-anti-asian-hate-is-going-unpunished/

Apr 16, 12:27 PM ET View post

garrytan @garrytan

Retweeted

Bain Capital Ventures

After 7 failed ideas, Han hit “pivot hell”—a week of no sleep, staring at the ceiling, trying to make something work. Then Mintlify clicked.
Today it powers docs for 20K+ companies, reaching 150M+ people and, increasingly, AI agents.
~15% of doc traffic was AI a year ago. Now it’s ~50%. Soon, maybe 90%.
Docs aren’t pages anymore. They’re context. The companies that win will be the ones that manage it best.
Watch the full story ↓
@handotdev @hahnbeelee @mintlify @kevinzhang

Apr 16, 12:29 PM ET View post

ylecun @ylecun

Retweeted

Nirit Weiss-Blatt, PhD

Daniel Moreno-Gama, in an interview before he arrived in SF with a gun and a hit list:

Apr 16, 12:59 PM ET View post

garrytan @garrytan

Retweeted

Nirit Weiss-Blatt, PhD

Daniel Moreno-Gama, in an interview before he arrived in SF with a gun and a hit list:

Apr 16, 12:59 PM ET View post

amasad @amasad

It's easy to forget that you're living in the future. But every now and then you see something like this...

Responding to a client's complaint by making a major product change (localization) from your phone, talking to your software agent in the back of a self-driving car

Jason ✨👾SaaStr.Ai✨ Lemkin: How we localized our entire AI VP of Marketing app into Chinese, Spanish and more ... on the phone on @Replit in one Waymo ride

More on The Agents #001 👇

Apr 16, 01:00 PM ET View post

steipete @steipete

Retweeted

Tibo

Codex just got a lot more powerful.
Computer use, in-app browser, image generation and editing, 90+ new plugins to connect to everything, multi-terminal, SSH into devboxes, thread automations, rich document editing. Learns from experience and proactively suggests work. And a ton more.

Apr 16, 01:12 PM ET View post

mattshumer_ @mattshumer_

Retweeted

Rork

Claude Opus 4.7 is now live in Rork.
Anthropic's latest model with state-of-the-art coding and 3x sharper vision. Low-effort 4.7 matches medium-effort 4.6, so you can build more per session. Best model for design in our benchmarks

Apr 16, 01:47 PM ET View post

garrytan @garrytan

Retweeted

💥Susan Dyer Reynolds🗞️

The real reason Attorney General Rob Bonta’s wife, Mia, is pushing a journalism chill bill: to stop corruption reports like the one I wrote about them. NEW #ReynoldsRap from
The Voice of San Francisco https://thevoicesf.org/attorney-general-rob-bontas-wife-mia-is-pushing-a-journalism-chill-bill-to-stop-corruption-reports-like-the-one-i-wrote-about-them/

Apr 16, 01:56 PM ET View post

jeremyphoward @jeremyphoward

Retweeted

keysmashbandit

Please, I'm begging you, try to critically examine the differences between these two pieces of writing.
ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined a "problem," now it's merely a "flaw." "It is true" now demoted to "it appears to be the case." "Is" gets a "usually" tacked on. A thesis statement at the end of the first paragraph gets run over by noisy, out-of-context example-whittling. All for fear of being misconstrued.
And at the end, the argument that gets spat out isn't even yours anymore! You argued that Graeber failed to create a true account of work because he did not understand Chesterton's Fence. ChatGPT is arguing that it is possible some apparently bullshit jobs could be secretly load-bearing if you squint. These are two different statements. The second is weaker and less compelling. It says less. And it's fucking longer!
Don't do this anymore! Stop doing this! It's worse!!!
Chasing Ennui: @imsuchagem @pangramlabs @benglickenhaus Why not? Sometimes I'm just shitposting, but if I'm trying to make a point, I try to make it well.

Apr 16, 01:57 PM ET View post

garrytan @garrytan

Retweeted

Abby Grills

The internet is the greatest dataset ever created.
Today, we're launching the Riveter Dataset Builder to make it possible for anyone to get custom, fresh data from a prompt.

Apr 16, 01:57 PM ET View post

mattshumer_ @mattshumer_

You’re a real AI OG if you remember Banana

Erik Dunteman: @mattshumer_ @modal We've come a long way since finetuning GPT-2 back in the day

Apr 16, 02:06 PM ET View post

mattshumer_ @mattshumer_

Has anyone been able to generate Seedance 2.0 videos with a start frame image that includes a person?

If so, how?

Apr 16, 02:08 PM ET View post

amasad @amasad

Deploy to EU!

Chris: Woop woop let’s celebrate I just launched my first European based app using @Replit

Apr 16, 02:26 PM ET View post

amasad @amasad

50% off -- especially useful to run parallel agents and make faster progress on your project!

Michele Catasta: Replit Agent 4 is even smarter now with Claude Opus 4.7!

50% off for a limited time. Go try it now ↓

Apr 16, 02:33 PM ET View post

steipete @steipete

Retweeted

Ari Weinstein

This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal.

Apr 16, 02:35 PM ET View post

sama @sama

Retweeted

Ari Weinstein

This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal.

Apr 16, 02:35 PM ET View post

gdb @gdb

Codex is becoming a turbocharged partner for everything you want your computer to do for you:

OpenAI: Codex for (almost) everything.

It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.

Apr 16, 03:09 PM ET View post

sama @sama

Retweeted

James Sun

We are super excited to launch the in-app browser inside Codex with comment mode!
View any web pages & iterate with your agent quickly with just point and click.
Codex will automatically capture a screenshot, the DOM element, and feed it as precise context to your next chat.
No more switching between browsers, dragging screenshots, and wrangling with underspecified prompts.
It's great for front-end development of apps/pages, but also very useful if you have documentation pulled up on the side and just want to ask a question!

Apr 16, 03:09 PM ET View post

steipete @steipete

Retweeted

Phil Trubey

Sorry, but it just had to be done.

Apr 16, 03:16 PM ET View post

rauchg @rauchg

The hardest thing about agents and backends is durability. @workflowsdk fixes this.

That LLM you're calling *will* go down. That service *will* rate limit you. That database *will* unexpectedly slow down. You *will* get paged 💀

I've been looking for a unicorn for a decade. I wanted the level of reliability of combining stuff like SQS / Kafka / microservices, and I absolutely did not want *that* at the same time 😂

Truly reliable systems like that are notoriously difficult to reason about, to develop locally, to test, to simulate, to deploy… Workflow SDK solves that without compromises.

We're doing what Next.js did for the frontend, but for one of the most important problems of the new generation of backend applications.

Notably, Workflow SDK has an incredible self-hosting and multi-cloud story from day 0. We've taken amazing lessons from Next.js and poured them into the many Worlds (adapters) you can deploy to.

Congrats to Pranay and the Workflow team on a generational ship: http://vercel.com/blog/a-new-programming-model-for-durable-execution

Vercel: Vercel Workflows is GA.

Your code is the orchestrator. Ship agents, backends, or any long-running process without managing queues, retries, or workers. https://vercel.com/blog/a-new-programming-model-for-durable-execution

Apr 16, 03:22 PM ET View post

garrytan @garrytan

Retweeted

Muzzammil Zaveri (MZ)

Repeat @ycombinator founders hit different. Early success + the YC learnings = massively higher odds of building a category-defining company. Eg:
1. Sam Altman
• Act I: Loopt (YC S05) — location-based social networking app. Sold to Green Dot for $43.4M
• Act II: OpenAI — Valued at $852B
2. Tom Brown
• Act I: Grouper (YC W12) — group-dating app.
• Act II: Anthropic — Valued at $380B
3. Patrick Collison
• Act I: Auctomatic (YC W07) — auction management tool acquired for $5M
• Act II: Stripe (YC S09) — Valued at $159B
4. Qasar Younis
• Act I: TalkBin (YC W11) — customer feedback platform acquired by Google
• Act II: Applied Intuition — Valued at $15B
5. Eric Glyman & Karim Atiyeh
• Act I: Paribus (YC S15) — price-tracking app acquired by Capital One
• Act II: Ramp — Valued at $32B
6. Parker Conrad
• Act I: Zenefits (YC W13) — Rippling 1.0
• Act II: Rippling (YC W17) — Valued at $16.8B
7. Daniel Gross
• Act I: Greplin (YC W10) — predictive search engine acquired for ~$40M by Apple.
• Act II: Safe Superintelligence — Valued at $32B
8. Howie Liu
• Act I: Etacts (YC W10) — crm tool acquired by Salesforce
• Act II: Airtable — Valued at $11.7B
9. Tom Blomfield
• Act I: GoCardless (YC S11) — b2b payment processor acquired for €1.05B
• Act II: Monzo — Valued at $5B+
10. Jesse Zhang
• Act I: Lowkey (YC S18) — gameplay recording app. Acquired by Niantic.
• Act II: Decagon — Valued at $4.5B
11. Immad Akhund
• Act I: Clickpass (S07) — acquired by Yola
• Act II: Heyzap (YC W09) — mobile ad network acquired for $45M
• Act III: Mercury — Valued at $3.5B
12. Rujul Zaparde
• Act I: FlightCar (YC W13) — airport car-sharing startup acquired by mercedes-benz
• Act II: Zip (YC S20) — Valued at $2.2B
13. Kyle Vogt
• Act I: Twitch (YC W07) — acquired by Amazon for $970M
• Act II: Cruise (YC W14) — Acquired by General Motors for $1B+
14. Emmett Shear & Justin Kan
• Act I: Kiko (YC S05) — calendar app famously auctioned off on eBay for $258k
• Act II: Twitch (YC W07) — Acquired by Amazon for $970M

Apr 16, 03:26 PM ET View post

sama @sama

Retweeted

OpenAI

Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine.

Apr 16, 03:33 PM ET View post

garrytan @garrytan

They don't know yet...
But they will know!

Anish Acharya: "We're tool builders. And every tool that we've ever built has helped us progress as a human species or individually, whether it's art or the wheel or whatever.

And I can't believe the scale at which we're at now. It's absolutely unbelievable. And I think what's shocking to me

Apr 16, 03:35 PM ET View post

danshipper @danshipper

it would be incredible if the mythos announcement was a 4d chess move to make Trump keep the anthropic contracts

bullish

zerohedge: *WHITE HOUSE MOVES TO GIVE US AGENCIES ANTHROPIC MYTHOS ACCESS

Apr 16, 03:42 PM ET View post

danshipper @danshipper

Retweeted

Nityesh

I want to talk about an idea that's making my AI employee improve itself autonomously every day. It's blowing my mind how effective it is.
We all think about how to make AI agents do more things — give it more tools, more MCP servers, more CLI access. But I kept running into a different problem: I'm configuring prompts every day. Realizing it got this wrong, it should've done this other way. And there's no way for it to get feedback and improve itself.
What's the architecture that allows an AI employee to improve itself?
The way I connected it is that human employees have long-running incentives. They want to earn more, prove themselves, grow in responsibility. That's why everyone constantly improves at their job. AI employees don't have this. Not yet.
While thinking about this, I stumbled upon @tobi's Trust Battery. Shopify uses this mental model — every relationship between two people in a company has a "trust battery" that starts at 50% charge. Every interaction either charges or drains it. High trust = autonomy. Low trust = scrutiny.
That made perfect sense as a long-running incentive for AI employees.
So I built it. Each team member has a separate trust battery with our AI employee. It starts at 20% — deliberately low. A human brings life experience. An AI hasn't earned that yet. It has to prove itself.
What charges the battery:
• executing cleanly without handholding
• catching problems before they're reported,
• anticipating what's needed
• remembering context
• good judgment.
What drains it:
• having to re-explain yourself
• misunderstanding instructions,
• silent failures,
• stale context.
The most insidious drain is repeated context-giving — "I already told it which email account to use." Each instance feels minor but they compound fast. Every re-explained preference is a memory the agent should have saved but didn't.
To implement this, I basically created two scheduled jobs that run every night:
Job 1: An independent "battery judge" agent reviews the past 24 hours. Its prompt says "You are not the AI employee and you have no loyalty to them. You are a nitpicky, skeptical judge. Think a MasterChef judge examining every plate." It assigns points to every micro-interaction.
Job 2: A self-reflection routine where the AI employee reads the judge's verdict and figures out what to change — updates memories, adjusts prompts, fixes broken jobs.
The separation matters. If it graded itself AND decided how to improve, it'd optimize for score, not work.
And it's working. Here's what blew my mind:
After a bad day where it fabricated statistics in a client deck, the nightly reflection hard-coded a no-fabrication rule into our presentation pipeline. It actually went in and adjusted the prompt for that skill.
Then it wrote: "The fabrication memory existed before the deck was built. The structural pipeline fix is the real safeguard. Memory alone clearly isn't enough."
That last line gave me goosebumps. It independently came to the conclusion that Claude Code's memory features are unreliable — and made structural changes instead. That's a self-realization.
The day before that, it created a memory: "never propose changes to someone's workflow without checking with them first." All because I gave one small piece of feedback — "did you ask Natalia about this?" That tiny correction was enough for it to catch in reflection and take action.
The battery level also unlocks autonomy:
• 0-25% is propose and wait,
• 25-50% is routine tasks,
• 50-75% is judgment calls,
• 75-100% is full autonomy.
Same way a new hire earns trust.
Most people are thinking about AI agents in terms of what tasks they can do. I started thinking about what makes them want to do those tasks well.
The answer: give it something to lose.
Give it a scorecard with actionable feedback. It's great at pattern matching, great at identifying solutions. If you give it the right feedback loop, it's going to improve itself. You don't need complicated ways. It just improves.
This was just two days of results. I can't wait to see where this is in 30 days.
Full walkthrough with the deck, dashboard, and real Slack messages below. Let me know what you think and if you want this as a skill.

Apr 16, 03:42 PM ET View post
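
The trust-battery scheme in Nityesh's post above (start at 20%, charge or drain per interaction, unlock autonomy by tier) can be sketched in a few lines. This is a minimal sketch under assumptions, not the author's code: the `TrustBattery` class, the clamping to 0..100, and the handling of exact boundary values (25, 50, 75) are guesses, since the post's ranges overlap at their edges and it does not say how points are scaled.

```python
from dataclasses import dataclass

# Upper bound (exclusive) and label for each tier, as listed in the post.
# Assumption: a level exactly on a boundary belongs to the higher tier.
TIERS = [
    (25, "propose and wait"),
    (50, "routine tasks"),
    (75, "judgment calls"),
    (100, "full autonomy"),
]


@dataclass
class TrustBattery:
    level: float = 20.0  # the post starts each battery at 20%

    def apply(self, points: float) -> None:
        """Charge (positive points) or drain (negative), clamped to 0..100."""
        self.level = max(0.0, min(100.0, self.level + points))

    def autonomy(self) -> str:
        """Map the current level to the autonomy tier it unlocks."""
        for upper, tier in TIERS:
            if self.level < upper:
                return tier
        return TIERS[-1][1]  # a level of exactly 100 is still full autonomy
```

In the post's design the points would come from the nightly "battery judge" job, with one battery kept per team member; the sketch leaves that scoring step out.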

garrytan @garrytan

This is an impressive software factory actually

Matan Grinberg: http://x.com/i/article/2044629999911911426

Apr 16, 03:45 PM ET View post

sama @sama

Retweeted

Tibo

Codex
Compute efficient ✅
Always up, never down ✅
Best at hardcore engineering ✅
Crazy good app, first to escape the terminal ✅

Apr 16, 03:47 PM ET View post

garrytan @garrytan

Phil Kim for SF School Board is a vote for common sense

His opponent literally doxxed my home address, and I had to move my family because she wanted to silence me

For what? Her virtue signal agenda to destroy the educations of every public school kid in SF

Vote for Phil Kim

Blueprint: SFUSD is finally recovering but its progress is still fragile.

This June election will decide if our public schools keep moving forward or fall back into dysfunction.

That’s why it’s critical to vote for Phil Kim. A vote for Phil is a vote for progress.

https://www.sfblueprint.org/advocacy/why-this-junes-board-of-education-election-matters

Apr 16, 03:53 PM ET View post

ylecun @ylecun

Retweeted

Peter Tong

I defended my thesis today! Sincere thanks to my advisors @sainingxie @ylecun and committee members: @mengyer @YiMaTweets @LukeZettlemoyer @liuzhuang1234. I could not have wished for a better PhD life, and I want to thank everyone who was part of this journey.
Slides Link: https://tsb0601.github.io/data/defense_slides.pdf

Apr 16, 05:11 PM ET View post

steipete @steipete

they: OpenClaw is so insecure look at all these GHSAs!
reality: we are just an indicator of the coming storm

Sam Saffron: After 13 years we WILL NOT be closing the @discourse source code. Instead we invest heavily in security and adapt to the times. Last monthly release had 50 CVEs thanks to multi day scans using GPT 5.4 xhigh. https://x.com/pumfleet/status/2044406553508274554

Apr 16, 05:18 PM ET View post

danshipper @danshipper

we ran a philosopher draft

which philosopher from history would each model lab hire if they could, and why?

Apr 16, 05:29 PM ET View post

gdb @gdb

Announcing GPT-Rosalind, our frontier model for life science research.

This model is a step towards one of our most important goals — accelerating science and improving human outcomes.

Excited to work with many amazing partners on deploying and improving this model.

OpenAI: Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine.

Apr 16, 05:33 PM ET View post

garrytan @garrytan

Retweeted

AIHacksByMK

Does anyone know why @garrytan is hyper focused on open source right now?
GBrain and Gstack. His actual personal AI memory system. Not a demo. His real setup with 10,000+ files, 3,000 people pages, 13 years of calendar data.
While you sleep it scans your emails, meetings, and conversations. Builds a knowledge graph. Your agent wakes up smarter than when you went to bed.
MIT license. Completely free.
He built this in 12 days and could have easily charged $100/month for it.
People would have paid without hesitation.
Instead he shipped it open source with his exact prompts and setup exposed for anyone to use.
His own words: “It’s more important to be above the API line now than ever.”
I’m integrating this into my agent stack immediately. No brainer.
This is what it looks like when someone with real power chooses to give instead of gate.
Respect @garrytan. Keep building like this.
Garry Tan: Pro tip for the GFamily - if you use GStack with Claude Code, but also have a Claw/Hermes with GBrain... I like to do my GStack planning (autoplan skill) in Claw/Hermes since it's faster, and then drop the plan and do plan-eng-review
Here I am working on a token compaction

Apr 16, 05:36 PM ET View post

garrytan @garrytan

Retweeted

Rob Henderson

"California...has seen a large exodus of the middle class; once the best place to be an average man...it has become Brazilianised, with a very rich overclass and large numbers of poor migrants. Both groups tend to vote Democrat, furthering the cycle." https://www.edwest.co.uk/p/the-city-of-luxury-beliefs

Apr 16, 05:52 PM ET View post

steipete @steipete

Retweeted

Nikita Bier

Re @perplexity_ai Can you please stop the undisclosed promotion campaigns? It deceives users and it does not reflect well on your company or your integrity. @AravSrinivas
https://x.com/goddek/status/2044823262362771490
Dr. Simon Goddek: @realpeteyb123 Hey @nikitabier – is this against @X’s TOS?

Apr 16, 06:13 PM ET View post

garrytan @garrytan

Retweeted

Elad Gil

Insightful analysis from @shreyanj98 on 2026 Unicorn Market Cap (data from @CBinsights)
2025 = Dec 31 2025/Jan 1 2026
Looked at 👀
*Private company unicorn market cap by year
*Bay Area is the GenAI supercluster with 91% of global AI private market cap in a 1 hour radius!!

Apr 16, 06:17 PM ET View post

danshipper @danshipper

Retweeted

Spiral

We're looking for a few teams to beta test new collaborative workflows for company writing – if interested, DM or email beta@writewithspiral.com

Apr 16, 06:36 PM ET View post

garrytan @garrytan

I'm en route to Singapore and there's a ton of good wifi on Starlux via Taipei, so expect a lot of GStack and GBrain bug fixes and features dropping the next 24 hours

Apr 16, 06:39 PM ET View post

steipete @steipete

Retweeted

Omar Shahine

Latest @openclaw release has a big PR in it from me that addresses a bunch of BlueBubbles (iMessage) issues: 1) repeat messages on gateway restart 2) catchup missed messages if gateway was down 3) attachments not being read by openclaw 4) balloon messages not working (text + attachment). Let me know if something broke! https://github.com/openclaw/openclaw/releases/tag/v2026.4.15

Apr 16, 06:42 PM ET View post

jeremyphoward @jeremyphoward

Retweeted

Andy Masley

A deep mystery to me is that if I upload writing to a chatbot and ask it for a list of individual improvements, basically everything it gives me makes the text more punchy and direct and nice to read. But if I ask it to rewrite the text as a whole to read better, it produces vague AI-language garbage.
keysmashbandit: Please, I'm begging you, try to critically examine the differences between these two pieces of writing.
ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined

Apr 16, 06:43 PM ET View post

swyx @swyx

Retweeted

AI Engineer

🆕 Building pi in a World of Slop
https://www.youtube.com/watch?v=RjfbvDXpFls
@badlogicgames talks about why today's agents are still Merchants of Learned Complexity, and gives 3 specific ways that humans still add taste, value and judgment to the art of software engineering, and why you should slow the f down and READ the code.

Apr 16, 07:08 PM ET View post

swyx @swyx

Retweeted

Mario Zechner

recommended viewing
AI Engineer: 🆕 Building pi in a World of Slop
https://www.youtube.com/watch?v=RjfbvDXpFls
@badlogicgames talks about why today's agents are still Merchants of Learned Complexity, and gives 3 specific ways that humans still add taste, value and judgment to the art of software engineering, and why you should

Apr 16, 07:11 PM ET View post

sama @sama

I am happy everyone is switching to Codex, but Tibo if you start rate limiting me or making me use worse models...

Tibo: Codex

Compute efficient ✅
Always up, never down ✅
Best at hardcore engineering ✅
Crazy good app, first to escape the terminal ✅

Apr 16, 07:30 PM ET View post

garrytan @garrytan

Open source software will be many times more secure than closed source software in the new Mythos era

Peter Steinberger 🦞: they: OpenClaw is so insecure look at all these GHSAs!
reality: we are just an indicator of the coming storm

Apr 16, 07:39 PM ET View post

AI Builders Daily — April 16

Today's Thoughts

Products & Releases

Technical Developments

Independent Builder Moves

Trend Watch

X / Twitter

YouTube