Daily Briefing

2026-03-05

X / Twitter

Twitter @Aaron Levie @levie

Agents will be the biggest users of software. They’ll often need their own computers, identities, file systems, and tools to do their work. As a result, software will increasingly become API-first to be as useful to agents as they are to people. This is a huge opportunity.

Box: Agents need files to keep track of their work, they leverage files as context about the tasks they’re doing, and use them to share back and forth with their human counterparts. @levie spoke with @CNBC about and the importance of agents having their own filesystems.

Twitter @Ryo Lu @ryolu_

Make agents work while you think, while you play, while you sleep. This is Cursor Automations.

Cursor: We're introducing Cursor Automations to build always-on agents.

Twitter @Kevin Weil 🇺🇸 @kevinweil

RT Derya Unutmaz, MD: I’ve had early access to GPT-5.4 Pro. Without any reservation, I can say it is the most intelligent AI model to date, even significantly surpassing GPT-5.2 Pro at several levels! I’ve been using it non-stop past several days and am super excited about another major jump in AI! I will share specific examples, but overall GPT-5.4 Pro demonstrates relatively higher creativity, insight, and abstract intelligence. It tends to ask “why,” “what if,” “can I,” and “why it matters” type questions more frequently than the 5.2 Pro model. It also appears to generalize more effectively and comes across as more AGI-like in its reasoning, and even displays human-like intuition! Especially biomedical science-based responses are unifying large data sets and simply amazing!

Twitter @Sam Altman @sama

RT Noam Brown: GPT-5.4 is a big step up in computer use and economically valuable tasks (e.g., GDPval). We see no wall, and expect AI capabilities to continue to increase dramatically this year.

OpenAI: GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

Twitter @Kevin Weil 🇺🇸 @kevinweil

💥 GPT 5.4 is launching today! It's our best model ever, and it's also the most capable scientific model we've ever released. GPT 5.4 Pro in particular is 🤯 based on early testing with scientists and mathematicians.

Twitter @Sam Altman @sama

GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT. It's much better at knowledge work and web search, and it has native computer use capabilities. You can steer it mid-response, and it supports 1m tokens of context.

Twitter @Sam Altman @sama

Codex app on Windows!

Andrew Ambrosino: The Codex app is now live on Windows. The app runs both natively and in WSL, with integrated terminals for PowerShell, Command Prompt, Git Bash, or WSL. We also built the first Windows-native agent sandbox — using OS-level controls to block filesystem writes outside your

Twitter @Sam Altman @sama

Forgot to mention /fast! I think people will like this.

Ahmed: Today we are introducing GPT-5.4 in codex. It's more token efficient and better at tool calling, computer use, and frontend development. We are also introducing /fast to get a faster version of Codex. Enjoy ❤️

Twitter @Kevin Weil 🇺🇸 @kevinweil

RT Epoch AI: GPT-5.4 set a new record on FrontierMath, our benchmark of extremely challenging math problems! We had pre-release access to evaluate the model. On Tiers 1–3, GPT-5.4 Pro scored 50%. On Tier 4 it scored 38%. See thread for commentary and additional experiments.

Twitter @Sam Altman @sama

We will be able to fix these three things!

Matt Shumer: I've been testing GPT-5.4 for the last week. In short, it is the best model in the world, by far. It's so good that it's the first model that makes the “which model should I use?” conversation feel almost over. The biggest surprise: I barely use Pro anymore! If you know me,

Twitter @Aaron Levie @levie

Model progress continues unabated, with GPT-5.4 showing significant improvements in critical knowledge worker tasks. In our Box AI tests, we saw a 6 point jump in agentic document processing, which is upstream from most automation workflows. GPT-5.4 is now available on Box.

Box: We tested @OpenAI's new GPT-5.4 model, and it showed a 78% overall extraction accuracy - up 6 points from GPT-5.2. Our evaluation tested the model across real industries and document workflows:
→ Clinical data: +5 pts (81% → 86%)
→ Legal agreements: +3 pts (82% → 85%)
→

Twitter @Andrej Karpathy @karpathy

There was a nice time where researchers talked about various ideas quite openly on twitter. (before they disappeared into the gold mines :)).

My guess is that you can get quite far even in the current paradigm by introducing a number of memory ops as "tools" and throwing them into the mix in RL. E.g. current compaction and memory implementations are crappy, first, early examples that were somewhat bolted on, but both can be fairly easily generalized and made part of the optimization as just another tool during RL.

That said neither of these is fully satisfying because clearly people are capable of some weight-based updates (my personal suspicion - mostly during sleep). So there should be even more room for more exotic approaches for long-term memory that do change the weights, but exactly - the details are not obvious. This is a lot more exciting, but also more into the realm of research outside of the established prod stack.

Awni Hannun: I've been thinking a bit about continual learning recently, especially as it relates to long-running agents (and running a few toy experiments with MLX). The status quo of prompt compaction coupled with recursive sub-agents is actually remarkably effective. Seems like we can go


YouTube
