2025-12-24

Fei-Fei Li @drfeifei

Retweeted

Tiange Xiang

‼️VLMs/MLLMs do NOT yet understand the physical world from videos‼️
In our recent work, we found that even the most advanced AI models still lag behind humans in one key aspect: reasoning about the kinematic properties of objects from videos.
Takeaways:
1. ChatGPT 5.1 leads overall among 21 advanced VLMs, followed by Gemini 2.5 Pro/Flash.
2. Grok 4.1 delivers impressive performance at the lowest API cost.
3. Qwen3-VL is the top-performing open-source model.
Read here: https://quantiphy.stanford.edu/
🧵1/N

Dec 24, 01:14 PM ET
