Why I think 'AI-native' is mostly a story about latency, not intelligence

Most of the AI-native pitch lives or dies on a single number: how long the user is willing to wait. Models will keep getting better and cheaper; that’s the easy part.

The hard part is what the product feels like in the half-second the model is thinking, and the architectural choices that number quietly forces. What follows is a long argument about runtimes, not models, with a detour through what I learned trying to make Planda feel responsive when the model behind it sometimes really, really isn’t.