Google has launched Gemini 3.5 Flash, the first model in its new family, designed to deliver frontier-level intelligence at higher speed and for less than half the price of its rivals. According to Google DeepMind, the model generates four times as many tokens per second as competing models and outperforms Gemini 3.1 Pro on key benchmarks such as Terminal-Bench 2.1 and CharXiv Reasoning. It is also the first model to land in the top-right quadrant of the Artificial Analysis index, meaning it ranks high on both intelligence and speed at the same time.
How the new generation of models works ⚡
The Gemini 3.5 Flash architecture optimizes parallel processing, which reduces latency and raises throughput without sacrificing accuracy. In internal tests, the model shows notable improvements in visual reasoning and in carrying out complex terminal tasks. Because it is more efficient, developers can run applications that previously required expensive hardware, and the cost per query drops. Google is betting on democratizing access to high-performance models and competing directly with the slower, more expensive options on the market.
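To make the "cost per query" and "tokens per second" claims concrete, here is a minimal back-of-the-envelope sketch. Every number in it is a hypothetical placeholder chosen for illustration, not a published price or measured throughput for Gemini 3.5 Flash or any rival, and the function names `cost_per_query` and `generation_seconds` are made up for this example.

```python
# Back-of-the-envelope sketch. All prices and speeds below are HYPOTHETICAL
# placeholders, not real figures for any model.

def cost_per_query(input_tokens, output_tokens,
                   input_price_per_million, output_price_per_million):
    """Cost of one query in dollars, given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def generation_seconds(output_tokens, tokens_per_second):
    """Time spent generating the response, ignoring network and queueing."""
    return output_tokens / tokens_per_second


# One hypothetical query: 2,000 tokens in, 500 tokens out.
INPUT_TOKENS, OUTPUT_TOKENS = 2_000, 500

# Hypothetical "rival" model: $2.00 / $8.00 per million tokens, 100 tokens/s.
rival_cost = cost_per_query(INPUT_TOKENS, OUTPUT_TOKENS, 2.00, 8.00)
rival_time = generation_seconds(OUTPUT_TOKENS, 100)

# Hypothetical "fast" model: 45% of the rival's price, 4x the throughput.
fast_cost = cost_per_query(INPUT_TOKENS, OUTPUT_TOKENS, 0.90, 3.60)
fast_time = generation_seconds(OUTPUT_TOKENS, 400)

print(f"rival: ${rival_cost:.4f} per query, {rival_time:.2f} s to generate")
print(f"fast:  ${fast_cost:.4f} per query, {fast_time:.2f} s to generate")
print(f"cost ratio: {fast_cost / rival_cost:.2f}, "
      f"speedup: {rival_time / fast_time:.1f}x")
```

The takeaway does not depend on the placeholder values: a per-token price below half the rival's keeps the per-query cost below half for the same token counts, and four times the throughput cuts generation time to a quarter. Swap in real published prices and measured speeds to run the comparison for an actual workload.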
The AI that responds before you finish asking 🤯
Gemini 3.5 Flash is so fast that it has probably generated a response before you finish reading this sentence. At this rate, we will soon see models answering questions we haven't even asked yet. Meanwhile, rivals watch with envy as Google sells intelligence at bargain prices, which makes paying more for fewer tokens look almost like an old-school scam.