In the race to lead artificial intelligence development, the pace has turned into a rapid succession of moves. On November 12th, GPT-5.1 arrived, an update aimed at refining the experience and keeping users satisfied. Just days later, on November 18th, Google responded with Gemini 3, an evolution of its flagship model that left a very good impression among early testers.
Following that launch, rumors began to circulate: the startup led by Sam Altman had reportedly declared a "code red" upon seeing its direct rival gain ground. This appears to be the first result of that internal shift. Less than a month after the previous update to its flagship model, GPT-5.2 is already here, promising to fix known problems, reduce latency, and improve reasoning.
An evolution within the 5 series. GPT-5.2 arrives as a version designed to boost knowledge work, with advances in coding, vision, document analysis, and multi-step projects. OpenAI positions it as the direct evolution of GPT-5.1, not as a generational leap. According to the company, the update improves the handling of long contexts, reduces errors, and strengthens the ability to coordinate tools.
More differentiated layers of use. The three usual variants now differ more clearly in purpose, not through new features but in how each integrates the improvements OpenAI announced. Thinking absorbs most of the progress in reasoning, long-document handling, and tool coordination. Pro raises the bar in specialized tasks, especially code and technical calculations. Instant, for its part, benefits from more stable explanations and fewer errors. The result is a clearer separation between everyday tasks, complex work, and expert needs.
A visible improvement across evaluations. OpenAI presents figures showing GPT-5.2 ahead of GPT-5.1 in very different areas, from scientific reasoning to programming and knowledge tasks. In GDPval, the evaluation that measures well-specified jobs across 44 occupations, the model achieves a 70.9% rate of wins or draws against human professionals. In GPQA Diamond it rises to 92.4%, and in AIME 2025 it reaches 100%. The trend holds in technical tests such as FrontierMath and ARC-AGI, where performance also increases over the previous version.