Granada (Spain).- An international team of scientists has found that artificial intelligence (AI) still makes mistakes when answering questions that require conceptual reasoning rather than mere internet searches, revealing that it does not yet match human thought in complex academic tasks.
As the University of Granada (UGR), in Spain, detailed in a statement this Thursday, the research set out to determine whether current large language models (LLMs) genuinely possess a broad capacity for creative problem-solving or whether, on the contrary, they are limited to a sophisticated management of the data they retrieve from the cloud. To this end, the researchers designed a battery of questions of high technical and conceptual complexity as part of their study, called 'Humanity's Last Exam', published in the journal Nature.
Thus, a team of 1,100 scientists from across the sciences and the humanities, including the Spanish physicist María Cruz Boscá of the UGR, subjected the AI to a "grand exam". Each question has a known solution that is unequivocal and verifiable, but cannot be answered quickly and easily through an internet search.

As detailed in the article 'A Reference Base of Expert-Level Academic Questions to Evaluate AI Capabilities', the result is that, as of today, even the most advanced AI models stumble over deep scientific concepts and inherit errors from classic textbooks, which highlights a marked gap between the current capabilities of LLMs and those of human experts across the academic fields covered.

As Professor Boscá, who tested the limits of artificial reasoning in the field of quantum physics, explained, her research showed that the AI systems failed to choose the correct answers when deep conceptual understanding was required.
In one of the questions, related to the Einstein-Podolsky-Rosen paradox, the model failed due to a classic interpretive bias: it assumed an objective reality in the measurement, which contradicts quantum principles. In another case, concerning the Stern-Gerlach experiment, the AI reproduced a factual error that is repeated in numerous scientific manuals, demonstrating that these systems can perpetuate bibliographic mistakes if they are not trained to discern the correct answer.