OpenAI's own testing shows its newest ChatGPT models hallucinate more often than before, yet the reasons remain elusive.
Remember the buzz a month ago around Anthropic's discovery that AI models' internal workings were largely different from their self-described thought processes? Well, we can add another headache to that mystery pile: escalating hallucinations. According to a report by The New York Times, OpenAI's latest o3 and o4-mini large language models (LLMs) are substantially more prone to hallucinating, or creating false information, compared to the previous o1 model.
Here's the lowdown: in testing, the o3 model hallucinated 33% of the time when answering questions about public figures, while o4-mini did so 48% of the time. On another test involving more general questions, the hallucination rates for o3 and o4-mini climbed to 51% and 79%, respectively. By comparison, the older o1 model hallucinated 44% of the time on that general-question test.
OpenAI is currently conducting research to understand why the newer models tend to have higher hallucination rates. Some industry observers believe that reasoning models, which break tasks down into individual stages akin to human thought processes, are the culprits behind the increased errors.
To put it simply, reasoning models are a type of LLM designed to tackle complex tasks. Unlike traditional LLMs, which simply generate the statistically most likely next text, reasoning models mimic human problem-solving by breaking a question or task down into discrete intermediate steps before answering.
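To make the distinction concrete, here is a minimal, illustrative Python sketch using the OpenAI chat completions API: one call asks for an answer directly, the other asks the model to work through explicit numbered steps first. The model name and prompts are placeholders, and prompting for steps only approximates what reasoning models such as o3 are trained to do internally.

```python
# Illustrative sketch only: prompt-level step-by-step decomposition is an
# approximation of what reasoning models do internally. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "A train leaves at 3:40 pm and arrives at 6:05 pm. How long is the trip?"

# Traditional-style request: ask for the answer directly.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": question}],
)

# Reasoning-style request (approximated): ask the model to break the task
# into explicit intermediate steps before committing to a final answer.
stepwise = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": (
            "Break this problem into numbered steps, solve each step, "
            "then state the final answer on its own line.\n\n" + question
        ),
    }],
)

print(direct.choices[0].message.content)
print(stepwise.choices[0].message.content)
```

The second call does not turn the model into a reasoning model; o3 and o4-mini produce those intermediate steps on their own, but the sketch shows the shape of the behavior the debate is about.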
OpenAI has pushed back on claims that reasoning models are inherently more prone to hallucination. However, the company acknowledges the higher hallucination rates in its latest models and says it is working to reduce them.
In essence, this points to an urgent need for AI models to become less prone to nonsense and falsehoods if they are to be anywhere near as useful as their proponents envision. For now, it is hard to trust LLM output, and nearly everything requires careful double-checking. That is workable for some tasks, but it defeats the point of using LLMs to save time or labor.
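One rough way teams try to automate part of that double-checking is a self-consistency spot check: ask the same question several times and flag answers that disagree for human review. The Python sketch below is a crude illustration of that heuristic, not an OpenAI feature; the model name is a placeholder and the exact-string comparison is deliberately simplistic.

```python
# Self-consistency spot check (illustrative heuristic, not an OpenAI feature):
# sample the same question several times and flag disagreement for human review.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_answers(question: str, n: int = 3) -> list[str]:
    """Ask the model the same question n times at non-zero temperature."""
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user",
                       "content": question + " Answer in one short sentence."}],
            temperature=1.0,
        )
        answers.append(resp.choices[0].message.content.strip())
    return answers

def needs_review(question: str) -> bool:
    """Flag the question if the sampled answers are not all identical.

    Exact string matching is deliberately crude; real systems compare
    answers semantically or check them against trusted sources.
    """
    answers = sample_answers(question)
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count < len(answers)

if __name__ == "__main__":
    print(needs_review("In which year did OpenAI release its o1 model?"))
```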
OpenAI's first reasoning model, o1, debuted last year, reportedly outperforming PhD students on physics, chemistry, and biology problems and excelling in math and coding thanks to reinforcement learning techniques.
Artificial General Intelligence (AGI)
Artificial General Intelligence (AGI) refers to an AI system that can understand, learn, and apply knowledge across a wide array of tasks at a level comparable to – or even surpassing – human intelligence. Unlike narrow AI systems, which are designed for specific tasks such as image recognition or language translation, AGI can autonomously adapt to various cognitive tasks.
Hallucination in LLMs stems from factors such as model complexity, gaps in training data, the lack of real-time knowledge verification, and the trade-off between fluency and factual accuracy. As researchers work to make AGI a reality, minimizing hallucination and maximizing accuracy will be crucial.
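The real-time knowledge verification point is commonly addressed through grounding: retrieving source text at query time and instructing the model to answer only from it, or to admit it does not know. The Python sketch below shows that pattern under stated assumptions; retrieve_passages() is a hypothetical placeholder for any search or vector-store backend, and the model name is likewise a placeholder.

```python
# Hypothetical grounding sketch: answer only from supplied sources to reduce
# hallucination. retrieve_passages() stands in for any search/RAG backend.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def retrieve_passages(query: str) -> list[str]:
    # Placeholder: in practice this would query a search index or vector store.
    return ["(passage 1 text)", "(passage 2 text)"]

def grounded_answer(question: str) -> str:
    sources = retrieve_passages(question)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(sources))
    prompt = (
        "Answer the question using ONLY the numbered sources below. "
        "Cite the source number for each claim. If the sources do not "
        "contain the answer, reply exactly: I don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(grounded_answer("When did the o1 model debut?"))
```

Grounding does not eliminate hallucination, but restricting the model to cited sources makes wrong answers easier to spot and check.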
- Anthropic, an AI company focused on aligning models with human values, previously highlighted unexpected differences between AI models' self-described thought processes and their internal workings.
- OpenAI's latest o3 and o4-mini large language models (LLMs) have shown a significantly greater tendency to hallucinate, or create false information, than the previous o1 model.
- In testing, o3 hallucinated 33% of the time when answering questions about public figures, while o4-mini did so 48% of the time; on more general questions, those rates rose to 51% and 79%, respectively.
- Some in the industry attribute the increased errors and hallucination in the newer OpenAI models to their reasoning design, which tackles complex tasks by modeling human-like step-by-step thought.
- Consequently, AI models need lower hallucination rates to deliver reliable, useful outputs, especially as industries such as gaming lean more heavily on AI and researchers push toward Artificial General Intelligence (AGI).


