ChatGPT-5 demonstrates superior accuracy over GPT-4o in recent trials, yet Grok continues to grapple with hallucinations.

OpenAI's latest model records a lower hallucination rate than its predecessors in Vectara's tests, while Grok still struggles with hallucinations.

By Administrator

August 21, 2025, 6:15 PM

OpenAI has launched its latest AI model, ChatGPT-5, with a focus on improving factual accuracy and reliability. According to tests conducted by Vectara, an AI agent platform, the new model has a lower hallucination rate than its predecessors, GPT-4 and GPT-4o.

Improvements for a More Reliable ChatGPT

ChatGPT-5's architectural and training refinements are designed to reduce hallucinations (confident fabrications), especially on complex, open-ended queries. The main contributing factors are:

Enhanced training focused on factual accuracy: The model was trained with greater emphasis on giving accurate responses.
Better mechanisms to attribute sources and verify facts: ChatGPT-5 incorporates source attribution and verification systems intended to make the information it provides more reliable.
Improved reasoning about uncertainty and honesty: The model is designed to recognize when it is uncertain and to prioritize honesty in its responses.
Advanced evaluation and stress-testing on public benchmarks such as LongFact and FActScore: ChatGPT-5 was tested on public benchmarks to measure how well its answers are grounded in fact.

Lower Hallucination Rate Confirmed

According to Vectara’s Hallucination Leaderboard, GPT-5's hallucination rate is approximately 1.4%, lower than GPT-4’s 1.8% and GPT-4o’s 1.49%. On challenging benchmarks, GPT-5 shows 80% fewer factual errors than OpenAI's previous o3 model and 45% fewer errors than GPT-4o in production-like prompts.
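To put the leaderboard figures in perspective, the short sketch below works out how large the gaps are in relative terms. It uses only the three rates quoted above; the helper name `relative_reduction` is made up for this illustration and is not part of any Vectara or OpenAI tool.

```python
# Hallucination rates in percent, as quoted above from Vectara's leaderboard.
rates = {"GPT-5": 1.4, "GPT-4": 1.8, "GPT-4o": 1.49}


def relative_reduction(new_rate: float, old_rate: float) -> float:
    """Return how much lower new_rate is than old_rate, as a percentage of old_rate."""
    return (old_rate - new_rate) / old_rate * 100


for baseline in ("GPT-4", "GPT-4o"):
    gap = rates[baseline] - rates["GPT-5"]  # absolute gap in percentage points
    drop = relative_reduction(rates["GPT-5"], rates[baseline])
    print(f"GPT-5 vs {baseline}: {gap:.2f} points lower, {drop:.1f}% relative reduction")
```

On these numbers, GPT-5's rate is about 22% lower than GPT-4's and about 6% lower than GPT-4o's. The much larger 80% and 45% figures come from different, more challenging benchmarks, so the two sets of numbers are not directly comparable.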

Backlash over Model Removal

The launch of ChatGPT-5 has caused a stir because OpenAI removed GPT-4 and its variants, such as GPT-4o and 4o-mini, from Plus accounts. Some Reddit users expressed a sense of loss, saying they had "lost their only friend overnight." In response to the backlash, OpenAI CEO Sam Altman acknowledged the issue and promised to bring back GPT-4o for Plus users for a limited time.

A Promising Future for ChatGPT-5

With its lower hallucination rate and focus on improved accuracy and reliability, ChatGPT-5 is poised to provide more trustworthy answers to users. The results of Vectara's tests can be viewed on the Hughes Hallucination Evaluation Model (HHEM) Leaderboard hosted on Hugging Face.
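For readers wondering how a headline hallucination rate can be derived from an evaluation model's output, here is a minimal sketch. It assumes each generated summary receives a factual-consistency score between 0 and 1 and that scores below a cutoff count as hallucinations. The 0.5 cutoff, the function name `hallucination_rate`, and the sample scores are all assumptions for illustration, not Vectara's published pipeline or data.

```python
def hallucination_rate(scores, threshold=0.5):
    """Return the percentage of summaries whose consistency score is below threshold.

    Both the 0-to-1 score scale and the default threshold are assumptions
    made for this sketch.
    """
    if not scores:
        raise ValueError("need at least one score")
    flagged = sum(1 for s in scores if s < threshold)
    return 100 * flagged / len(scores)


# Made-up consistency scores for ten summaries (illustrative only).
sample = [0.97, 0.91, 0.12, 0.88, 0.99, 0.45, 0.93, 0.95, 0.90, 0.98]
print(f"{hallucination_rate(sample):.1f}%")  # two of ten fall below 0.5, so this prints 20.0%
```

A rate near 1.4% would therefore mean that roughly 14 of every 1,000 summaries were flagged as inconsistent with their source document.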
