
    Smarter, but less accurate? ChatGPT’s hallucination conundrum

    Synopsis

    OpenAI’s latest models, o3 and o4-mini, hallucinate more often than earlier versions on the PersonQA benchmark, with o4-mini reaching 48%. OpenAI has not explained the increase, saying more research is needed, and the problem may grow harder to address if more capable reasoning models continue to hallucinate more.

    While artificial intelligence continues to deliver groundbreaking tools that simplify various aspects of human life, the issue of hallucination remains a persistent and growing concern.

    According to IBM, hallucination in AI is “a phenomenon where, in a large language model (LLM)—often a generative AI chatbot or computer vision tool—the system perceives patterns or objects that are non-existent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.”

    OpenAI’s technical report on its latest models—o3 and o4-mini—reveals that these systems are more prone to hallucinations than earlier versions such as o1, o1-mini, and o3-mini, or even the “non-reasoning” model GPT-4o.

    To evaluate hallucination tendencies, OpenAI used PersonQA, a benchmark designed to assess how accurately models respond to factual, person-related queries.

    “PersonQA is a dataset of questions and publicly available facts that measures the model’s accuracy on attempted answers,” the report notes.

    The findings are significant: the o3 model hallucinated on 33% of PersonQA queries—roughly double the rates recorded by o1 (16%) and o3-mini (14.8%). The o4-mini model performed even worse, hallucinating 48% of the time.

    OpenAI did not offer a definitive explanation for the increase in hallucinations, stating only that “more research” is needed to understand it. If larger and more capable reasoning models continue to exhibit higher hallucination rates, mitigating such errors may become more difficult.

    “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” OpenAI spokesperson Niko Felix told TechCrunch.
    The Economic Times
