Why generative AI can be very convincing when it is wrong

We often worry about AI being wrong. In practice, the bigger risk is when it is wrong but sounds right. In this article, Marta Dobrowolska, Head of Data Science and Knowledge Management at Incotec, discusses this phenomenon.

Marta Dobrowolska, Head of Data Science and Knowledge Management, Incotec

What are AI hallucinations? 


One of the biggest challenges with generative AI (GenAI) is not that it sometimes makes mistakes. It is that those mistakes are often presented in a way that sounds polished, plausible, and confident. These systems are already being used widely for drafting, summarizing, translating, brainstorming, and supporting day-to-day work, which makes this issue increasingly relevant in practice.
That is what people usually mean by hallucinations: moments when an AI system produces an answer that reads as credible but is false, unsupported, or simply invented. The European Commission’s Joint Research Centre (JRC) has already highlighted quality issues in generative AI outputs, including hallucinations, bias, and over-reliance, and has stressed the need for safeguards and human oversight.

Why do generative AI systems hallucinate?

Generative AI is designed to create new text, images, code, audio, or video that resemble the data on which it was trained. In technical terms, these models learn patterns and probability distributions in data and then generate new samples from them. That is what makes them so useful and versatile. It is also why they can produce fluent output without any built-in guarantee that a specific claim is correct.
So, when a model gives a wrong answer, it is usually not “inventing” in the human sense. It is doing what it was built to do: generating the output that best fits the patterns it has learned from previous data. If the question is ambiguous, the context is thin, or the model does not have access to the right sources, it may fill the gap with something that sounds right rather than something that is right. Seen that way, a hallucination is less like deliberate deception and more like statistical prediction outrunning verification.
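To make the idea of "statistical prediction outrunning verification" concrete, here is a deliberately tiny sketch. It is not how production language models work internally; it is a toy next-word sampler, and the corpus, function names, and seed are all invented for illustration. Every sentence it is trained on is true, yet it can recombine the learned patterns into sentences that are just as fluent and not supported by anything it saw.

```python
import random
from collections import defaultdict

# Toy training data (invented for illustration): three true statements.
CORPUS = [
    "paris is the capital of france",
    "rome is the capital of italy",
    "madrid is the capital of spain",
]

def train_bigrams(sentences):
    """Record which words were observed to follow each word."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers

def generate(followers, start, rng, max_words=10):
    """Sample one word at a time from the followers seen in training."""
    words = [start]
    while words[-1] in followers and len(words) < max_words:
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)

followers = train_bigrams(CORPUS)
rng = random.Random(0)  # fixed seed so the run is repeatable

# Generate many sentences that all start with "paris".
samples = {generate(followers, "paris", rng) for _ in range(50)}

# Fluent outputs that never appeared in the training data.
unsupported = sorted(s for s in samples if s not in CORPUS)
print(unsupported)
```

Nothing in `generate` checks whether a sentence is true. It only follows word patterns, so "paris is the capital of italy" is as easy for it to produce as the correct sentence.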

Why are hallucinations a risk?

Because fluency creates false confidence. A weak answer written in broken language is easy to challenge. A wrong answer written in a calm, authoritative tone is much harder to spot. That is why hallucinations are not a small technical flaw; they are a real reliability and governance issue. The EU AI Act reflects exactly that concern by placing emphasis on accuracy, robustness, transparency, and human oversight for high-risk uses of AI.

Real-world examples of AI errors

There are already examples of this in serious professional settings. In 2025, Deloitte Australia admitted that a government-contracted assurance review of the country’s Targeted Compliance Framework contained fabricated references and quotes generated with Azure OpenAI GPT-4o, and it agreed to provide a partial refund. The problem was not that the report looked unprofessional. It was that some of the evidence behind it was invented. That is exactly what makes hallucinations difficult to catch: the error can sit inside otherwise credible-looking work. In this case, the issues were spotted by a human reviewer, Australian welfare academic Chris Rudge.

In healthcare and pharma-related work, the stakes are even higher. A study published in BMJ Quality & Safety*, a peer-reviewed journal focused on healthcare quality and patient safety, examined AI-powered chatbot answers to patient questions about commonly prescribed drugs. While many answers were broadly accurate, experts judged 66% of a subset of inaccurate answers to be potentially harmful, and 22% potentially severe or even life-threatening if followed. In regulated, evidence-heavy environments, “mostly right” is simply not a strong enough standard.

That is why this matters for sectors such as pharma, medical affairs, and R&D. If an AI tool generates a fabricated reference, misstates evidence, or gives a confident but incorrect interpretation, the issue is not just poor wording. It becomes a scientific credibility problem, a compliance risk, and potentially a patient safety issue. More broadly, the JRC has already warned that generative AI raises cross-cutting risks around misinformation, trust, and the quality of decision-making when people rely on it too quickly.

Should we stop using generative AI?

The good news is that this is not a reason to stop using generative AI. It is a reason to use it more deliberately. Where facts matter, outputs should be grounded in trusted source material rather than generated from model memory alone. Just as importantly, organisations need to keep people accountable for the final output. The most useful question is often the simplest one: What is the source for this? If that question cannot be answered clearly, the output should not be treated as evidence. That is particularly true in customer communication, legal work, scientific writing, regulatory material, and decision support.
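The question "What is the source for this?" can also be asked mechanically before a human reviewer takes over. The sketch below is one possible shape for such a check, under stated assumptions: the `Claim` structure, the source identifiers, and the `needs_review` helper are hypothetical and not taken from any specific tool. It flags every statement in a draft that either has no source or cites one that is not on a list of verified documents.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    source_id: Optional[str]  # None means the draft gave no source

# Hypothetical register of documents a person has already verified.
TRUSTED_SOURCES = {"SOP-001", "LABEL-2024-A"}

def needs_review(claims, trusted):
    """Return the claims that cannot answer 'What is the source for this?'."""
    return [c for c in claims if c.source_id not in trusted]

# Example draft with invented content, as an AI assistant might produce it.
draft = [
    Claim("Store the product below 25 degrees Celsius.", "LABEL-2024-A"),
    Claim("The product is approved in 40 countries.", None),
    Claim("A 2023 study confirmed the effect.", "STUDY-XYZ"),
]

flagged = needs_review(draft, TRUSTED_SOURCES)
for claim in flagged:
    print(f"Needs a verified source: {claim.text}")
```

A check like this only confirms that a source is named and known. Whether the source actually supports the claim still has to be judged by a person, which is why accountability for the final output stays with people.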

Still, this remains an extraordinary tool. Generative AI has significant potential to improve innovation and productivity across industries. But it is useful to apply a bit of critical thinking to its outputs and treat it a little like your overconfident colleague: often helpful, often impressively fast, occasionally brilliant, but not someone you would quote blindly without checking the source first.

How to use GenAI responsibly

That is why we need to strike the right balance. Generative AI is powerful and useful, but if we want to use it responsibly, we need to be honest about one thing: it can be very convincing when it is wrong.

PS. I would double-check the reference if I were you.

*Andrikyan W, Sametinger SM, Kosfeld F, et al. Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients. BMJ Quality & Safety 2025;34:100-109.
Text edited with the help of M365 Copilot GPT 5.4


Published by Marta Dobrowolska-Haywood, Head of Data Science and Knowledge Management, Incotec