Your doctor’s AI notetaker may be making things up, Ontario audit finds

May 14, 2026

Made-up therapy referrals, incorrect prescriptions among the common mistakes.

OK, my AI notes here say you were referred for a total heart removal. Let me just get that squared away for you…

Credit: Getty Images

In recent years, many overworked doctors have turned to so-called AI medical scribes to help automatically summarize patient conversations, diagnoses, and care decisions into structured notes for health record logging. But a recent audit by the auditor general of Ontario found that AI scribes recommended by the provincial government regularly generated incorrect, incomplete and hallucinated information that could “potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes.”

In a recent report on “Use of Artificial Intelligence in the Ontario Government,” the auditor general reviewed transcription tests of two simulated patient-doctor conversations, run through AI scribes from 20 vendors that were approved and pre-qualified by the provincial government for purchase by healthcare providers. All 20 of those vendors showed some issue with accuracy or completeness in at least one of these simple tests, including nine that hallucinated patient information, 12 that recorded information incorrectly, and 17 that missed key details about discussed mental health issues.

In the report, the auditor general points out multiple concerning examples of mistakes in those summaries that could have a direct and negative impact on a patient’s subsequent care. That includes situations where an AI scribe hallucinated nonexistent referrals for blood tests or therapy, incorrectly transcribed the names of prescription medication, and/or missed “key details” of mental health issues discussed in the simulated conversations.

Across all approved vendors, the average tested AI scribe scored only a 12 out of 20 on the “accuracy of medical notes generated” section of Supply Ontario’s evaluation rubric. But that seemingly key “accuracy” metric was only responsible for about 4 percent of a vendor’s overall score, making it easy to meet the minimum threshold for approval even if an AI scribe scored a “zero” on the accuracy metric (a separate metric measuring “domestic presence in Ontario” was worth 30 percent of the overall scoring).
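To see how little that weighting leaves riding on accuracy, here is a rough sketch of the arithmetic. It assumes the rubric is a simple weighted sum of criterion scores, which the audit coverage does not spell out. The two weights and the 12-out-of-20 average come from the figures above; everything else, including the `overall_score` helper, the single lumped "other" bucket, and the perfect marks on the non-accuracy criteria, is hypothetical and only there for illustration.

```python
# Weights quoted in the article. How Supply Ontario actually combines
# them is an assumption: this sketch treats the rubric as a weighted sum.
ACCURACY_WEIGHT = 0.04           # "about 4 percent" of the overall score
DOMESTIC_PRESENCE_WEIGHT = 0.30  # "30 percent of the overall scoring"
# Hypothetical: every remaining criterion lumped into one bucket.
OTHER_WEIGHT = 1.0 - ACCURACY_WEIGHT - DOMESTIC_PRESENCE_WEIGHT


def overall_score(accuracy, domestic_presence, other):
    """Weighted sum of criterion scores, each a fraction from 0.0 to 1.0."""
    return (
        ACCURACY_WEIGHT * accuracy
        + DOMESTIC_PRESENCE_WEIGHT * domestic_presence
        + OTHER_WEIGHT * other
    )


# Hypothetical vendors with full marks on everything except accuracy.
average_vendor = overall_score(accuracy=12 / 20, domestic_presence=1.0, other=1.0)
zero_accuracy_vendor = overall_score(accuracy=0.0, domestic_presence=1.0, other=1.0)

print(f"average accuracy (12/20): {average_vendor:.3f}")
print(f"zero accuracy:            {zero_accuracy_vendor:.3f}")
print(f"most that accuracy can cost: {ACCURACY_WEIGHT:.0%} of the total")
```

Under those assumptions, a scribe that scored zero on accuracy would give up at most 4 percent of its overall score, so it would only miss an approval threshold that sat within 4 points of its mark on everything else.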

All these factors contributed to the auditor general’s overall finding that these AI scribes “were not evaluated adequately.” In a display of restraint and understatement, the report notes that “it is important that AI scribe systems are tested to provide assurances as to the quality of their generated notes and to minimize inaccuracies.” It also recommends that IT departments using these scribes force doctors to “confirm their review of the notes produced” before committing them to patient logs.

Public sector health services in Ontario are not required to use these AI scribe systems in their work and may purchase scribes from non-approved vendors if they wish. Still, the fact that the Ontario government recommended AI summary systems with such obvious and potentially patient-harming flaws should give pause to any doctors (or their patients) making use of them.