npj Digital Medicine is an open-access, international, peer-reviewed journal in the Nature family of journals. It is dedicated to publishing the highest-quality research relevant to all aspects of digital medicine and health. In November 2019 it published a still-timely perspective piece, “Challenges of Developing a Digital Scribe to Reduce Clinical Documentation Burden.” Here’s an excerpt from the abstract:
“Advances in artificial intelligence (AI) and machine learning (ML) open the possibility of automating clinical documentation with digital scribes, using speech recognition to eliminate manual documentation by clinicians or medical scribes. However, developing a digital scribe is fraught with problems due to the complex nature of clinical environments and clinical conversations.”
The article is well worth reading in its entirety, but if you don’t have a technical background it can be difficult to understand the technical problems involved. Here, I will try to give an intuitive explanation of why digital scribing is such a difficult task.
Contextual Interpretation is Very Difficult
The article correctly identifies topic segmentation as one of the major challenges for digital scribes. The overarching reason that topic segmentation is challenging is that human conversation is unstructured and unconstrained. We bounce from topic to topic, changing the subject only to return to it a moment later, often via ambiguous or obscure referring expressions. This makes it very difficult to understand the topic of an utterance in isolation. Humans cope with these problems by taking context into account in myriad ways. Unfortunately, interpreting context remains a huge challenge for digital scribes. Consider the following conversation fragment:
I want you to get an MRI. Your headaches are probably due to something routine, but since they are unusual for you I’d like to rule out some potential causes first.
What’s that? Will it hurt?
It’s a way to make images of the brain without the use of X-rays.
Is it dangerous?
No, it’s very safe.
What will it tell you? What is the worst that it could be?
A brain tumor; hydrocephalus, which is a buildup of fluid in the brain; a stroke; an infection….
They all sound very serious!
Don’t worry, it’s like in a cop show: we’re just ruling out some possible suspects first.
Considered in isolation, the doctor’s list of possible conditions (the line beginning “A brain tumor”) might cause a digital scribe to list all of these potential diagnoses in the patient’s electronic health record (EHR). But in the context of the entire fragment, it is clear to us that the doctor is explaining potential, low-probability diagnoses that she is trying to rule out, and that these should definitely not be reflected in the EHR at this point. Interpreting this context is easy for humans, but very difficult for a computer system.
In this example, one factor that makes it difficult for the digital scribe to interpret context is the scattered structure of the conversation. The algorithms used by digital scribes often rely on proximity to identify phrases in a conversation that relate to the same topic, but that heuristic will not work well here: the initial mention of the MRI and the potential diagnoses that it might indicate are separated by several sentences.
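To make the proximity problem concrete, here is a minimal Python sketch of what such a heuristic might look like. It is purely illustrative and not taken from the npj article or from any real digital scribe: the function name `link_by_proximity`, the keyword lists, and the `window` parameter are all my own assumptions. It links a mention of a test to a mention of a condition only when the two utterances are at most `window` turns apart.

```python
from typing import List, Tuple


def link_by_proximity(
    utterances: List[str],
    anchor_terms: List[str],
    target_terms: List[str],
    window: int = 2,
) -> List[Tuple[int, int]]:
    """Pair up utterances that mention an anchor term and a target term
    when they are at most `window` turns apart (hypothetical heuristic)."""

    def mentions(terms: List[str]) -> List[int]:
        # Indices of utterances containing any of the given keywords.
        return [
            i for i, u in enumerate(utterances)
            if any(term in u.lower() for term in terms)
        ]

    anchors = mentions(anchor_terms)
    targets = mentions(target_terms)
    return [(a, t) for a in anchors for t in targets if abs(a - t) <= window]


# The conversation fragment from above, one utterance per turn.
conversation = [
    "I want you to get an MRI. Your headaches are probably due to something routine.",
    "What's that? Will it hurt?",
    "It's a way to make images of the brain without the use of X-rays.",
    "Is it dangerous?",
    "No, it's very safe.",
    "What will it tell you? What is the worst that it could be?",
    "A brain tumor; hydrocephalus; a stroke; an infection.",
]

# With a small window, the MRI (turn 0) and the conditions (turn 6) are never linked.
narrow_links = link_by_proximity(conversation, ["mri"], ["tumor", "stroke"], window=2)
# Widening the window finds the link, but on a real transcript it would also
# pull in many unrelated pairs.
wide_links = link_by_proximity(conversation, ["mri"], ["tumor", "stroke"], window=6)

print(narrow_links)
print(wide_links)
```

With a narrow window the heuristic misses the connection entirely, so the list of conditions looks like a free-standing set of diagnoses. Widening the window recovers the link in this toy case, but no fixed window size works for conversation in general.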
A second confounding factor is real-world knowledge – a metaphor involving TV dramas – that is needed to clarify that the doctor is describing a diagnostic test and not a diagnosis. Representing knowledge about the real world is still a massive challenge for AI systems.
Here is another example that illustrates some additional challenges of context interpretation in just two short sentences:
I got a new cane recently and this leg has started hurting again.
I think it’s because it’s shorter.
The patient does a great job of simply and succinctly describing his complaint and offering a likely cause of the problem, but a digital scribe will not be able to make complete sense of this conversation. The reference to ‘this leg’ is called a deictic reference: it can only be interpreted properly (left leg or right leg?) in the context of a pointing gesture that a human will easily recognize and interpret, but that will cause great difficulty for any digital scribe.
Making sense of the second sentence requires the most common form of anaphora resolution: to what are the two mentions of the pronoun ‘it’ referring? We can easily tell: the first reference is to the leg that has started hurting again, and the second is to the new cane. But a digital scribe will have difficulty resolving these references, not least because it lacks the real-world knowledge that a shorter cane is far more likely than a shorter leg.
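A small Python sketch shows why this is hard without real-world knowledge. It implements a deliberately naive baseline that I am assuming for illustration only (it is not described in the npj article and is not how any particular digital scribe works): each ‘it’ is resolved to the most recently mentioned candidate noun. The function name `resolve_by_recency` and the candidate set are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def resolve_by_recency(tokens: Iterable[str], candidates: Set[str]) -> List[Optional[str]]:
    """Resolve each 'it' / "it's" to the most recently seen candidate noun.

    A naive recency baseline, used here only to show how it goes wrong.
    """
    last_seen: Optional[str] = None
    resolved: List[Optional[str]] = []
    for token in tokens:
        word = token.lower().strip(".,")
        if word in candidates:
            last_seen = word  # remember the latest candidate antecedent
        elif word in ("it", "it's"):
            resolved.append(last_seen)
    return resolved


text = (
    "I got a new cane recently and this leg has started hurting again. "
    "I think it's because it's shorter."
)
referents = resolve_by_recency(text.split(), {"cane", "leg"})
print(referents)
```

The baseline resolves both pronouns to ‘leg’, because the leg is the most recent candidate each time. That matches the human reading for the first ‘it’ but not for the second, which refers to the cane. Getting the second one right requires knowing that canes, not legs, are the things likely to have become shorter.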
Other Factors that Impede Accurate Summarization
The ultimate goal of a digital scribe is to summarize conversations in some structured format. As we saw above, contextual interpretation makes summarization difficult, but it is not the only underlying cause of difficulties. Clinicians think aloud as they consider multiple options and scenarios. The patient may contribute additional information that causes the clinician to change their mind and contradict a previous conclusion or decision. A digital scribe will have great difficulty distilling a coherent summary of clinical conversations that involve speculative statements, misleading answers, subsequent ruminations, and ultimate corrections.
It’s not hard to imagine specific examples of these phenomena in a clinical conversation, but I will give a personal one here. My father suffered from post-polio syndrome, which manifests as a cluster of potentially disabling symptoms that typically appear decades after the initial polio illness. The symptoms are wide-ranging: progressive muscle and joint weakness and pain; general fatigue and exhaustion with minimal activity; muscle atrophy; breathing or swallowing problems; sleep-related breathing disorders, such as sleep apnea; decreased tolerance of cold temperatures; etc. So one can imagine that doctor-patient conversations about post-polio symptoms and treatments can be diverse and complex. But they are made a lot more complicated when the patient makes things up!
One of my father’s favorite fabrications was to blame all his ills on vaguely specified ‘war wounds.’ I don’t think he thought he was fooling anyone; he just enjoyed telling a tall tale, the more implausible the better. Most doctors were surely in on the joke from the start – for one, he lived in a country that hadn’t been at war in almost a century – but they would often go along with it just to humor him and themselves. Whether the reason is whimsy, anxiety, dementia, or something else, patient-physician conversations include plenty of extraneous material that digital scribes cannot handle properly with current technology.
The npj Digital Medicine article concludes that it will take years to develop the AI algorithms for computer-based digital scribing that can match human levels of performance, and hopefully the intuitive examples in this and my previous post will help you understand why that is the case.
At Augmedix, we think that the best approach is to use AI to augment human scribes rather than to replace them. In my next post I will describe some initial results that we have achieved at Augmedix with our take on Intelligence Amplification for digital scribing.
Augmedix board member Joe Marks is Executive Director of the Center for Machine Learning & Health at Carnegie Mellon University.