Biased health data is a mirror of society
We are recommending practical measures to improve the reliability of health data, ensuring fair access to suitable medical treatment for everyone.
From drug development to predictive health, artificial intelligence has found rich applications in medicine in the 30 years since the Food and Drug Administration (FDA) approved its first AI-enabled medical device, a cervical imaging tool called PAPNET. Overburdened health systems and workers will, some politicians hope, be rescued by the efficiency and productivity benefits of AI.
But already, the biases embedded in AI are causing real-world harm. One digital health assistant was found to misdiagnose heart attacks in women, whose symptoms often differ from the ‘typical’ presentation seen in men. “Rather than being told to go to the emergency room, they were told to go to the psychiatrist,” says Dr Xiaoxuan Liu, Honorary 125th Anniversary Fellow at the University of Birmingham and an expert in AI and Digital Health Technologies.
Another algorithm systematically underestimated the needs of black patients compared to white patients because it relied on health expenditure data, which was lower for black patients. “Contextually, this is not because they have less health needs, it’s because they could afford it less,” she explained.
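The mechanism is easy to reproduce in a toy simulation. The sketch below is illustrative only, with made-up numbers and variable names, and is not the audited algorithm: it shows how training a model to predict healthcare spending, rather than health need, systematically under-scores a group with poorer access to care even when its underlying need is identical.

```python
# Illustrative sketch only (not the audited algorithm): using healthcare
# spending as a proxy label for health need under-scores a group with
# poorer access to care. All names and numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

need = rng.gamma(shape=2.0, scale=1.0, size=n)          # true health need
group_b = rng.random(n) < 0.5                           # group with lower access
access = np.where(group_b, 0.6, 1.0)                    # access barrier

clinical = need + rng.normal(0.0, 0.3, n)               # observed clinical signal
prior_spend = need * access + rng.normal(0.0, 0.1, n)   # past spending (proxy feature)
spend = need * access + rng.normal(0.0, 0.1, n)         # training label: cost

# Linear model trained to predict cost from the available features.
X = np.column_stack([np.ones(n), clinical, prior_spend])
coef, *_ = np.linalg.lstsq(X, spend, rcond=None)
score = X @ coef                                        # "risk" score = predicted cost

# Flag the top 10% of scores for extra care, then check who gets flagged
# among patients with the SAME (high) level of true need.
flagged = score >= np.quantile(score, 0.90)
high_need = need >= np.quantile(need, 0.90)
for name, grp in [("group A", ~group_b), ("group B", group_b)]:
    rate = flagged[high_need & grp].mean()
    print(f"High-need patients flagged in {name}: {rate:.1%}")
```

Both simulated groups have the same distribution of need, yet the group that historically spent less is flagged for extra care far less often, because the label being predicted is cost rather than health.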
Biased training data contributes to misfiring AI models, but fixing the data requires more than downstream technical interventions. The first layer of the problem is curation: electronic health record categories that fail to capture gender diversity or mixed ethnicity, for example, produce inaccurate representations of patients. The second layer is population representation: not all patients visit hospitals, attend appointments, or are willing to share their data, often because of a lack of trust, creating gaps in our understanding of healthcare needs across demographic groups.
While some bias is related to the data itself, such as transcription errors or poor standardisation, perfect data collection will not eliminate every problem, because even accurate data can reflect the health inequalities that exist across society, says Dr Joseph Alderman, digital health clinical research fellow at the University of Birmingham. “We need to talk about how society has structural inequality, about people who cannot access healthcare. There are undertones and continuations of structural oppression, discrimination and racism that have reverberations throughout society, and those power dynamics lead to differences in how healthcare is delivered,” he said. “These inequities are so deeply embedded, unless someone delves into them, we never realise the problem.”
Alderman gave the example of a race ‘correction’ applied for over two decades that saw black patients incorrectly assigned higher kidney function than other ethnic groups, leaving some ineligible for important interventions like dialysis and kidney transplants. “Whenever we apply categories to things that are innate human social constructs with blurred boundaries and continua of variation, we inevitably lose resolution and coerce people into categories that probably do not represent them very well,” he said.
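To make the scale of that correction concrete, the sketch below applies the long-used MDRD eGFR equation. The published coefficients, including the 1.212 race multiplier, are real; the patient values are invented for illustration.

```python
# Sketch of the long-used MDRD eGFR equation (IDMS-traceable version),
# showing how the race coefficient inflated estimated kidney function.
# The patient values below are invented for illustration.
def egfr_mdrd(serum_creatinine_mg_dl: float, age: int,
              female: bool, black: bool) -> float:
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212   # the race 'correction' now widely abandoned
    return egfr

# Same creatinine, age and sex; only the recorded race differs.
without_correction = egfr_mdrd(2.0, 60, female=False, black=False)
with_correction = egfr_mdrd(2.0, 60, female=False, black=True)
print(f"eGFR without race term: {without_correction:.1f} mL/min/1.73m2")
print(f"eGFR with race term:    {with_correction:.1f} mL/min/1.73m2")
```

For the same blood test result, the ‘corrected’ value sits around 21 per cent higher, which in practice could move a patient across the thresholds used to trigger referral, dialysis planning or transplant listing.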
Those working in AI and health technologies do not have the ability to fix the root cause of the problem. “It is useful for them to flag the issue, but they are the wrong people to make a decision. It requires a multi-stakeholder, multidisciplinary — almost a transdisciplinary approach by policymakers and those who control budgets”, he said.
Solutions may include adjusting diagnostic thresholds for some groups, even where that means overriding what the algorithm suggests, and addressing the health inequalities that lie beyond the algorithm. “We must assume that bias exists unless proven otherwise”, said Liu. Without addressing biases at their source, which may sit outside the data itself, any health technologies built on these algorithms risk amplifying existing biases.
To tackle the underlying causes of data bias, Alderman and Liu worked with partners on the STANDING Together (STANdards for data Diversity, Inclusivity and Generalisability) initiative, which brought together more than 350 experts from 58 countries across several disciplines.
This was a transdisciplinary approach to solving the problem, in which people from different sectors who would not traditionally agree on things came together to create a workable solution. “There are so many high-profile examples of where algorithms have gone wrong, both inside and outside of medicine, we did not have to convince anyone about the necessity of the exercise”, said Alderman.
The initiative employed a Delphi consensus study, a series of questionnaires designed to build agreement on topics where opinions are divided or empirical evidence is scarce. The project sought expert consensus on the key principles and practices needed to improve the quality and completeness of medical data.
Their published recommendations are freely available and promote transparency about the limitations of health datasets and their effects, supporting more informed choices about how data is used and where its limits lie. The recommendations reflect diverse perspectives across cultures. For instance, more homogeneous populations, such as those in Southeast Asia, tend to be less concerned about ethnicity and race and more concerned about income and employment status.
Similarly, some countries in Europe have historical reasons for not collecting ethnicity data. “It became apparent how complex and difficult this would be. How do we create recommendations that are specific, yet general enough to provide coverage across the world? We decided that these issues are a much bigger conversation than we need for now”, said Liu.
With over 1,000 FDA-approved AI products, the bias problem needs urgent attention. AI is often treated as a silver bullet, yet healthcare systems still lack the capability to implement it, despite a growing number of interventional and randomised controlled trials. Projects such as STANDING Together can help build confidence in AI's capabilities.
The challenge of data quality will only mount in the future. AI systems are sensitive to changes in data, and model performance can deteriorate over time, so continuous monitoring is needed to identify and correct flaws and glitches. Alderman notes that the recommendations were made before technologies like ChatGPT were widely used. “There will be a need to adapt the recommendations as technology and the world move on. We also need to involve other communities beyond the 58 countries that we used,” he said. The team will shortly be releasing versions of the recommendations in other languages to increase their reach.
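In practice, that monitoring can be as simple as tracking a model's performance in rolling windows after deployment and raising an alert when it falls below an agreed floor. The sketch below is a minimal, hypothetical example; the threshold, window size and simulated drift are all made up.

```python
# Minimal sketch (hypothetical thresholds and data) of post-deployment
# monitoring: track a model's accuracy in rolling windows and flag drift.
import numpy as np

rng = np.random.default_rng(1)

def rolling_accuracy(correct: np.ndarray, window: int = 500) -> np.ndarray:
    """Mean accuracy over consecutive windows of predictions."""
    n_windows = len(correct) // window
    return correct[: n_windows * window].reshape(n_windows, window).mean(axis=1)

# Simulate a model whose accuracy drifts from ~0.90 down to ~0.75 over time.
drifting_accuracy = np.linspace(0.90, 0.75, 5_000)
correct = rng.random(5_000) < drifting_accuracy

ALERT_THRESHOLD = 0.85   # hypothetical performance floor agreed at deployment
for i, acc in enumerate(rolling_accuracy(correct)):
    status = "ALERT: review model" if acc < ALERT_THRESHOLD else "ok"
    print(f"window {i}: accuracy={acc:.3f} {status}")
```

A real deployment would track richer metrics, ideally broken down by patient group, but the principle of a pre-agreed performance floor and regular checks is the same.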
The goal of the consensus methodology is not to create a restrictive checklist, but to spark a cultural transformation in how medical AI is developed and implemented, offering a roadmap to developing equitable and accurate medical AI.
Find out more about our Medical Technology and Data research >