Why Experts Must Not Use GenAI - Dr Mark Burgin
25/11/24. Dr Mark Burgin warns against using GenAI in expert reports, particularly for generating text, data analysis and making decisions.
There is little clear advice to experts on the use of Generative AI in their reports. The courts take a neutral view, arguing that they may put weight on evidence from AI. This optimistic view comes from an ignorance of the way that Generative AI works, of the limits on any benefits, and of the profound downside risks that are present.
To understand why Generative AI should not be used in expert evidence we must first understand the nature of expert evidence. The expert's role is to record an activity, interpret that activity and offer opinions on its meaning. A court does not require an expert to say whether a person has a tattoo but may need help with understanding its significance.
Anybody can give an opinion; it is only an expert opinion that matters to the court. The court needs to understand where that opinion comes from, whether training, experience or research. Only where the expert can prove that they are in fact expert in the issues of the case can their opinion be accepted.
Although some Generative AI has privacy modes, there is a remaining concern about confidentiality. The key is to ensure that information is anonymised before it is submitted and to use the privacy mode. This way, if the inevitable loss of data occurs, no harm will follow. For most uses of GenAI there is no need to include patient-identifiable information.
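As an illustration only, a simple pattern-based redaction pass might look like the Python sketch below. The patterns are illustrative assumptions: they catch some NHS numbers, dates and phone numbers, but regular expressions alone cannot reliably remove names or addresses, so a human check remains essential before anything is submitted.

```python
import re

# A minimal sketch of pattern-based redaction before text is sent to a GenAI
# service. The patterns are illustrative assumptions and are NOT sufficient
# on their own: names and addresses in free text need a human review.
PATTERNS = {
    "NHS_NUMBER": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
    "DATE":       re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b"),
    "PHONE":      re.compile(r"\b0\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched identifier with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# Entirely made-up example entry:
print(redact("Seen 12/03/2024, NHS no 123 456 7890, tel 0123 456 7890."))
# -> "Seen [DATE], NHS no [NHS_NUMBER], tel [PHONE]."
```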
Generative AI talks rubbish
Although all experts will make mistakes, they will normally have some insight. Generative AI happily provides frankly wrong opinions without recognising its mistakes. Like an expert with the onset of dementia, it doubles down on its errors or offers less than adequate retractions. This confabulation can occur in expert evidence but is present in most interactions with Generative AI.
The answers that Generative AI gives are often eloquent and plausible. The impressive style of the answer can give the impression that the information is also of high quality. Sadly, almost every statement that the Generative AI makes is either too vague to be tested or contains errors. The problem is that the AI stitches together different ideas, which creates a patchwork rather than a whole.
I treat anything that Generative AI provides with suspicion; if it seems too good to be true, it usually is. It is, however, helpful when trying to work out what evidence might be persuasive. GenAI has the gift of the gab and can suggest options for how to argue a case. The expert can then look for evidence in the areas that the GenAI has suggested. The expert who is struggling to work out where the missing evidence is hidden may find this useful.
Data Analysis
I have been waiting for an AI that can make summarising medical records easier. Although Generative AI produces a detailed summary of the records, there are always errors. The summary looks different each time the same notes are reviewed and contains different information. Even if asked simply to copy the records, GenAI can hallucinate entries that are not there.
The medical records review is a good test for AI as it is relatively straightforward. The AI has to look at each entry and copy it into a format that has been given to it. It must not invent entries or leave out important ones. Ideally the output would be in date order and consistently formatted, but again this tends to be variable. As this task can be delegated to a non-expert, it is disappointing that the quality is so poor.
The expert needs to double-check every medical records summary for errors and omissions. Even then the GenAI's version can be seductive and mistakes can get into the report. As lawyers rarely double-check, this may only be found when another expert summarises the same records by hand. The difference between these summaries can be enough to find the alternative conclusion.
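As a sketch of how part of that double-check could be made systematic, the Python below flags any summary line that does not appear verbatim in the source records. It is an illustration under the assumption that the summary quotes entries rather than paraphrasing them: it catches invented entries, but omissions and paraphrased errors still need a human reading both documents side by side.

```python
import re

def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not mask a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def flag_invented_lines(summary_lines: list[str], source_text: str) -> list[str]:
    """Return summary lines that do not appear verbatim in the source records.

    Catches hallucinated entries only; omissions and paraphrases are not detected.
    """
    source = normalise(source_text)
    return [line for line in summary_lines if normalise(line) not in source]

# Made-up example records and summary:
records = "04/05/2021 GP review. Knee pain ongoing. Referred to physio."
summary = ["04/05/2021 GP review. Knee pain ongoing.",
           "11/05/2021 Physio discharged patient."]  # second line invented
print(flag_invented_lines(summary, records))
# -> ['11/05/2021 Physio discharged patient.']
```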
Undoubtedly, large numbers of medical experts are already using GenAI to summarise medical records. They do not consider it necessary to alert the court, although there are many clues that the human was not in control. Checking GenAI's work takes as long as summarising the records, so safe use is very limited.
Making decisions
There has been interest in using GenAI as a decision aid and the output is often encouraging. The GenAI superficially appears to balance information fairly and give authentic explanations. Sadly, the inherent problems, including bias, mean that any decisions are flawed. The more the decision is examined, the more problems become obvious.
Balancing information is challenging for humans, but GenAI finds this task reasonably easy. It achieves this mainly by ignoring anything that is different. This means that although it can give good decisions, small changes will cause bizarre responses. The programs have been trained to ignore any extra information, which reduces the risk of odd responses but means that all decisions sound the same.
The problem is that the more reliable and consistent the model (lower temperature), the less likely it is to give an insightful response (which needs higher temperature). Although the same problem occurs in humans, approaches such as emotional intelligence and experience can overcome it. Bias in the training data cannot be avoided and can be made manifest by allowing the GenAI to resolve an ambiguous element such as 'person': GenAI seems to choose almost exclusively young men when picturing driving offences.
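The 'temperature' here is a parameter that most GenAI APIs expose directly. As a minimal illustration of the trade-off the paragraph describes, using the OpenAI Python SDK as one example (the model name and prompt are assumptions), the sketch below runs the same question at low and high temperature:

```python
# A minimal sketch of the temperature trade-off, using the OpenAI Python SDK
# as one example; the model name and prompt are illustrative assumptions.
# Low temperature gives consistent, repeatable answers; high temperature
# gives more varied ones, occasionally insightful and occasionally odd.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "List three weaknesses in this (anonymised) argument: ..."

for temperature in (0.0, 1.0):
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": question}],
        temperature=temperature,
    )
    print(f"--- temperature={temperature} ---")
    print(reply.choices[0].message.content)
```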
Conclusions
The three worst use cases for GenAI are generating text, data analysis and making decisions. In each of these situations the expert will easily lose control of the process. Even with the best prompts the results are of variable quality and can damage the expert’s reputation. The expert must be prepared to explain how the processing occurred and will be criticised if they cannot.
There are reasonable uses of GenAI such as proofreading and giving advice on a piece of text. It is very valuable when you are stuck and not sure how to explain an issue. It can be used to double-check a PDF for specific issues and to give an overview of a subject. Learning to collaborate safely with GenAI is a lengthy process, even for those who have systems analysis skills.
Careful reading of GenAI outputs typically gives hints as to what has gone wrong. Either the prompt contained a bias or the data did not contain the answer to the question. Often the prompt encouraged the GenAI to make up an answer. Prompt engineering is a difficult and complex task, and few people will be able to create reliable prompts. Confidentiality must be preserved at every step of the process.
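As one hedged illustration of a prompt that discourages, rather than encourages, making up an answer, a template along the following lines can be used. It is a sketch, not a guarantee: models can still invent content despite explicit instructions, so the output must still be checked against the source.

```python
# An illustrative prompt template that discourages fabrication. This reduces,
# but does not eliminate, the risk that the model invents an answer.
PROMPT_TEMPLATE = """You are checking a document, not writing one.
Answer ONLY from the text between the markers below.
If the text does not contain the answer, reply exactly: NOT FOUND.
Quote the supporting passage verbatim for every claim you make.

--- BEGIN DOCUMENT ---
{document}
--- END DOCUMENT ---

Question: {question}
"""

prompt = PROMPT_TEMPLATE.format(
    document="(anonymised records go here)",
    question="When was the knee injury first recorded?",
)
```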
Doctor Mark Burgin, BM BCh (Oxon) MRCGP, is a Disability Analyst and is on the General Practitioner Specialist Register.
Your PGCME: An Amateur Guide to Medical Education is a book which explores both the limitations and strengths of GenAI.
Dr. Burgin can be contacted on 0845 331 3304 and via the websites drmarkburgin.co.uk and gecko-alligator-babx.squarespace.com.
This is part of a series of articles by Dr. Mark Burgin. The opinions expressed in this article are the author's own, not those of Law Brief Publishing Ltd, and are not necessarily commensurate with general legal or medico-legal expert consensus of opinion and/or literature. Any medical content is not exhaustive but at a level for the non-medical reader to understand.
Image ©iStockphoto.com/Just_Super