Can ChatGPT Write Radiology Reports?

Som Biswas1*, Salman Khan2, Sandeep Singh Awal3

1Le Bonheur Children’s Hospital, The University of Tennessee Health Science Centre, Memphis.

2The University of Texas Health Science Center at Houston (UTHealth Houston).

3Jeevandeep Diagnostics, Jamshedpur, India.

*Corresponding Author: Som Biswas, Le Bonheur Children’s Hospital, The University of Tennessee Health Science Centre, Memphis. Tel: 901.448.5364, Fax: 901.448.6182

Citation: Som Biswas, Salman Khan, Sandeep Singh Awal (2023) Can ChatGPT Write Radiology Reports? Medcina Intern 5: 213.

Received: November 19, 2023; Accepted: December 09, 2023; Published: December 16, 2023.

Summary

This example provides proof that radiology report drafting can be assisted by ChatGPT, thus facilitating timely reports.

With ever-increasing requisitions for imaging studies, especially after the COVID-19 pandemic, and a global shortage of radiologists [1][2], the workload of practicing radiologists has increased tremendously. ChatGPT could help by assisting in writing radiology reports for imaging studies. Once a diagnosis is entered, ChatGPT can generate a report [3][4] for the radiologist to verify and edit before signing off, potentially reducing report-writing time and effort. This article tests that capability.

What constitutes a good radiology report?

The American College of Radiology guidelines specify how a radiology report should be generated [5]. The following are some of the key characteristics of a good radiology report:

  1. Clarity: The report should be written in language that is easily understandable by both medical professionals and patients. The findings should be presented concisely, with relevant imaging findings described.
  2. Accuracy: The report should be based on correct interpretations of those imaging findings. The radiologist should take care to ensure that the report is free of errors.
  3. Comprehensiveness: The report should include all relevant information about the patient's clinical history, imaging findings, diagnosis, and recommendations for follow-up care. This helps to ensure that the referring clinician has all the information they need to make informed decisions.
  4. Relevance: The report should include only information that is relevant to the patient's care. Irrelevant or redundant information should be excluded, as it can make the report difficult to understand and may lead to confusion.
  5. Timeliness: The report should be delivered to the referring clinician promptly, as the results of the imaging examination are often needed to inform the patient's care.

In summary, a good radiology report is a document that helps to ensure that patients receive the best possible care.

What is ChatGPT?

GPT stands for "Generative Pre-trained Transformer", a type of language model developed by OpenAI. ChatGPT is a specific implementation of the GPT model designed for conversational AI applications, such as chatbots. ChatGPT was trained on a massive amount of text collected from sources such as books, articles, websites, and social media. The training data was preprocessed and tokenized into sequences of tokens (words and word fragments), and the GPT-3.5 model was then trained on this tokenized data using unsupervised learning techniques, followed by fine-tuning with human feedback.

Compared to other language models like BERT or ELECTRA, ChatGPT is built on one of the largest: GPT-3, on which the GPT-3.5 series is based, has 175 billion parameters. This gives it a much greater capacity to learn patterns and relationships in language than smaller models. Currently, ChatGPT is not specifically trained for medical use; however, such a model could be trained on medical data to make it more useful for medical report writing. Med-BERT is an example of a language model developed specifically for medical use.

We demonstrate below ChatGPT’s capability to generate customized radiology reports based on a few keywords.

Hypothetical case 1: A 62-year-old patient with colon cancer comes with recurrent headaches. MRI Brain is performed. The human radiologist finds a lesion in the right parietal lobe measuring 4x5x3 cm in size that looks like metastasis. Instead of typing or dictating the entire report, just the keywords are entered in ChatGPT “create MRI report of 4x5x3 cm right parietal lobe metastasis”.

ChatGPT then prints the entire report in seconds, as below:

Magnetic Resonance Imaging (MRI) Report.

Exam Date: [Insert Date] Patient Name: [Insert Patient Name] Patient ID: [Insert Patient ID]

Clinical Indication: Evaluation of a right parietal lobe mass. 

Imaging Findings:

A contrast-enhanced MRI of the brain was performed and evaluated. The images demonstrate a well-defined, hyperintense, and contrast-enhancing lesion in the right parietal lobe measuring approximately 4 cm x 5 cm x 3 cm. The lesion demonstrates central hypointensity on T2-weighted imaging, consistent with neoplastic tissue. The lesion is surrounded by a peripheral rim of contrast enhancement, demonstrating the presence of a blood-brain barrier breach. The lesion is associated with moderate mass effect, with effacement of the adjacent sulci and moderate displacement of the adjacent lateral ventricle. There is minimal surrounding edema. No other intracranial abnormality is identified.

Diagnosis:

Right parietal lobe metastasis is demonstrated.

Comparison: [Insert comparison with previous studies, if applicable].

Impression: The imaging findings are consistent with a right parietal lobe metastasis. The presence of contrast enhancement and moderate mass effect suggest a neoplastic origin, although biopsy is recommended to establish a definitive diagnosis and determine the primary origin of the lesion. The patient's clinical presentation and history should be taken into account when interpreting the imaging findings.

This report is intended to be used in conjunction with the referring clinician's findings and is not a replacement for a complete medical evaluation.

Radiologist: [Insert Radiologist Name and Signature]

The human-generated report written by an independent radiologist (who has not seen the ChatGPT report) for this case is given below:

RADIOLOGY REPORT 

EXAM: MRI OF THE BRAIN 

TECHNIQUE:

A Multiplanar multi-echo MRI of the brain was performed without and with IV contrast.

FINDINGS:

A well-defined lesion with perilesional edema is seen in the right parietal region. It measures 4x5x3 cm in size. It shows restricted diffusion on DW images. No blooming is seen on SW images.  The lesion is hypointense on T1W and hyperintense on T2W images. The lesion shows heterogeneous enhancement on post-contrast images with a central non-enhancing area of necrosis. Mass effect is seen on adjacent neuroparenchyma with effacement of the adjacent lateral ventricle. A midline shift of 4mm is seen to the left.

The rest of the cerebral cortex and underlying white matter reveal no abnormal signal intensity. Grey-white matter differentiation is normal.

The cerebellum and brainstem are unremarkable. The deep nuclei are normal.

No evidence of acute infarction or bleeding is seen.

The rest of the ventricular system, basal cisterns, sulcal spaces, sylvian fissures, and cerebellar folia appear normal for age.

Bilateral mastoid air cells, paranasal sinuses, and orbits appear normal.

IMPRESSION: Solitary well-defined lesion with perilesional edema in the right parietal region with a central area of necrosis, mass effect, and midline shift. Findings indicate neoplastic mass. Histopathological correlation is suggested. 

On comparison, we notice that although the ChatGPT-generated report is not perfect, it is close enough to a human-written report to serve as a provisional draft. Some errors do occur in the current version of ChatGPT that need to be corrected by the human radiologist. For instance, the line stating “The lesion demonstrates central hypointensity on T2-weighted imaging, consistent with neoplastic tissue” and the line stating that the “lesion is surrounded by a peripheral rim of contrast enhancement, demonstrating the presence of a blood-brain barrier breach” need to be edited, as these are hallucinations by ChatGPT; no such information was provided in the input. Also, ChatGPT’s description of the lesion is less detailed regarding the findings on each MR sequence than the human radiologist’s report. The ChatGPT report does not mention the presence or absence of midline shift, nor the remaining normal structures, such as the cerebellum and brainstem.
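Omissions of the kind noted above (midline shift, cerebellum, brainstem) lend themselves to a simple automated check before the radiologist reviews the draft. The following is a minimal sketch and not part of the method used in this article: the function name `missing_items` and the checklist are hypothetical, and plain substring matching is only a crude stand-in for real clinical text analysis.

```python
def missing_items(draft: str, checklist: list[str]) -> list[str]:
    """Return the checklist terms that never appear in the draft (case-insensitive)."""
    text = draft.lower()
    return [term for term in checklist if term.lower() not in text]


# Excerpt from the ChatGPT draft for hypothetical case 1.
draft = (
    "The lesion is associated with moderate mass effect, with effacement of "
    "the adjacent sulci and moderate displacement of the adjacent lateral "
    "ventricle. There is minimal surrounding edema. "
    "No other intracranial abnormality is identified."
)

# Hypothetical checklist of items that the human-written report covered.
checklist = ["midline shift", "cerebellum", "brainstem", "mass effect", "edema"]

# Lists the checklist items the draft never mentions.
print(missing_items(draft, checklist))
```

A check like this can only flag what is absent from the text. It cannot tell whether a statement that is present is true, so it does not detect hallucinated findings.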

On using ChatGPT for reporting a more complex case of peritoneal carcinomatosis on CT as shown below, we see that the hallucinations and inaccuracies increase.

Hypothetical Case 2: “52-year-old with peritoneal carcinomatosis with small bowel obstruction with multiple liver metastases and bilateral hydroureteronephrosis from pelvis metastatic disease”.

MRI Report: Peritoneal Carcinomatosis with Small Bowel Obstruction, Multiple Liver Metastases, and Bilateral Hydroureteronephrosis from Pelvic Metastatic Disease

Clinical History: The patient is a 52-year-old male with a history of peritoneal carcinomatosis and multiple liver metastases, who presented with small bowel obstruction and bilateral hydroureteronephrosis from pelvic metastatic disease.

MRI Findings: The MRI examination was performed with T1 and T2 weighted sequences in axial and coronal planes, as well as gadolinium-enhanced images.

The liver demonstrates multiple round and irregularly shaped lesions, ranging in size from 1 cm to 5 cm, consistent with metastatic disease. The largest lesion is located in segment VII of the liver.

The peritoneal cavity shows diffuse thickening and nodularity of the peritoneal surface with the presence of multiple soft tissue masses. The peritoneal nodules appear to have invaded the small bowel mesentery, leading to small bowel obstruction.

In the pelvis, there are multiple soft tissue masses, involving the bladder, rectum, and bilateral pelvic sidewalls. The bladder wall appears thickened with invasion by the pelvic masses. There is also evidence of bilateral hydroureteronephrosis, likely due to the obstruction of the ureters by the pelvic masses.

No evidence of ascites is seen.

The visualized portions of the thorax are within normal limits.

Impression:

  1. Diffuse peritoneal carcinomatosis with involvement of the small bowel mesentery, leading to small bowel obstruction.
  2. Multiple liver metastases.
  3. Bilateral hydroureteronephrosis is likely due to pelvic metastatic disease.
  4. Pelvic masses involving the bladder, rectum, and bilateral pelvic sidewalls. 

As an example of inaccuracy, the statement “The peritoneal nodules appear to have invaded the small bowel mesentery, leading to small bowel obstruction” does not make sense.

Multiple iterations do improve the report, as ChatGPT takes prior inputs in the same conversation into account, and the tool itself continues to be updated. ChatGPT also gives a slightly different output each time the same input is entered, so its reports do not read like a generic template and give the impression of a customized report for each patient. ChatGPT’s performance varies with the selection of keywords, with more keywords producing a more accurate and more customized report. However, the final responsibility to edit, verify, and then dispatch the report still lies with the human radiologist.
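The run-to-run variation described above is commonly explained by sampling: a language model assigns probabilities to candidate next tokens and draws one at random instead of always taking the most likely. The toy sketch below illustrates the idea under stated assumptions. The candidate words and their scores are invented for illustration, and `softmax` and `sample_next_word` are hypothetical helpers, not ChatGPT's actual decoding code.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, scores, temperature, rng):
    """Draw one word at random according to the softmax probabilities."""
    probs = softmax(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Invented candidates for the word after "The images demonstrate a ...".
words = ["well-defined", "rounded", "lobulated"]
scores = [2.0, 1.5, 0.5]

rng = random.Random(0)  # seeded only so that the demonstration is repeatable
picks = [sample_next_word(words, scores, 1.0, rng) for _ in range(5)]
print(picks)  # the same "prompt" can yield different words on different draws
```

With a very low temperature the highest-scoring word is chosen almost every time, which would make the output close to deterministic; higher temperatures spread the choices out and produce more varied wording.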

Studies are being conducted on the utility of ChatGPT in radiology. In one study by Jeblick et al., radiology reports simplified by ChatGPT were assessed by 15 radiologists, who found 51% of the simplified reports had incorrect statements and 36% had potentially harmful conclusions [6]. However, the authors conclude that the positive ratings the radiologists gave to the remaining reports illustrate the great potential of LLMs to simplify medical text.

Disadvantages of writing radiology reports using a language model like ChatGPT

Creating radiology reports with ChatGPT also presents possible pitfalls. These include:

  1. Bias: ChatGPT may exhibit bias in its responses, possibly related to race, gender, or religion. This could result in ChatGPT providing responses that are discriminatory or offensive. To address this issue, OpenAI has implemented measures such as bias evaluation and mitigation techniques in the development of ChatGPT.
  2. Hallucination (Fabrication): ChatGPT can generate text that is convincing but false. This can occur when the model is asked to generate text that is beyond the scope of its training data or when it is given ambiguous or vague prompts. This could be problematic if the model is used to spread fake news or propaganda. OpenAI has implemented measures such as content filtering and fact-checking to address this issue.
  3. Lack of clinical context: ChatGPT does not currently have access to the patient's complete medical history, examination findings, and previous imaging studies, which are critical in interpreting the radiology images and generating a complete and accurate report. This can be rectified by creating a medical version of ChatGPT that would have access to patient data and be compliant with HIPAA guidelines. Recently, Doximity, a social networking platform designed specifically for healthcare professionals, has come out with DocsGPT.com, a beta version of a ChatGPT-based tool to streamline administrative paperwork.
  4. Inaccurate reports: AI language models like ChatGPT are trained on vast amounts of data, but they are still limited in their ability to understand the complexities of medical imaging and the nuances of human physiology. This can lead to inaccuracies or inconsistencies in the reports generated by ChatGPT.
  5. Legal status and liability: Radiology reports are an important part of a patient's medical record and can have legal implications. The use of AI-generated reports may raise questions about their admissibility in legal proceedings. This can be solved by keeping the mandatory requirement of human editing and verification of every report before dispatch. Concerning liability, ChatGPT could be used to provide advice or guidance that leads to harm. It is important to provide appropriate disclaimers and disclosures to users. Some legal precedents about automated text outputs raise concerns about bias and privacy [7].
  6. Privacy: ChatGPT may be used to collect personal information from users, such as their names or location. Depending on the jurisdiction in which the model is being used, there may be specific legal requirements for obtaining user consent, handling personal information, and protecting user privacy.
  7. Need for human review: Despite their potential benefits, AI-generated reports still need to be reviewed by a qualified radiologist to ensure accuracy and completeness.
  8. Lack of a personal touch: A radiology report written by a language model can be more impersonal and less engaging for the referring clinician and the patient than one authored by a human radiologist.
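Points 5 and 7 above rest on the same rule: no AI-generated draft should be dispatched until a radiologist has verified it. One way such a rule might be enforced in software is sketched below. The `DraftReport` class, its fields, and the radiologist name are hypothetical and do not describe any existing reporting system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftReport:
    """An AI-generated draft that cannot be dispatched until a radiologist signs it."""
    text: str
    verified_by: Optional[str] = None

    def sign_off(self, radiologist: str, edited_text: Optional[str] = None) -> None:
        """Record the verifying radiologist and, if supplied, their edited text."""
        if not radiologist.strip():
            raise ValueError("a named radiologist must sign off")
        if edited_text is not None:
            self.text = edited_text
        self.verified_by = radiologist

    def dispatch(self) -> str:
        """Return the final report; refuse if no radiologist has verified it."""
        if self.verified_by is None:
            raise PermissionError("report has not been verified by a radiologist")
        return f"{self.text}\n\nRadiologist: {self.verified_by}"


# Hypothetical usage: the unverified draft is blocked, the signed one is released.
report = DraftReport(text="Impression: Right parietal lobe lesion.")
try:
    report.dispatch()
except PermissionError as err:
    print("Blocked:", err)

report.sign_off("Dr. Example", edited_text="Impression: Right parietal lobe mass.")
print(report.dispatch())
```

Making verification a precondition of dispatch, as opposed to an optional step, keeps the responsibility for the final report with the human radiologist.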

What does ChatGPT say when asked “Can ChatGPT write radiology reports?”

“Yes, I can write radiology reports. However, it's important to note that I am a machine learning model trained on text data and may not have the same level of expertise and knowledge as a licensed radiologist. To ensure accurate and reliable radiology reports, it's recommended to consult a licensed radiologist or use specialized software designed specifically for this purpose. That being said, if you provide me with the relevant imaging and any additional clinical information, I can generate a radiology report based on that information.”

References

  1. Radiology Facing a Global Shortage. (2023) https://www.rsna.org/news/2022/may/Global-Radiologist-Shortage.
  2. International Radiology Societies Tackle Radiologist Shortage. (2023) https://www.rsna.org/news/2020/February/International-Radiology-Societies-And-Shortage.
  3. Kitamura Felipe C. (2023) "ChatGPT Is Shaping the Future of Medical Writing but Still Requires Human Judgment." Radiology: 230171.
  4. Shen Yiqiu. (2023) "ChatGPT and Other Large Language Models Are Double-edged Swords." Radiology: 230163.
  5. American College of Radiology website (2023) ACR practice parameter for communication of diagnostic imaging findings.
  6. Jeblick Katharina. (2022) "ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports." arXiv preprint arXiv: 2212.14882.
  7. Zalnieriute Monika, Lyria Bennett Moses, George Williams. (2019) "The rule of law and automation of government decision-making." The Modern Law Review 82.3: 425-455.

Declaration of Generative AI and AI-assisted technologies in the writing process

Statement: During the preparation of this work the author(s) used ChatGPT (powered by OpenAI's language model, GPT-3.5; http://openai.com) in order to generate radiology reports to demonstrate ChatGPT’s abilities. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.