Study: Google reveals new capabilities of Med-Gemini's LLMs

A study by Google Research, in collaboration with Google DeepMind, reveals the tech giant expanded the capabilities of its Med-Gemini AI models: Med-Gemini-2D, Med-Gemini-3D and Med-Gemini-Polygenic.

Google said it fine-tuned Med-Gemini capabilities using histopathology, dermatology, 2D and 3D radiology, genomic and ophthalmology data.

The company’s Med-Gemini-2D was trained on conventional medical images encoded in 2D, such as CT slices, pathology patches and chest X-rays.

Med-Gemini-3D analyzes 3D medical data, and Google trained Med-Gemini-Polygenic on non-image features like genomics.

The study revealed that Med-Gemini-2D’s refined model exceeded previous results for AI-enabled report generation for chest X-rays by 1% to 12%, with reports being “equivalent or better” than the original radiologists’ reports.

The model also surpassed its previous performance regarding chest X-ray visual question-answering thanks to enhancements in Gemini’s visual encoder and language component.

It also performed well in chest X-ray classification and radiology visual question-answering, exceeding previous baselines on 17 of 20 tasks. In ophthalmology, histopathology and dermatology, Med-Gemini-2D surpassed baselines on 18 of 20 tasks.

Med-Gemini-3D could read 3D scans, like CTs, and answer questions about the images.

The model proved to be the first LLM capable of generating reports for 3D CT scans. However, only 53% of the reports were clinically acceptable. The company acknowledged that additional research is necessary for the tech to reach expert radiologist reporting quality.

Med-Gemini-Polygenic is the company’s first model that uses genomics data to predict health outcomes.

The authors wrote that the model outperformed “the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained.”

THE LARGER TREND

Researchers reported limitations with the study, stating it is necessary to optimize the multimodal models for diverse relevant clinical applications, extensively evaluate them on the appropriate clinical datasets, and test them outside of traditional academic benchmarks to ensure safety and reliability in real-world situations.

The study’s authors also noted that “an increasingly diverse range of healthcare professionals need to be deeply involved in future iterations of this technology, helping to guide the models towards capabilities that have valuable real-world utility.”

The researchers said future evaluations should focus on several areas, including closing the gap between benchmark and bedside, minimizing data contamination in large models, and identifying and mitigating safety risks and data bias.

“While advanced capabilities on individual medical tasks are useful in their own right, we envision a future in which all of these capabilities are integrated together into comprehensive systems to perform a range of complex multidisciplinary clinical tasks, working alongside humans to maximize clinical efficacy and improve patient outcomes. The results presented in this study represent a step towards realizing this vision,” the researchers wrote.