
MLCommons Launches MedPerf to Benchmark and Evaluate AI Medical Models

MLCommons has developed MedPerf, a testing platform for benchmarking and evaluating AI medical models. The platform aims to reduce bias, build public trust, and support regulatory compliance for AI in medicine.

MedPerf: New Platform to Benchmark AI Medical Models

With the acceleration of AI adoption in healthcare due to the pandemic, MLCommons has launched a testing platform, MedPerf, to provide a reliable way of benchmarking and evaluating AI medical models. According to MLCommons, MedPerf can test AI models on diverse real-world medical data while preserving patient privacy.

A collaboration spanning more than two years, MedPerf was developed with input from both industry and academia. Unlike MLCommons' general-purpose AI benchmarking suites, MedPerf is aimed at healthcare organizations, the end users of medical AI models. Hospitals and clinics can use the platform to evaluate models on demand via federated evaluation, in which models are deployed and tested remotely on each site's own premises so that patient data never leaves the institution.
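The core idea of federated evaluation can be illustrated with a minimal sketch. This is not MedPerf's actual API; the function names, the toy model, and the synthetic per-site records below are all hypothetical, chosen only to show that each site computes metrics locally and returns aggregates rather than raw patient records:

```python
# Hypothetical sketch of federated evaluation: the model travels to each
# site, and only aggregate metrics (never patient records) leave the site.
# Names and data are illustrative assumptions, not MedPerf's interface.

def evaluate_on_site(model, site_records):
    """Run the model locally on a site's private records; return only metrics."""
    correct = sum(1 for features, label in site_records if model(features) == label)
    return {"n": len(site_records), "accuracy": correct / len(site_records)}

def federated_evaluation(model, sites):
    """Collect per-site metric summaries; raw data stays on-premises."""
    return {name: evaluate_on_site(model, records) for name, records in sites.items()}

# Toy binary classifier and synthetic per-site data (illustrative only).
model = lambda score: 1 if score >= 0.5 else 0
sites = {
    "hospital_a": [(0.9, 1), (0.2, 0), (0.7, 1), (0.1, 0)],
    # A site whose population differs from the training data may score lower.
    "hospital_b": [(0.6, 0), (0.4, 1), (0.8, 1), (0.3, 0)],
}

report = federated_evaluation(model, sites)
```

Comparing the per-site accuracy figures in `report` is what surfaces performance gaps across patient demographics, the kind of bias the MedPerf trial observed.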

In a trial earlier this year, MedPerf supported the testing of 41 models across 32 healthcare sites on six continents. The results showed reduced model performance at sites whose patient demographics differed from those the models were trained on, highlighting inherent biases.

Despite the promising results, some argue that the platform may not fully address persistent challenges in AI for healthcare, such as integrating the technology into the daily routines of healthcare professionals and into complex care-delivery systems. So while benchmarking platforms like MedPerf are crucial, ongoing auditing by both vendors and customers remains essential for safely deploying AI medical models.