Less is more: Novel algorithm automatically removes poor-quality data to improve AI scalability


AI healthcare company Presagen has developed a novel technique that automatically removes poor-quality data from the datasets used to train Artificial Intelligence (AI) products, making those products more scalable and reliable.

The patent-pending technique, called UDC, was applied to four types of imaging problems: detection of pneumonia in chest x-rays; embryo viability assessment in IVF; detection of cats and dogs; and detection of various types of vehicles.

In each case, the UDC reliably detected poor-quality data on its own. Removing that data from the training set significantly improved both the accuracy (in some cases by over 20%) and the generalizability of the AI, both of which are necessary for commercial scale and reliability. Lack of scalability is a known problem with AI products and was highlighted in a recent VentureBeat article by prominent Silicon Valley VC firm Andreessen Horowitz.

Dr Michelle Perugini, Presagen Co-Founder and CEO, said, “Real-world problems like healthcare are not Kaggle competitions. Data are inherently poor quality due to clinical subjectivity, uncertainty, and even adversarial attacks where data contributors intentionally contribute poor-quality data. It is not always possible to reliably detect errors in data, even by experts. We have seen that even 1% poor-quality data can impact AI training stability and performance. This ground-breaking technique can automatically detect poor-quality data and allows us to build robust commercial AI products.”

The UDC can also be used to clean “test data”, which are the data used to validate AI accuracy. That accuracy is often publicly reported to clinics and patients to describe the efficacy of the AI.

Presagen Co-Founder and Chief Strategy Officer, Dr Don Perugini, said, “It is critical to ensure that test data are clean so that the reported accuracy is a true representation of the AI performance, and not misleading for clinics and patients that need to rely on it. With embryo viability assessment, we have seen literature reporting accuracies above 90%; however, we have detected over 10% inherent poor-quality data due to the nature of the problem. This calls into question these very high reported accuracies.”

Removal of poor-quality data using the UDC has a range of other significant benefits to the field of AI.

Dr Jonathan Hall, Presagen’s Co-Founder and Chief Scientist, said, “Removing poor-quality data with UDC reduces the quantity of data needed to train the AI, which is important because high-quality labeled data can be hard to come by. It also reduces the cost and time to train the AI. However, the most exciting benefit of the UDC is greater stability and accuracy within the AI training process itself, which means the AI training process can potentially be automated with little or no human oversight, thus protecting people’s rights to privacy.”

Automatic detection and management of poor-quality data using the UDC is highly complementary to Presagen’s unique federated AI training system that allows AI to train on private data distributed in different locations throughout the world, without having to move, centralize or view the data. 

Presagen’s first AI medical product, Life Whisperer, is in use globally by IVF clinics. Life Whisperer uses AI to identify which embryos in IVF are likely to lead to a successful pregnancy. The assessment is conducted on single static images of embryos. In a published international clinical study, Life Whisperer performed 25% better than current manual embryo assessment methods. Presagen’s UDC technique was critical in building Life Whisperer’s AI technology.

Presagen has recently developed a range of patent-pending AI technologies that drive a fundamental shift in how commercially scalable AI products are developed for real-world problems. These technologies apply beyond healthcare, to AI more generally.

Dr Michelle Perugini said, “We are excited that over the coming weeks and months we will present to the world the suite of technologies which we believe advance the field of AI. These technologies will allow Presagen to build scalable AI products that are more commercially viable and technically superior, and thus market dominating. This is vital in Presagen’s journey to become a world leader in AI Enhanced Healthcare and a dominant player in the AI in Femtech market globally. More importantly, we see it as an opportunity to change, lead, and dominate the AI industry more broadly.”

Andrew Murphy