By Dr Michelle Perugini, Dr Jonathan Hall, Dr Don Perugini

Commercial AI products are not scaling as most would have expected. This has led to fatigue, both for those developing the AI and for those who stand to gain from its application. AI fatigue is a product of the high level of hype and the enormous amount of information, and sometimes misinformation, about what AI can achieve. For many AI applications, reality sets in quickly when the AI fails to deliver on its promises in commercial use.

Deep Learning is a subset of AI, and a popular technique well suited to data classification problems that involve finding patterns in data, such as medical image analysis. However, many in the AI industry have found it challenging to build commercially scalable AI products using deep learning.

What is scalable AI? Scalable AI means that the AI is both accurate and robust. Robust means the AI is generalizable in its ability to apply to different companies or users, different contexts or situations, and different geographical locations. For medical imaging, robust AI should reliably work “out of the box” in different clinical settings and for patients globally.

Scalable AI = Accurate + Robust

Lack of AI scalability was the subject of a recent article by prominent Silicon Valley venture capital firm Andreessen Horowitz, which has worked with a range of AI companies. The Economist’s recent Technology Quarterly report comprised multiple articles on the limitations of AI and its inability to generalize. Many in the AI industry are finding it challenging to translate AI from the lab into scalable commercial software products without enduring time-consuming and costly ongoing services or retraining to ensure the AI works for each new user and without bias.

AI scalability is not a problem with the technology itself. Rather, the problem is with the way AI is built for commercial applications.

There needs to be a fundamental paradigm shift in the way AI is used to develop commercially scalable AI products. This paper presents an alternative viewpoint on where effort should be made in the AI development process to ensure the AI can scale as intended.

AI Data Fallacy

The AI industry currently has a relentless focus on Data Quantity and AI Accuracy. The view is that more data results in greater accuracy. If the AI does not work well, then many assume that more data will solve the problem. This AI Data Fallacy prevents the AI industry from building scalable AI.

AI Data Fallacy: Greater Data Quantity = Greater AI Accuracy

Believing more data is the solution has other implications for the AI industry. Data are typically limited and difficult to find and collect. Data can be held by multiple custodians, such as clinics and hospitals in healthcare. Data can be expensive to collect, requiring annotation and labelling (by clinicians in healthcare), cleaning, and preparation, which can account for 50% of AI training costs. Consent from data owners (e.g. patients) is needed to use their private data, and additional incentives are sometimes required for data custodians to share the data. Data privacy and security laws can introduce barriers to sharing, storing, accessing and handling the data. For example, healthcare privacy laws can prevent global datasets leaving the country of origin or being centralized as is necessary for AI training.

The connection between Data Quantity and AI superiority can also be misleading. Those with the most data will not necessarily have the best AI. The ability for others to obtain data does not necessarily mean they can recreate AI to the same quality. Building AI for real-world problems is typically challenging, regardless of data quantity. Building high performing AI is only one piece of the puzzle needed to make an AI product a commercial success.

Whether it is a company or a whole country, the ability to collect massive datasets does not guarantee AI superiority. Data is important, but it is not the only factor in creating the Scalable AI needed to lead the AI industry.

AI Paradigm Shift for Building Scalable AI

To build Scalable AI, the AI industry needs to shift its focus from Data Quantity and AI Accuracy to Data Quality, Data Diversity, AI Robustness, and domain Knowledge about the problem, the data, and how the AI will be used in practice, which together make Scalable AI Accurate and Robust:

Scalable AI = Data Quality + Data Diversity + AI Robustness + Knowledge

Data Quality

High performing Scalable AI cannot be built on poor quality data. Poor quality data impacts both the accuracy and the robustness of the trained AI. Even 1% error or noise in the data can affect AI training stability and accuracy, regardless of the quantity of data.

As the saying goes: “Garbage In, Garbage Out”. Unfortunately for most real-world problems “Garbage In” cannot be avoided, rather it must be dealt with. 

Real-world problems like those in healthcare are not Kaggle competitions. Data is inherently poor quality due to clinical subjectivity, medical uncertainty, human error, and even adversarial attacks where data contributors can intentionally contribute poor quality data. As a result, some AI practitioners may unknowingly train on poor quality data, whilst others are just unable to effectively “clean” the data.

Having effective data cleaning techniques to improve data quality is arguably the most significant factor in building Scalable AI. We have seen a significant 20% improvement in accuracy and greater generalizability in multiple domains from effective removal of poor quality or noisy data used for AI training.
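As one illustration of what such cleaning can look like in practice, the sketch below uses a common, generic approach rather than the specific method referred to above: samples whose recorded labels receive very low probability from out-of-fold predictions are treated as likely label noise and dropped before retraining. The model choice, probability cutoff and function names are illustrative assumptions.

```python
# A minimal sketch of out-of-fold label-noise filtering (illustrative only,
# not necessarily the cleaning method referred to above). Samples whose
# recorded labels receive very low probability from models that never saw
# them during training are treated as likely noise and dropped.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def remove_suspect_labels(X, y, min_label_proba=0.1):
    """Return a boolean mask of samples to keep.

    Assumes y contains integer class labels 0..K-1. The model choice and
    the min_label_proba cutoff are tunable assumptions.
    """
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    # Out-of-fold class probabilities: each sample is scored by a model
    # trained on the other folds only.
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")
    label_proba = proba[np.arange(len(y)), y]
    return label_proba >= min_label_proba

# Usage: keep = remove_suspect_labels(X_train, y_train)
#        model.fit(X_train[keep], y_train[keep])
```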

Using good quality data has other profound benefits. It reduces the amount of data, and the cost and time required to train high performing AI. It also increases stability and accuracy within the AI training process itself. This means the AI training process can be potentially automated with little or no human oversight, thus further protecting people’s rights to privacy by limiting human access to the data.

Another issue with poor quality data is the testing and validation of AI performance. If the data used to test the performance of the AI are poor quality or not representative, then the AI may not be as good as the test results suggest.

More concerning is that test results are often publicly reported to clinics and patients to describe the efficacy of the AI. It is critical to ensure that test data are clean and representative so that the reported accuracy is a true reflection of the AI performance, and not misleading for clinics and patients that will ultimately rely on it. 

We often see literature claiming AI accuracies above 90% in medical domains where we have detected well over 10% inherently poor quality data due to the nature of the problem, which calls these very high reported accuracies into question. Overstated accuracies can lead to mistrust and skepticism, failure to meet user expectations, and potentially negative consequences when bad AI results are relied on by users. This negative perception and experience is damaging to the AI industry, for a technology that, when applied in the right way, can deliver significant benefit.

Data Diversity

Bias is a significant problem with AI. In healthcare, AI can be biased to particular types of people or clinical settings (e.g. medical equipment). Greater bias means less scalability, i.e. the same AI will not work for everyone.

An MIT Technology Review article stated that facial recognition algorithms developed in the US were bad at recognizing Asian, African-American and Native American faces, whereas algorithms developed in Asian countries worked well for both Asian and Caucasian faces. This is not surprising: algorithms developed in Asia are likely trained on a more diverse dataset containing a greater mix of Asian and Caucasian faces.

Healthcare problems are global and therefore need a global first solution. To solve these problems effectively requires a globally diverse dataset which represents different types of patients and clinical settings from around the world. Only then will one be able to create unbiased Scalable AI that can be reliably used by any clinic and for any patient, anywhere in the world.

Unfortunately, in industries such as healthcare, data are distributed amongst many data custodians, the clinics and hospitals. Collecting a globally diverse dataset from multiple clinics is challenging. 

The simplest path for many AI companies is to collect data from one or a few clinics that have a large dataset, typically a prestigious clinic so that it looks good in an investor pitch. Investors may not be aware of the potential commercial limitations. A recent article by The Economist noted that an AI system trained to spot pneumonia on chest x-rays, built within a hospital in New York, became markedly less competent when used in hospitals other than those it had been trained in.

Collecting a globally diverse dataset is further complicated by healthcare privacy laws that can prevent data leaving the country of origin or being centralized for AI training. Emerging AI training techniques such as Federated Learning allow AI to train on globally distributed data without having to move the data from its source, and allow AI teams to train on private patient data that they never need to see.

Google/Alphabet CEO Sundar Pichai was quoted in a CB Insights article saying: “On user privacy and control, it’s always been a big focus for us...Initiative is underway for example like Federated Learning for almost three years...I think it’s one of the most important areas we are working on.”
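To make the idea concrete, the sketch below shows the core loop of Federated Averaging in a deliberately simplified form; the logistic-regression update, dataset names and hyperparameters are illustrative assumptions, not Presagen’s implementation.

```python
# A minimal sketch of Federated Averaging (illustrative, not Presagen's
# implementation): each clinic trains on its own data locally, and only
# model weights, never patient data, are sent back and averaged.
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=1):
    """One clinic's local training pass (a simple logistic-regression
    gradient step stands in for a full deep-learning update)."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (preds - y) / len(y)      # logistic-loss gradient
        w -= lr * grad
    return w

def federated_average(global_weights, clinic_datasets, rounds=10):
    """Average locally trained weights, weighted by each clinic's data size."""
    w = global_weights
    sizes = np.array([len(y) for _, y in clinic_datasets], dtype=float)
    for _ in range(rounds):
        local_ws = [local_update(w, X, y) for X, y in clinic_datasets]
        w = np.average(local_ws, axis=0, weights=sizes)
    return w

# Usage: clinic_datasets = [(X_clinic_a, y_clinic_a), (X_clinic_b, y_clinic_b)]
#        w = federated_average(np.zeros(n_features), clinic_datasets)
```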

A more important reason for a globally diverse dataset is testing and validating the AI. Regardless of how the AI is built, it is vital to be able to prove that it is scalable. Only then will the AI be ready for prime time and commercial scale.

Scalable AI can be practically achieved through diverse data. It simply requires additional investment and effort upfront to take a global first approach to the problem, creating something with greater commercial and social value further down the track. ‘Start small first and then expand’ may work well in other industries, including for traditional software products, but it may not be the most effective approach for the AI industry. As with Data Quality, Data Diversity makes it difficult for any one company or country to dominate the AI industry, except for local purposes, without a global collaborative effort.

AI Robustness versus AI Accuracy

The AI industry is obsessed with accuracy. Accuracy is vital, but the assumption that the AI with the greatest measured accuracy will be academically and commercially superior is flawed.

However, measured accuracy is heavily biased toward the dataset used to test the AI and is not necessarily representative of the AI’s general performance at global scale. Anecdotally, there have been cases in healthcare where the accuracy of an AI (based on a test dataset) has dropped by over 20% when applied in practice to a new, unseen dataset.

Higher general accuracy can be achieved by focusing on robustness when training and selecting the AI to put into commercial use. We have found that selecting AI based solely on accuracy during training can come at the expense of robustness, making the AI fragile or brittle. Selecting an AI that scores slightly lower on test accuracy but is more robust, i.e. more generalizable, typically ends up being more accurate and performing better in general use. This makes the AI more scalable and reliable.
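One simple way to express this as a selection rule, sketched below under illustrative assumptions (the penalty weight, per-site datasets and function names are not from the article), is to score candidate models by their mean per-site accuracy minus a penalty on how much that accuracy varies across sites, rather than by pooled test accuracy alone.

```python
# A minimal sketch of robustness-aware model selection (an illustration, not
# the authors' exact criterion): rather than picking the candidate with the
# best pooled test accuracy, penalize accuracy that varies widely across
# clinics or sites, favoring models that generalize.
import numpy as np
from sklearn.metrics import accuracy_score

def robustness_score(model, site_datasets, penalty=1.0):
    """Mean per-site accuracy minus a penalty on its spread across sites."""
    accs = [accuracy_score(y, model.predict(X)) for X, y in site_datasets]
    return np.mean(accs) - penalty * np.std(accs)

def select_model(candidates, site_datasets):
    """Pick the fitted candidate that balances accuracy and consistency."""
    return max(candidates, key=lambda m: robustness_score(m, site_datasets))

# Usage: best = select_model([model_a, model_b],
#                            [(X_site1, y_site1), (X_site2, y_site2)])
```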

Evaluation of either accuracy or robustness cannot be effectively achieved with poor quality data, as the metrics for comparison cannot be relied upon, resulting in poor performing AI being selected and commercially applied.

Knowledge

Building effective, commercially Scalable AI for real world problems is difficult. You cannot just throw data at AI algorithms and expect a good result, as some would have you believe. The AI does not always find what you are looking for on its own.

Building effective AI requires Knowledge. It requires domain expertise about the problem you are solving, and the nature of the data you are using to solve that exact problem, to better target the AI. It requires knowledge about the users and how they will use the AI to ensure that the data inputs, the training process, and the AI outputs address the problem effectively. It requires expert knowledge to understand why the AI may have classified data incorrectly, and whether the issue is with the AI, the data labeling, or other factors that may or may not be relevant. It requires knowledge about the AI algorithms themselves, how they work, and their limitations, so that those limitations can be mitigated. It requires diverse perspectives about the problem, solution, and outcomes.

The more the AI is guided and targeted, the better the AI outcome. It takes a multi-disciplinary and diverse team of competent technical and domain experts – beyond a Kaggle dataset and an AI engineer. 

By targeting AI appropriately with automated annotations, we have been able to build AI using a dataset from one imaging type (microscope camera) to apply equally to another type (time-lapse imaging) which looks substantially different. This ensures the AI can scale to any imaging equipment that may be used, and will not be limited to a specific imaging system.

Some may say that Knowledge biases the AI, and only the data itself should guide AI development. Our decades of experience suggest otherwise. Using evidence from scientific literature, expert knowledge, and common sense can have a significant impact on improved AI performance. 

However, regardless of how the AI is built, the true test is how well the AI performs on a globally diverse test dataset, and in practice.

Conclusion

The AI industry is starting to experience increased skepticism and declining enthusiasm due to AI not meeting expectations despite significant investment.

In this article, we argue that this is less a problem with AI itself than with the way the AI industry builds products for commercial use. We propose that a fundamental paradigm shift in thinking, and a re-purposing of existing techniques, is required to build commercially scalable AI. It involves shifting focus from Data Quantity and AI Accuracy to Data Quality, Data Diversity, AI Robustness and Knowledge.

It requires the AI industry to change its approach technically, in how it builds AI, and commercially, in how it engages and works with end users and their data. It requires a global first approach and a collaborative effort, comprising diverse datasets and multi-disciplinary teams. Importantly, it requires new technology, techniques and algorithms beyond the AI itself to support various aspects of the broader AI problem (Data Quality, Data Diversity and AI Robustness), as well as problem-specific applications of the AI (Knowledge).

About the Authors

Dr Michelle Perugini, Dr Jonathan Hall and Dr Don Perugini are co-founders of Presagen, a global AI healthcare company focusing on women’s health (Femtech). Presagen’s first product is Life Whisperer, which uses AI to assess images of embryos and identify their viability, with the aim of improving pregnancy outcomes for couples undertaking IVF. Life Whisperer is currently being used by IVF clinics globally.

Dr Michelle Perugini is the CEO of Presagen. Michelle has over a decade of experience in stem cell biology and genetics. She sits on numerous Government and Academic Committees, is a mentor to various innovative deep technology start-ups, and has a global reputation as a leader in AI and healthcare, as evidenced by her frequent invitations to speak at international conferences and events. Michelle has won many women in innovation awards and continues to advocate for improved women’s health outcomes.

Dr Jonathan Hall is the Chief Scientific & AI Officer at Presagen. Jonathan holds two PhDs, in Physics and Nanotechnology, and was the recipient of the MIT Technology Review Innovator Under 35 award in 2019.

Dr Don Perugini is the Chief Strategy Officer at Presagen. Don has a PhD in AI and was a research scientist in the Department of Defense for over a decade, working internationally with US (DARPA), UK and Canadian Defense organizations. Don works with and is a mentor to various innovative deep technology and academic-founded start-ups and has received numerous innovation awards.

Together, Don and Michelle founded a global AI company in 2007, which operated out of Silicon Valley and Australia and was acquired by global firm EY in 2015.

Acknowledgements

Presagen scientific team: Dr Milad Dakka, Dr Tuc Nguyen, Dr Sonya Diakiw