Presagen Webinar Series: Connecting global datasets that you never move or see with Decentralized Federated Artificial Intelligence
In January 2023, Dr Jonathan Hall presented the webinar “Connecting global datasets that you never move or see: Decentralized Federated Artificial Intelligence”. Presagen’s webinar series presents new technologies and challenges related to AI in healthcare, women’s health and fertility. Below is the transcript of the presentation, the webinar video, the related Nature Scientific Reports paper, and presentation slides.
Slide 1
Hi everyone, thanks for joining me.
I’m Jonathan Hall from Presagen. Today, I will be talking about how Presagen is changing the way medical data from around the world is safely and privately connected with Artificial Intelligence.
By using our unique Federated AI technology, globally distributed data can be connected together, without ever moving or seeing private data.
This allows us to help clinics and hospitals all over the world to safely collaborate, and together co-create AI healthcare products to improve healthcare outcomes for patients globally.
Slide 2
One of the biggest challenges with AI is bias. When bias exists, then AI cannot scale and reliably work for everyone, including under-served populations, like women or minorities.
There is a solution to the bias and scalability problem. And it’s… diversity.
Because when the AI you create is based on globally diverse data, representing everyone from the type of people that visit the most advanced hospitals using the most advanced equipment to the type of people that visit small local clinics using standard equipment, then that AI will be less biased, and will reliably work for everyone, regardless of who you are or where you live.
The problem is, creating AI on a globally diverse dataset is challenging.
Slide 3
In healthcare, protecting the privacy of patient data is critical.
Patients don’t want their personal private data being shared, copied and moved all over the world.
This is justified, and there are laws in many countries that restrict movement of medical data beyond borders.
Clinics themselves have their own privacy policies restricting data movement, or who can see their data, to protect their patients.
However, what this means is that global datasets cannot be centralized to a single location to train AI algorithms in the traditional way.
Slide 4
This makes it challenging for clinics that want to leverage their data, that want to innovate to develop new technology, not only to support their own patients, but to support patients around the world.
The value of a clinic’s data on its own is limited, and as discussed, creates bias.
Adhering to privacy means that data becomes sparse, siloed, disaggregated, and distributed amongst many clinics globally.
So how do we unlock the value of this globally diverse siloed data?
How do we connect this global dataset to AI, without having to move or see the private data?
Slide 5
A solution to this problem is using technology called federated learning.
Federated learning allows AI to be trained on data distributed all over the world, without having to centralize the data.
You don’t need to move, see or even own the data you use to train the AI.
All you need to do is allow the AI to access the data, and learn the general insights from the data.
So rather than the data moving to the AI as with traditional AI training, federated learning works by iteratively moving the AI to data distributed all over the world.
The AI can tour the world, arriving in each region or site, organized so that no data leaves, and the learnings can be shared with all.
Federated learning on globally diverse datasets allows us to create truly scalable AI that works out-of-the-box.
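To make the idea of moving the AI to the data concrete, here is a minimal sketch in Python. It is an illustration only, not Presagen's algorithm: the tiny linear model, the three sites, and the names `local_update`, `tour` and `make_site` are all assumptions invented for this example. The point it shows is that only the model weights travel between sites, while each site's records stay where they are.

```python
import random

def local_update(weights, data, lr=0.05, epochs=20):
    """Train a 1-D linear model y = w*x + b on one site's data; return weights only."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def tour(sites, rounds=5):
    """Carry the model from site to site; the raw data never leaves its site."""
    weights = (0.0, 0.0)  # a fresh model
    for _ in range(rounds):
        for site_data in sites:
            # Only `weights` crosses the site boundary, never `site_data`.
            weights = local_update(weights, site_data)
    return weights

# Hypothetical private datasets at three sites, all following y = 2x + 1 plus noise.
rng = random.Random(0)

def make_site(lo, hi, n=30):
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.05)) for x in xs]

sites = [make_site(-1.0, 0.0), make_site(0.0, 1.0), make_site(-0.5, 0.5)]
w, b = tour(sites)
print(f"learned w={w:.2f}, b={b:.2f}")
```

In this toy setting each site sees only part of the input range, yet the touring model should end up close to the shared relationship (slope near 2, intercept near 1).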
Slide 6
Federated Learning is being used for different kinds of applications.
It’s been used to learn your individual preferences on your phone, or to tailor algorithms to suit your purchasing patterns.
However, existing federated learning algorithms for these applications can still move private information back to a central server, which as discussed, is not allowed in healthcare.
Existing algorithms can also be slow and inefficient.
Additionally, the problems that federated learning is typically applied to are at the device level, to tweak an existing AI.
Tweaking algorithms for your phone preferences is a relatively small task.
However, with healthcare, AI training is typically at the institution level.
Creating an entirely new AI on an intensive medical problem, in a way that is not slow, inefficient and cost-prohibitive, and ensures private data remains private and is not shared, requires a whole new approach.
Slide 7
At Presagen, we created a unique way of organizing how the AI can learn from data distributed around the world, without moving or seeing the private data.
Our completely decentralized and scalable algorithm makes Federated Learning significantly more cost-effective and practical.
For the first time in history, we have enabled clinics and hospitals around the world to work together and collaborate, and safely connect their data, to create new AI healthcare products from globally diverse datasets.
Test results using our decentralized Federated AI algorithm show outstanding performance, even better than the ideal case where the data are centralized and the AI is trained using traditional AI training.
Our algorithm has shown increased accuracy, robustness, and ability to handle errors in data.
Slide 8
To learn from separated sites, or data centers, the AI learns in a pattern, or topology, that describes how it will travel around the globe.
Each data center begins with a fresh AI, which will learn from the local data in each region.
Then, the AI is sent to the next region where it remains for a time, then is sent to a further region, each time gaining more knowledge or insights from its learnings.
Like a ‘travelling student’ that learns from each ‘local master’, the AI learns to take the expertise from each place into an overall context, using a process known as Distillation.
This procedure is repeated many times, each starting at a different data center, and the best AIs are distilled together at the end to arrive at a final, robust AI.
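The ‘travelling student’ procedure described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the algorithm from the paper: the 1-D logistic model, the ring-shaped route, the blending weight `alpha`, the synthetic site data, and every function name here are hypothetical. In particular, the final step simply averages the students' predictions as a stand-in for distilling the best AIs together, and it keeps all students rather than selecting the best.

```python
import math
import random

def predict(model, x):
    """Probability of the positive class under a 1-D logistic model (w, b)."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train(model, data, targets, lr=0.1, epochs=20):
    """SGD on cross-entropy against targets in [0, 1] (hard labels or soft ones)."""
    w, b = model
    for _ in range(epochs):
        for (x, _), t in zip(data, targets):
            g = predict((w, b), x) - t  # gradient with respect to the logit
            w -= lr * g * x
            b -= lr * g
    return (w, b)

def ring_route(start, n_sites):
    """Visit every site once, in ring order, beginning at `start`."""
    return [(start + k) % n_sites for k in range(n_sites)]

def travelling_student(route, sites, masters, alpha=0.5):
    """A fresh model tours the sites, distilling from each local master in turn."""
    student = (0.0, 0.0)
    for i in route:
        # Soft targets blend the local labels with the local master's predictions.
        targets = [alpha * y + (1 - alpha) * predict(masters[i], x) for x, y in sites[i]]
        student = train(student, sites[i], targets)
    return student

def make_site(rng, n, p_positive):
    """Hypothetical site data: class 1 centred at +1.5, class 0 at -1.5."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < p_positive else 0
        data.append((rng.gauss(1.5 if y else -1.5, 1.0), y))
    return data

rng = random.Random(0)
sites = [make_site(rng, 60, p) for p in (0.7, 0.3, 0.5)]  # sites differ in class balance
masters = [train((0.0, 0.0), d, [y for _, y in d]) for d in sites]  # trained locally only
students = [travelling_student(ring_route(s, len(sites)), sites, masters)
            for s in range(len(sites))]

def final_predict(x):
    """Stand-in for the final distillation step: average the students' probabilities."""
    return sum(predict(m, x) for m in students) / len(students)

test_set = make_site(rng, 200, 0.5)  # hypothetical held-out data for a sanity check
accuracy = sum((final_predict(x) > 0.5) == (y == 1) for x, y in test_set) / len(test_set)
print(f"ensemble accuracy on held-out data: {accuracy:.2f}")
```

Each route starts at a different data center, which mirrors the repetition described in the talk; how the real system chooses routes and performs distillation is not specified in this transcript.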
Slide 9
To make this even more cost-effective, the data transfer costs of an AI from one region to another can be further minimized.
By allowing the number of times the AI is transferred to be configurable, and additionally clustering data centers into separate ‘zones’, the trade-off between AI performance and cost can be optimized.
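As a rough illustration of that trade-off, the sketch below counts transfer costs for a zoned layout. Everything in it is an assumption made for the example: the cost figures, the idea that models circulate in a ring inside each zone every round, the `sync_every` parameter controlling how often zones exchange models, and the function name `transfer_cost`. It is a back-of-the-envelope model, not the costing used by Presagen.

```python
def transfer_cost(zones, rounds, sync_every, intra_cost=1.0, inter_cost=10.0):
    """Estimate total transfer cost for a zoned topology (hypothetical cost model).

    zones      : list of zones, each a list of data-center names
    rounds     : number of training rounds
    sync_every : zones exchange models once every `sync_every` rounds
    intra_cost : assumed cost of one transfer inside a zone
    inter_cost : assumed cost of one transfer between zones
    """
    if sync_every < 1:
        raise ValueError("sync_every must be at least 1")
    # One ring pass inside a zone takes as many hops as the zone has centers
    # (a single-center zone needs no transfers).
    intra_hops = sum(len(zone) for zone in zones if len(zone) > 1)
    inter_syncs = rounds // sync_every
    return rounds * intra_hops * intra_cost + inter_syncs * len(zones) * inter_cost

# Five hypothetical data centers clustered into two zones.
zones = [["A", "B"], ["C", "D", "E"]]

frequent = transfer_cost(zones, rounds=10, sync_every=1)  # 250.0
sparse = transfer_cost(zones, rounds=10, sync_every=5)    # 90.0
print(frequent, sparse)
```

Under these assumed prices, exchanging models between zones every fifth round instead of every round cuts the transfer cost from 250 to 90 units; whether accuracy holds up at the lower transfer rate is the performance side of the trade-off and would have to be measured.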
Slide 10
To show how well it works, we show here the results of two different configurations of Federated Learning applied to data in five separated data centers.
On the left, a clean and error-free dataset, which represents an easy machine learning problem, is considered.
These data were public and not subject to data privacy considerations, so that the comparison could be made.
The baseline accuracy is the average accuracy achieved by centralizing the dataset in the normal way for ideal training performance, in the absence of data privacy.
It is clear that the Federated AI performs at least as well as the ideal scenario.
In fact, with even a small amount of optimization, the accuracy of the Federated AI can begin to exceed that of the baseline.
On the right, we use a poor-quality dataset, which more closely matches what we expect to see in the real world. In the real world, data is never perfect and contains errors, and in particular medical data contains inherent errors that even medical professionals can’t detect.
Here, the performance increase is even more substantial, showing that our decentralized Federated AI algorithm can learn the context of the distributed data, remove bias or errors, and become more robust than traditionally trained AI.
Slide 11
Our paper in Nature Scientific Reports shows how our Federated AI algorithm can lead to improvements in accuracy of up to 11%, analyzing several different medical and non-medical datasets.
Using Federated Learning to train on real-world medical data, we saw an improvement in the AI performance compared to the centralized baseline.
This improvement, when training on poor-quality data, that is, data with inherent errors or misdiagnoses, shows how Federated Learning can learn to avoid bad data, and the traps that lead many AI projects to become biased.
Slide 12
Healthcare is not alone in facing the challenge of dealing with Big Data under data-sharing constraints.
Being able to understand the data at a global or population level, and more efficiently share the learnings, could be more valuable than the data itself.
Data compliance in finance, retail, cybersecurity, space and satellite technology and the defense sector plays a crucial role in the safe operation of society, but limits innovation and improvement of services.
Now, for the first time, there is a pathway that allows us to have both: the very best of robust, scalable AI, all while respecting our privacy.
Slide 13
I would like to thank you for joining this webinar.
This presentation and Nature Scientific Reports paper on the decentralized federated AI algorithm are available on our web site at Presagen.com.
Thank you.