Federated Learning: The Future of Scientific Collaboration

A huge gap between those who have the data and those who are analyzing it


Kevin Wang

3 years ago | 4 min read

Our current state of scientific collaboration is far from efficient.

In this age of data-driven technologies, data analysis is not easy.

Sure, it sounds easy: you get the data, write the code, and then perform your analysis. But there are many, many regulations concerning data privacy and security that bureaucratize this process into a never-ending nightmare.

There’s a huge gap between those who have the data and those who are analyzing it. (I’m sure that if you’ve ever done work with data analysis, especially in healthcare, you know what I’m talking about.)

Let’s take a look at a fairly standard situation in which two companies are trying to collaborate: one has the data, and the other has the tools to analyze it. For example, a pharma company might hold tons of data from failed clinical trials, while a biotech company has a novel ML algorithm.

Both companies want to work together, but to do so, they need trust. The biotech company needs to trust the pharma company not to misuse its algorithm, and the pharma company needs to trust the biotech company not to misuse that data or compromise its privacy.

But the standard process that most companies use today does not build trust into any step.

These companies send data across firewalls. There’s literally a big wall between two companies that are trying to work together, which is the opposite of trust. And because of this wall, neither company has any idea what the other is doing on the other side.

This lack of trust creates the heavy bureaucratic process that many companies have to go through, and it hampers scientific collaboration.

We can solve this problem with something called federated learning.

Federated learning is a distributed machine learning method that keeps the underlying data private while still producing useful results. In our example, federated learning can act as a bridge between the biotech company and the pharma company, so that the pharma company does not have to worry about the biotech company abusing its data, and vice versa.

In federated learning, the model is trained locally, where the data lives, and only the results of that training (model updates, not the raw records) are sent to a central server, which combines them into a global model. This way, the individual records never leave their owner and cannot be abused.
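To make the train-locally-then-aggregate loop concrete, here is a minimal sketch of federated averaging (FedAvg), one common way to combine locally trained models. It is an illustration under stated assumptions, not SAIL’s or Google’s implementation: the one-variable linear model, the toy data, and the function names `local_train` and `federated_averaging` are all hypothetical, and every "site" runs in one process here, whereas in practice each would run on its owner’s own hardware. Only the weights `w` and `b` ever leave a site.

```python
"""Minimal federated averaging (FedAvg) sketch: each site trains locally
and shares only model weights, never its raw records. Hypothetical example."""

import random

Data = list[tuple[float, float]]  # (x, y) pairs, kept private at each site


def local_train(w: float, b: float, data: Data,
                lr: float = 0.05, epochs: int = 20) -> tuple[float, float]:
    """Plain gradient descent on mean squared error for y ~ w*x + b,
    using only this site's data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(sites: list[Data], rounds: int = 30) -> tuple[float, float]:
    """Server loop: send global weights out, collect local updates, average them."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        # In a real deployment, each call would run on that site's own hardware.
        updates = [local_train(w, b, d) for d in sites]
        # Weight each site's update by its share of the records, as FedAvg does.
        w = sum(len(d) * uw for d, (uw, _) in zip(sites, updates)) / total
        b = sum(len(d) * ub for d, (_, ub) in zip(sites, updates)) / total
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)

    def make_site(n: int) -> Data:
        # Hypothetical private data following y = 3x + 1 plus small noise.
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        return [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

    sites = [make_site(50), make_site(80)]  # two organizations, different sizes
    w, b = federated_averaging(sites)
    print(f"learned w={w:.2f}, b={b:.2f}")  # expected to land near w=3, b=1
```

Each site’s update is weighted by how many records it holds, which is the FedAvg choice; an unweighted average would let a small site pull the global model as hard as a large one.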

Google defines it as:

“a specific category of distributed machine learning approaches which trains machine learning models using decentralized data residing on end devices.”

With federated learning, we can build a platform that enables efficient scientific collaboration. Data-owning organizations can upload their data to the platform and license it, and data-analyzing organizations can then run their algorithms on it without ever seeing the raw data.

The privacy of the data is kept intact, and results are still achieved. Both parties are able to make this transaction without worrying about the other abusing their trust.

Other applications of federated learning include autonomous vehicles, personalized AI assistants, and almost any product that tailors itself to the individual user.

Secure AI Labs (SAIL) is a company that has built the platform described above to empower scientific collaboration. SAIL’s vision is a world where scientific collaboration is efficient and easy, and where you don’t have to jump through a million hoops to get results.

For example, a company with Alzheimer’s data could upload it to the SAIL platform. A couple of days later, it is contacted by a biotech company that wants to analyze that data.

The first company can license its data to the biotech company, and through the SAIL platform, the biotech company can analyze that data without ever seeing the raw records. Something that might normally take up to a week could take 10 minutes!

This Thursday, I interviewed Anne Kim, CEO and co-founder of Secure AI Labs, for TKS Talks Boston.

The Knowledge Society (TKS) is a global accelerator for teenagers that aims to teach teens about exponential technologies and prepare them to impact billions of people. TKS Talks is a TKS initiative that educates teens all over the world about exponential technologies and the tech leaders behind them.

The interview was really fun, and I learned so much. Anne’s an incredibly authentic (and funny) person, and it was amazing to learn about how SAIL uses federated learning, open algorithms, and secure enclaves to empower scientific collaboration.

SAIL’s Encryption Technology

As I mentioned above, SAIL uses federated learning to create a platform where companies can upload their data for analysis while preserving privacy, making scientific collaboration more efficient. The entire platform is protected by something called a secure enclave.

Secure enclaves provide strong, though not absolute, guarantees for the security of the data. A secure enclave is a hardware-isolated region of a processor that protects the code and data running inside it from the rest of the system.

Because this isolation is enforced in hardware, other software on the machine, even the operating system, cannot simply read what happens inside the enclave. Secure enclaves provide what is called a Trusted Execution Environment (TEE): an isolated environment that protects the confidentiality and integrity of the code and data loaded into it.

This way, even someone who gained access to the machine and administrator rights would see only encrypted data; the data is decrypted only inside the TEE.

Future Applications

Federated learning and SAIL could unlock key scientific discoveries in the future. SAIL has tremendous potential applications in drug discovery, on-device diagnosis, cheminformatics, and genomics. If companies no longer have to worry about abuses of trust, the pace of scientific collaboration could increase tenfold.

In my interview, I asked Anne where she imagined SAIL would have the most impact. She said that, beyond the obvious answer of drug discovery, SAIL would have the greatest impact on rare diseases. Because so little data exists on rare diseases, it’s difficult to access it without running clinical trials yourself.

On a federated learning platform, this data can be made available to many more researchers, increasing the number of people working on these hard problems and dramatically reducing the time it takes to solve them.

Thanks so much for reading! I hope you learned something. If you ever want to connect or talk, reach me at

