Facebook and Twitter have left most other companies around the world far behind when it comes to using machine learning to improve their business model.

And while their practices haven’t always resulted in the best reactions from end-users, there’s much to be learned from these companies on what to do, and what not to do, when it comes to scaling and applying data analytics.

Get the Data You Need First

Facebook seemingly uses machine learning for everything: content detection and content integrity, sentiment analysis, speech recognition, and fraudulent account detection, as well as operating functions like facial recognition, language translation, and content search.

The Facebook algorithm manages all this while offloading some computation to edge devices in order to reduce latency.

This allows users with older mobile devices (more than half of the global market) to access the platform more easily. It is an excellent tactic for legacy systems with limited computing power, which can then use the cloud to handle the torrent of data.

Cloud-based systems can also be improved through the introduction of accessible metadata that will customize, correct, and contextualize real-world data.

Start by thinking about what data is really needed, and which of those datasets are most important. Then start small. Too often, teams get distracted in the rush to do it now and do it big.

But this mindset is easily mistaken for the real objective: do it right. Focus on modest efforts that work, then expand the application to cover more datasets or to adapt more quickly to changing parameters.

Focusing on early success and scaling upward helps teams avoid the early failures that come from taking on too much data too quickly. Even if a failure does happen, the momentum of smaller successes will propel the project forward.

Automate Training

Machine learning requires ongoing modification and training to remain fresh. Both Twitter and Facebook use Apache Airflow to automate training that keeps the platforms updated, sometimes on hourly cycles.

The amount and speed of retraining will rely heavily on computing costs and the availability of resources. However, ideal algorithm performance will rely on properly scheduled training for the dataset.
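
The trade-off between freshness and cost can be sketched as a simple scheduling rule. The Python below is a minimal illustration using only the standard library. It is not Airflow code and not a description of how either company schedules training. The names `RetrainPolicy` and `should_retrain`, the hourly interval, and the budget figures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetrainPolicy:
    # All values are illustrative assumptions, not figures from a real platform.
    interval: timedelta        # how often the model should be refreshed
    cost_per_run: float        # estimated compute cost of one training run
    remaining_budget: float    # compute budget left for the current period


def should_retrain(policy: RetrainPolicy, last_trained: datetime, now: datetime) -> bool:
    """Return True when the model is stale and another run is still affordable."""
    is_stale = now - last_trained >= policy.interval
    is_affordable = policy.cost_per_run <= policy.remaining_budget
    return is_stale and is_affordable


# Hypothetical hourly policy with enough budget for two more runs.
policy = RetrainPolicy(interval=timedelta(hours=1), cost_per_run=5.0, remaining_budget=12.0)
last = datetime(2024, 1, 1, 8, 0)

print(should_retrain(policy, last, datetime(2024, 1, 1, 8, 30)))  # checked before the hour is up
print(should_retrain(policy, last, datetime(2024, 1, 1, 9, 0)))   # checked once the hour has passed
```

In a workflow scheduler, a check along these lines would typically run before the training task, so that a retraining cycle is skipped when resources are short instead of failing halfway through.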

One of the biggest challenges may be choosing the type of learning to employ for the AI model. While deep learning methods have been the first choice for dealing with large datasets, it’s possible that classic tri-training may create a strong baseline that outperforms deep learning, at least for natural language processing.

While tri-training cannot be fully automated, it may produce higher quality results through the use of diverse modules and democratic co-learning.
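
For readers unfamiliar with the idea, one round of tri-training can be sketched as follows. This is a deliberately simplified illustration: it omits the error-rate checks of the full algorithm. The `NearestCentroid` class is a tiny stand-in classifier written for this example, and the toy data is hypothetical. Three models are trained on bootstrap samples of the labeled data, and whenever two of them agree on an unlabeled example, that example is added with the agreed label to the third model's training set.

```python
import random
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier: predicts the label whose class mean is closest."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            prev = sums.get(label, [0.0] * len(x))
            sums[label] = [s + v for s, v in zip(prev, x)]
            counts[label] += 1
        self.centroids = {l: [s / counts[l] for s in sums[l]] for l in sums}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda l: sq_dist(self.centroids[l]))


def tri_train_round(X_lab, y_lab, X_unlab, seed=0):
    """One simplified tri-training round (no error-rate checks)."""
    rng = random.Random(seed)
    n = len(X_lab)
    models = []
    for _ in range(3):
        # Each model starts from its own bootstrap sample of the labeled data.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NearestCentroid().fit([X_lab[i] for i in idx],
                                            [y_lab[i] for i in idx]))
    retrained = []
    for k in range(3):
        a, b = [m for j, m in enumerate(models) if j != k]
        X_new, y_new = list(X_lab), list(y_lab)
        for x in X_unlab:
            pa, pb = a.predict(x), b.predict(x)
            if pa == pb:  # the two peers agree, so pseudo-label x for model k
                X_new.append(x)
                y_new.append(pa)
        retrained.append(NearestCentroid().fit(X_new, y_new))
    return retrained


def vote(models, x):
    """Majority vote of the three models."""
    preds = [m.predict(x) for m in models]
    return max(set(preds), key=preds.count)


# Hypothetical toy data: two well-separated clusters.
X_lab = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_lab = ["a", "a", "a", "b", "b", "b"]
X_unlab = [[0.5, 0.5], [10.5, 10.5], [1, 1], [9, 9]]

models = tri_train_round(X_lab, y_lab, X_unlab)
print(vote(models, [0.2, 0.2]), vote(models, [10.2, 9.8]))
```

In practice the stand-in classifier would be replaced by real models, and the round would be repeated until the pseudo-labels stop changing.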

Pick the Right Platform

One of the challenges both Twitter and Facebook now face is trying to standardize their initially unstructured approach to building frameworks, pipelines, and platforms. Facebook now relies heavily on PyTorch, and Twitter uses a mix of platforms, moving from Lua Torch to TensorFlow.

Look for a platform that will be scalable and think of the long-term needs of the company in order to successfully choose the right AI tool.

Don’t Forget the End-User

A search for ‘machine learning’ and ‘Facebook’ together inevitably brings up hundreds of blog posts and articles on the negative feelings some users have about the AI features built into the site. Loss of privacy, data mining, and targeted advertising are some of the less worrying accusations thrown at the company.

And yet many of the same users appreciate other AI tools that allow them to connect to friends and family in other countries who do not speak their language, and tools that keep the platform free from pornography and hate speech (if somewhat imperfectly).

It was not the technology itself but the lack of transparency in how Facebook implemented machine learning on its platform that frustrated users and mobilized some against it.

Don’t make the same mistake. Trust and transparency should be keywords for all major decisions. End-users will appreciate it, and they will leave a well-designed site with the sense they have gained something from the interaction instead of feeling personally violated by it.
