# So how do we do that!?

Ido Zehori

3 years ago | 4 min read

Conditional Probability
A good ML model needs to not only be able to identify the obscure qualities of high LTV users but also output a “user score” that is precisely proportional to that quality. When predicting whether or not a user is going to have a high LTV, the model should calculate the following at the time of an auction:

• The probability that the user will click on the ad
• The probability that the user will install, given a click
• The probability that the user will not churn immediately or after one use of the app
• The probability that the user will make a purchase or see many ads
• The probability that the user will make repeated purchases

Each of these is a conditional probability: the chance of reaching one stage given that the user reached the stage before it. As we advance down the funnel, the combined probability of getting through every stage becomes very, very low. For example, if you were to guess at random whether a single user from the entire population of mobile users is going to have a high LTV, you would be correct roughly once every 80,000 times. My team’s machine learning models are able to bring this up to about 1 in 150.
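To make the funnel arithmetic concrete, here is a minimal sketch of how per-stage conditional probabilities multiply into one joint probability. The stage names and probability values are made up for illustration and are not Bigabid's figures; only the 1-in-80,000 and 1-in-150 rates come from the text above.

```python
from math import prod

# Hypothetical conditional probabilities for one user, one per funnel stage.
# These values are illustrative assumptions, not measured rates.
funnel = {
    "click": 0.02,
    "install | click": 0.10,
    "retained | install": 0.40,
    "purchase | retained": 0.05,
    "repeat purchase | purchase": 0.30,
}


def joint_probability(stages):
    """Chain rule: the probability of passing every stage is the product
    of the conditional probabilities of each stage."""
    return prod(stages.values())


p_high_ltv = joint_probability(funnel)
print(f"P(high LTV) = {p_high_ltv:.6f}, about 1 in {1 / p_high_ltv:,.0f}")

# Rates quoted in the article: random guessing vs. the models.
random_rate = 1 / 80_000
model_rate = 1 / 150
lift = model_rate / random_rate
print(f"Lift over random guessing: about {lift:.0f}x")
```

With these assumed values the joint probability lands near the 1-in-80,000 base rate, which shows how quickly a handful of modest per-stage probabilities shrink. Going from 1 in 80,000 to 1 in 150 is a lift of roughly 533 times.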

Hide and Go Seek
Digital Advertising is the ultimate Machine Learning playground. It combines data-rich activities, scaling challenges, immediate feedback on actions taken by the algorithm, high quality data, and an abundance of signals that can predict different outcomes.

While Digital Advertising is the ultimate playground for ML, targeting high LTV users is the ultimate game of playground hide and go seek. To play, your ML needs to have these three primary characteristics:

• Clean and QA’d data and enough of it
• A good, strong set of features that accurately explains the target variable
• A model training setting that aims at answering specific questions

So how do we do that!?
First, like an elite athlete that needs reps, we train our ML models on massive amounts of data. The more data, the more accurate we can be. From basic data, like app opens and installs, to session depth and purchase value, it all prepares our ML for a wider range of scenarios. Then, with Real-Time Data Analysis, we keep all of our data fresh for the moment it’s needed.

Having robust, highly trained ML technology is key. My team can say something meaningful about more than 1.5 billion devices in the US alone at any moment.
Every auction that comes into the system goes through our committee of models, each looking at the problem from a slightly different angle. Some examples of the questions our various models are trying to answer are:

• What is the probability of a click?
• What is the probability of an install given a click?
• What is the expected LTV of this user?
• Is the user in a setting that maximizes the probability of engagement with the ad?
• What is the best creative to show the user?
• How much should we bid to show the user an ad?

Other models in the committee spot issues, such as:

• What is the probability that the auction is fraudulent?
• What is the probability that there will be a misattribution?
• Are there broken creatives?
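One way to picture how answers to these questions could feed a single bidding decision is an expected-value calculation. The sketch below is an assumption for illustration, not Bigabid's actual bidding logic: the `AuctionPredictions` fields, the `margin` and `fraud_cutoff` parameters, and the formula itself are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuctionPredictions:
    """Hypothetical outputs of a model committee for one auction."""
    p_click: float                # P(click)
    p_install_given_click: float  # P(install | click)
    expected_ltv: float           # expected revenue per installing user, in USD
    p_fraud: float                # P(auction is fraudulent)


def expected_value_per_impression(pred: AuctionPredictions) -> float:
    """Expected revenue from showing one ad, discounted by fraud risk."""
    p_install = pred.p_click * pred.p_install_given_click
    return (1.0 - pred.p_fraud) * p_install * pred.expected_ltv


def bid_price(pred: AuctionPredictions, margin: float = 0.3,
              fraud_cutoff: float = 0.5) -> float:
    """Bid a share of the expected value; skip auctions that look fraudulent.

    `margin` and `fraud_cutoff` are assumed tuning knobs for this sketch.
    """
    if pred.p_fraud >= fraud_cutoff:
        return 0.0
    return expected_value_per_impression(pred) * (1.0 - margin)


pred = AuctionPredictions(p_click=0.02, p_install_given_click=0.10,
                          expected_ltv=5.0, p_fraud=0.05)
print(f"Bid per impression: ${bid_price(pred):.5f}")
```

Keeping each prediction as a separate field mirrors the committee idea: any one model can be retrained or replaced without touching the rest of the calculation.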

Featuring Features
Our ML system produces a set that ranges from hundreds to thousands of features for each impression we show to help answer the questions outlined above, as well as many others. Each feature revolves around:

• The affinity of the users
• User trends over time
• A time series analysis of the market (baseline CTR in the exchange)
• Baseline prevalence of high spenders in the stream
• The environments the users are in. For example, in which Deep Categories we see them.
• Other factors we don’t want our competition to read about here.
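As a rough illustration of one item on this list, the sketch below computes a baseline CTR per exchange from a small impression log and compares a user's CTR against it. The event format, the exchange names, and the `relative_ctr` feature are assumptions made for the example, not a description of the production feature set.

```python
from collections import Counter


def baseline_ctr(events):
    """Compute click-through rate per exchange.

    `events` is an iterable of (exchange, clicked) pairs, a hypothetical
    log format used only for this sketch.
    """
    impressions = Counter()
    clicks = Counter()
    for exchange, clicked in events:
        impressions[exchange] += 1
        clicks[exchange] += int(clicked)
    return {ex: clicks[ex] / n for ex, n in impressions.items()}


def relative_ctr(user_ctr, exchange_ctr):
    """Ratio of a user's CTR to the exchange baseline.

    Returns 0.0 when the baseline is zero to avoid dividing by zero.
    """
    return user_ctr / exchange_ctr if exchange_ctr > 0 else 0.0


events = [
    ("exchange_a", True), ("exchange_a", False),
    ("exchange_a", False), ("exchange_a", False),
    ("exchange_b", False), ("exchange_b", True),
]
baselines = baseline_ctr(events)
print(baselines)
print(relative_ctr(0.5, baselines["exchange_a"]))
```

A ratio like this makes it possible to compare a user's behavior across exchanges with very different baseline click rates.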

A harmonious committee of our ML models eventually produces a quality score for the user in the specific state that the user is in. Finally, we tailor the right ad with the right bid price to deliver high LTV users to our client’s app(s). There’s nothing easy about it, but our mission is to continually evolve our ML technology to make it sound easy.

A Brief History of Real-Time
Truly, of all the advancements in modern advertising, few are more exciting than machine learning as applied to targeting high LTV users. In the past, ML wasn’t predominant in digital advertising because the field was still immature and the tools were lacking.

But as more and more DSPs are adopting programmatic systems, it’s becoming more and more difficult for “old school” players to compete. The complexity and sheer volume of data to go over is simply too vast. Important subtleties and insights would inevitably get lost in the process.

As the ML industry and the development of new big data tools have leaped forward, more and more of those tools have found their way into our own models. Bigabid is pragmatic to the core: we invent or adopt depending on our demands.

For example, advancements in NLP (natural language processing) in recent years have helped us build models that scan massive amounts of text and extract meaningful insights to make real-time decisions.

In addition, recent advancements in computer vision have allowed us to create extremely accurate models for creative personalization to maximize CTR, and therefore ROAS.

Positive Feedback Loop Between Your ROAS and Our Technologies
As you can see, real-time data analysis adds an exciting performance-enhancing element to user acquisition. Targeting in real-time based on session features and Deep Categories greatly improves our pricing, and therefore, your ROAS.

Our devotion to the constant evolution and refinement of our technologies pushes us to create faster and smarter ML. Fortunately, that devotion returns good business, which, in turn, drives our technology to push the envelope even further. It’s a positive feedback loop that we enjoy being part of.

Created by

Ido Zehori

Data Science Team Leader @ Bigabid. Creating real business impact with Data Science and Machine Learning.
