
A/B/C Tests: How to Analyze Results From Multi-Group Experiments


Claudia

2 years ago | 6 min read

Explanation of Analysis of Variance (ANOVA)

Experimentation is widely used at tech startups to make decisions on whether to roll out new product features, UI design changes, marketing campaigns and more, usually with the goal of improving conversion rate, revenue and/or sales volumes.

Oftentimes, we want to test the effect of one change (treatment group) against the status quo (control group), but what if we are considering several options and want to conduct an experiment with more than 2 groups?

In this article, I will walk through the intuition behind one-way ANOVA and how to use it to analyze your results from an experiment with multiple groups.

Why You Should Not Do Multiple t-tests

You may wonder whether you can use t-tests to compare pairs of groups in your experiment. The short answer is no, you should not do this! Not only will it get tedious, but if you use t-tests to compare many pairs, your chances of committing a Type I error (false positive) increase.

To illustrate this point, consider an experiment with 5 distinct groups. To compare every pair of groups, we would need to run 5 choose 2 = 10 t-tests! If we perform these 10 t-tests, the probability of a Type I error is no longer 5%, even if we set the level of significance to 0.05 for each individual t-test.

Instead, assuming the tests are independent, the probability of committing at least one Type I error increases to 1 - (1 - 0.05)^10 ≈ 0.40, since the probability of not obtaining any significant results across all ten tests is 0.95^10 ≈ 0.60. This means that if we did 10 t-tests to compare means between our 5 groups in the experiment, we would incorrectly reject at least one null hypothesis about 40% of the time, instead of 5% of the time!
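As a quick sanity check, here is the arithmetic as a short Python snippet (the 0.05 significance level and 10 comparisons are just the numbers from the example above):

```python
# Familywise Type I error rate for m independent tests at significance level alpha
alpha = 0.05
m = 10  # 5 choose 2 = 10 pairwise comparisons

p_no_false_positive = (1 - alpha) ** m    # ~0.60
p_at_least_one = 1 - p_no_false_positive  # ~0.40

print(f"P(no false positives)           = {p_no_false_positive:.2f}")
print(f"P(at least one false positive)  = {p_at_least_one:.2f}")
```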

About ANOVA

ANOVA stands for Analysis of Variance and is a test for statistical significance of differences among the means of two or more groups.

In essence, it partitions the total variance in the data into different categories, allowing us to compare the amount of variability between the means of the groups with the amount of variability between individual observations within each group.

Assumptions

Before conducting an ANOVA test, we need to make sure that the following three assumptions are met:

1. The populations that our samples are drawn from are normally distributed. One way to check this is to plot a histogram of the observations in each group to see whether the distribution appears normal.

2. The variances of the populations that the samples are drawn from are equal. We can check this with a boxplot to see whether the spread of the observations appears similar across groups.

3. The observations in each group are independent of each other, and the observations within each group are sampled at random. This comes down to design: the experiment must be set up to ensure independence between groups and random sampling within groups.
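To check the first two assumptions programmatically, here is a minimal sketch using SciPy's Shapiro-Wilk test for normality and Levene's test for equal variances; the group data is made up purely for illustration, and in practice you would also eyeball the histogram and boxplot:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-group observations (e.g., one array per experiment arm)
groups = [rng.normal(loc=10.0, scale=2.0, size=50) for _ in range(3)]

# Assumption 1: normality within each group (Shapiro-Wilk)
for i, g in enumerate(groups, start=1):
    _, p = stats.shapiro(g)
    print(f"Group {i}: Shapiro-Wilk p-value = {p:.3f}")  # p > 0.05: no evidence against normality

# Assumption 2: equal variances across groups (Levene's test)
_, p = stats.levene(*groups)
print(f"Levene's test p-value = {p:.3f}")  # p > 0.05: no evidence against equal variances
```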

Hypothesis Test

In this section, I will go through the steps needed to calculate the test statistic and perform our hypothesis test. In a one-way ANOVA, the null hypothesis is that the means of all groups are equal, whereas the alternative hypothesis is that at least one of the means is different.

There are many calculators online and statistical packages in R and Python that you can use to perform the calculations, but the details in the steps below are intended to help you understand the methodology and concepts. Don’t worry if the formulas seem scary; it is more important to understand the intuition at each step!
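For instance, if you just want the end result in Python, SciPy's scipy.stats.f_oneway runs the whole test in one call (the data below is made up for illustration):

```python
from scipy import stats

# Hypothetical metric values for three experiment groups
group_a = [12.1, 11.4, 13.0, 12.7, 11.9]
group_b = [11.2, 10.8, 11.9, 11.1, 10.5]
group_c = [13.4, 12.9, 14.1, 13.7, 13.2]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p: at least one mean differs
```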

1. Calculate Sum of Squares

In a one-way ANOVA, we can think of the total sum of squares (SS_T) as the sum of the sum of squares within groups (SS_w) and the sum of squares between groups (SS_B):

SS_T = SS_w + SS_B

The sum of squares within groups (SS_w) represents the amount of variability within groups and is not affected by overall differences between groups. It is the sum of squared deviations of all observations within a group from their group mean, summed across all groups:

SS_w = Σⱼ Σᵢ (xᵢⱼ - x̄ⱼ)²

where xᵢⱼ is the i-th observation in group j and x̄ⱼ is the mean of group j.

The sum of squares between groups (SS_B) captures the differences between groups and is not affected by the amount of variability within groups. It is the sum of squared deviations of each group mean from the overall grand mean, weighted by the number of observations in each group:

SS_B = Σⱼ Nⱼ (x̄ⱼ - x̄_G)²

where Nⱼ is the size of group j and x̄_G is the grand mean of all observations.

2. Calculate Degrees of Freedom

Next, we need to compute the degrees of freedom. Similar to the sum of squares, the total degrees of freedom (df_T) is the sum of the degrees of freedom within groups (df_w) and the degrees of freedom between groups (df_B):

df_T = df_w + df_B

The degrees of freedom within groups is df_w = N - k, where N is the total number of observations and k is the number of groups, while the degrees of freedom between groups is df_B = k - 1.

3. Calculate Mean Squares

In order to be able to compare the sums of squares, we divide each by its associated degrees of freedom (df) to obtain the mean squares:

MS_w = SS_w / df_w and MS_B = SS_B / df_B

4. Calculate F Statistic

The test statistic (F) is the ratio of the mean square between groups (MS_B) to the mean square within groups (MS_w):

F = MS_B / MS_w

5. Find Critical Value of F

Determine the critical value of F from an F-distribution table for the corresponding values of alpha, the degrees of freedom within groups (N - k), and the degrees of freedom between groups (k - 1).

6. Conclusion of ANOVA Test

If the value of F obtained from your experiment (F_obt) is larger than the critical value of F (F_crit), we can reject the null hypothesis that there is no difference among the means. This means at least one of the means is statistically significantly different from one or more of the other means.
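To tie the six steps together, here is a minimal NumPy/SciPy sketch that computes each quantity by hand and compares F_obt against F_crit; the group data and alpha = 0.05 are assumptions for illustration:

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical data: one array of observations per group
groups = [
    np.array([12.1, 11.4, 13.0, 12.7, 11.9]),
    np.array([11.2, 10.8, 11.9, 11.1, 10.5]),
    np.array([13.4, 12.9, 14.1, 13.7, 13.2]),
]
all_obs = np.concatenate(groups)
N, k = len(all_obs), len(groups)
grand_mean = all_obs.mean()

# Step 1: sums of squares
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_t = ss_w + ss_b

# Step 2: degrees of freedom
df_w, df_b = N - k, k - 1

# Step 3: mean squares
ms_w, ms_b = ss_w / df_w, ss_b / df_b

# Step 4: F statistic
f_obt = ms_b / ms_w

# Step 5: critical value of F at alpha = 0.05
alpha = 0.05
f_crit = f_dist.ppf(1 - alpha, dfn=df_b, dfd=df_w)

# Step 6: conclusion
print(f"F_obt = {f_obt:.2f}, F_crit = {f_crit:.2f}")
print("Reject H0" if f_obt > f_crit else "Fail to reject H0")
```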

Comparing Pairs of Means: Scheffé Method

Being able to reject the null hypothesis of the ANOVA test tells us that at least one of the means is significantly different, but how do we know which of the means are different?

Recall from the example above that conducting multiple t-tests would inflate the Type I error rate.

However, if we are able to reject the null hypothesis of no difference among means using ANOVA, we can use something called the Scheffé Method of Post Hoc Analysis to compare any pair of means without fear of increasing the Type I error rate.

We can calculate a statistic known as C using the Scheffé method. The formula to calculate the obtained C from our experiment is:

C_obt = (x̄₁ - x̄₂) / √(MS_w × (1/N₁ + 1/N₂))

where x̄₁ and x̄₂ are the means of the two groups being compared, N₁ and N₂ are the number of observations in these two groups, and MS_w is the within-group mean square from your ANOVA.

We then compare the obtained value of C from our observations to the critical value of C, which can be found from the following formula:

C_crit = √((K - 1) × F_crit)

where K is the number of groups and F_crit is the critical value of F that you can look up from an F-distribution table for the corresponding degrees of freedom and significance level.

If the obtained value of C is greater than the critical value of C, you can reject the null hypothesis that the pair of means is equal.
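Here is a small helper that implements the two Scheffé formulas as written above; the function name is hypothetical, and the usage example reuses groups, ms_w, k, and f_crit from the by-hand ANOVA sketch in the previous section:

```python
import numpy as np

def scheffe_compare(x1, x2, ms_w, k, f_crit):
    """Scheffé post hoc comparison of one pair of group means.

    Assumes the C_obt and C_crit formulas given above; returns both
    values and whether the pair of means differs significantly.
    """
    x1, x2 = np.asarray(x1), np.asarray(x2)
    c_obt = abs(x1.mean() - x2.mean()) / np.sqrt(ms_w * (1 / len(x1) + 1 / len(x2)))
    c_crit = np.sqrt((k - 1) * f_crit)
    return c_obt, c_crit, c_obt > c_crit

# Example: compare groups 1 and 3 from the earlier sketch
c_obt, c_crit, significant = scheffe_compare(groups[0], groups[2], ms_w, k, f_crit)
print(f"C_obt = {c_obt:.2f}, C_crit = {c_crit:.2f}, significant = {significant}")
```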

Strength of Association: Omega Squared

The test statistic from ANOVA can tell us whether the differences in the group means are statistically significant, but nothing about the effect size. To determine the strength of the treatment effects, we can use the formula below to find omega squared (ω²):

ω² = (SS_B - (K - 1) × MS_w) / (SS_T + MS_w)

where SS_B is the sum of squares between groups, K is the number of groups, MS_w is the mean square within groups, and SS_T is the total sum of squares.

Omega squared (ω²) tells us how much of the total variability can be accounted for by the treatment effects. It can take values between -1 and +1, with 0 indicating no treatment effect at all.
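Continuing the by-hand sketch from the hypothesis-test section (which defined ss_b, ms_w, ss_t, and k), omega squared is a one-liner:

```python
# Strength of association: share of total variability explained by the treatment
omega_sq = (ss_b - (k - 1) * ms_w) / (ss_t + ms_w)
print(f"omega squared = {omega_sq:.3f}")
```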

Conclusion

I hope this article helped you to better understand what ANOVA is and how to use it to interpret results from multi-group experiments. Below are some key takeaways:

  • ANOVA is a test for statistical significance of differences among the means of two or more groups.
  • If you are able to reject the null hypothesis of no difference among means using ANOVA, you can use the Scheffé Method of Post Hoc Analysis to compare any pair of means without fear of increasing Type I error.
  • To determine the strength of association for the treatment effects, you can calculate omega squared (ω²), which tells you how much of the total variability is accounted for by the treatment effects.

Experiments take time and resources to design, set up and deploy to production. Hopefully this helps you expand your testing toolkit to multi-group experiments!

Originally published here!
