
MAB vs. A/B Testing: Choosing the Right Algorithm for Growth

TL;DR: In product optimization, multi-armed bandit (MAB) algorithms outperform sequential A/B testing (SAB) by requiring significantly less traffic to identify winning variants and by delivering a higher overall conversion-rate (CVR) uplift. That is why Levered uses specialized contextual bandit algorithms to optimize apps and websites efficiently. In this article, we explore why MAB algorithms outperform traditional A/B testing through a simulation of a typical website optimization scenario.

Growth optimization vs. product testing

Incremental product growth optimization is inherently different from traditional product work. That's also why companies such as Meta have dedicated growth orgs that are separate from the core product organization.

In “growth optimization”, testing velocity is key. It differs from classic feature testing in several ways: smaller changes (adjustments are minor and inexpensive, such as changing text or design elements), more subtle effects (most changes are low-risk with relatively small effect sizes, yet the cumulative effect can be large), and lower success rates (few changes are successful, most don't bring a statistically significant conversion-rate improvement).

As a result, success in growth optimization hinges on a team's ability to identify and accumulate many small wins, whereas core product work focuses more on de-risking bigger bets.

Limitations of classic A/B testing

In the context of growth optimization, A/B testing often falls short: each test demands a large sample size per variant, and the fixed-horizon statistical approach is rigid about when results can be read and acted upon.

Multi-Armed Bandits

Multi-armed bandits are not new. They have been used successfully in areas such as search or ads optimization. However, they are much less commonly used in product optimization compared to A/B testing.

MAB algorithms balance two competing goals: exploration and exploitation. Exploration aims to test as many ideas as possible, while exploitation aims to maximize the overall conversion rate by showing winning ideas to as many users as possible.

An effective technique to balance exploitation and exploration is called “Thompson Sampling.” Simply put, it works like this: start with a guess (assume all options are equally good), test one variation (pick one option randomly, show it to a user, and see how they respond), then update your guess (adjust your belief based on the observation).
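The loop described above can be sketched in a few lines of Python. This is a minimal Beta-Bernoulli illustration of Thompson Sampling, not the algorithm Levered runs in production:

```python
import random

class ThompsonSampler:
    """Bernoulli Thompson sampling with a Beta(1, 1) prior per variant."""

    def __init__(self, n_variants):
        self.alpha = [1.0] * n_variants  # 1 + observed conversions
        self.beta = [1.0] * n_variants   # 1 + observed non-conversions

    def choose(self):
        # Sample a plausible CVR per variant from its posterior and
        # show the variant with the highest sampled value.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, variant, converted):
        # Sharpen the chosen variant's posterior with the new observation.
        if converted:
            self.alpha[variant] += 1.0
        else:
            self.beta[variant] += 1.0
```

Early on, wide posteriors make every variant likely to be sampled (exploration); as evidence accumulates, the posterior of the best variant dominates and it receives most of the traffic (exploitation).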

MAB algorithms are particularly data-efficient when paired with “hierarchical Bayesian” models. These models recognize that a product design is a function of multiple variables (“factors”) that may vary in importance. The hierarchical approach offers a significant advantage over A/B testing, since it helps avoid wasting traffic on finding the best levels of unimportant factors.
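The full hierarchical model is beyond the scope of this post, but a toy factorized variant conveys the intuition. Below, each (factor, level) pair gets its own Beta posterior instead of each full variant, so a single observation updates one posterior per factor. This sharing is our simplified stand-in for the hierarchical approach, not Levered's actual model:

```python
import random

class FactorizedThompsonSampler:
    """Toy factorized bandit: one Beta posterior per (factor, level) pair.

    A simplification of a hierarchical Bayesian model: credit for each
    conversion is shared across the chosen level of every factor, so the
    sampler needs far fewer observations than the full variant grid.
    """

    def __init__(self, levels_per_factor):
        # posteriors[f][l] = [alpha, beta] for level l of factor f
        self.posteriors = [
            [[1.0, 1.0] for _ in range(n)] for n in levels_per_factor
        ]

    def choose(self):
        # Independently pick the best sampled level for each factor.
        variant = []
        for factor in self.posteriors:
            samples = [random.betavariate(a, b) for a, b in factor]
            variant.append(max(range(len(samples)), key=samples.__getitem__))
        return tuple(variant)

    def update(self, variant, converted):
        # Each factor's chosen level shares credit for the outcome.
        for factor, level in zip(self.posteriors, variant):
            factor[level][0 if converted else 1] += 1.0
```

With three factors of four levels, this tracks 12 posteriors instead of 64, and every observation about a level informs all 16 variants that contain it.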

Benchmarking algorithms

At Levered, we use custom hierarchical MAB algorithms for automated product growth optimization. The best way to directly compare the two approaches is by running a simulation. We define a typical product optimization scenario and observe how effective each algorithm is in finding “winners” and improving the overall conversion rate over time.

Figure 1: Algorithm comparison, sequential A/B testing vs. multi-armed bandit

Defining the scenario

We are optimizing a user experience across three variables ("factors"), e.g., the headline, hero image, and CTA copy of a landing page. For each factor, we want to explore four different levels, which makes 4 × 4 × 4 = 64 possible variants. The three factors vary in importance, and each variant has a "true" conversion rate between 2% and 4% (both unknown ex ante).
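One way to generate such a scenario in code looks like this; the importance weights, seed, and rescaling below are our own illustrative assumptions, not the exact parameters of the simulation:

```python
import itertools
import random

def make_scenario(seed=42, n_factors=3, n_levels=4, low=0.02, high=0.04):
    """Build all variants with "true" CVRs rescaled into [low, high].

    Factor importance halves with each index, so factor 0 moves the
    conversion rate the most and the last factor the least.
    """
    rng = random.Random(seed)
    importance = [1.0 / 2 ** f for f in range(n_factors)]  # assumed weights
    effects = [[importance[f] * rng.random() for _ in range(n_levels)]
               for f in range(n_factors)]
    raw = {variant: sum(effects[f][lvl] for f, lvl in enumerate(variant))
           for variant in itertools.product(range(n_levels), repeat=n_factors)}
    lo, hi = min(raw.values()), max(raw.values())
    return {v: low + (high - low) * (x - lo) / (hi - lo)
            for v, x in raw.items()}

true_cvrs = make_scenario()  # maps (headline, hero, cta) level indices to a CVR
```

Each of the 64 keys is a tuple of level indices, and the unequal importance weights ensure that some factors matter much more than others, just as in the scenario above.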

Figure 2: Optimization scenario with three factors and four levels each

Quantifying performance

We focus on three characteristics:

Figure 3: Conversion uplift of MAB (orange) and A/B testing (blue)

Results

We observe that on average, MAB:

Discussion

When it comes to testing expensive changes, such as new feature launches, A/B testing is still a valid approach. The larger the company and the more traffic the product gets, the more likely it is that A/B testing will work out fine.

However, when the space of options is large, traffic is sparse, and the potential cost of experimentation is moderate, MAB is the superior alternative. This applies particularly to conversion-rate optimization (CRO) in small and medium-sized enterprises.