Assume you have a website or a mobile app for your product or service. You have in mind a change to the user journey that you hope will improve an important metric such as conversion rate or click-through rate. How do you know whether the change is worth incorporating?
Performing an A/B test of the change allows us to determine whether incorporating it is worthwhile. Here, we show a simple illustration of this process, where we:
In reality, the bulk of the complexity comes down to the business aspects of A/B testing, which require coordination between the development, app design, & marketing teams. This is because much of the technical detail is taken care of by testing suites, which run the A/A and A/B tests, and also monitor any spillover effects on guard-rail metrics that we may have defined earlier.
The business discussions might involve pinning down the following:
This A/B test attempts to determine whether moving the “Reserve Slot” button to a different, more strategic location on the screen could bump up the conversion rate.
We start by fixing all of the parameters prior to the start of the experiment, either by relying on our experience or through discussions with the necessary stakeholders. In this case, we expect the conversion rate to rise from 17% to about 38%, an uptick of roughly 21 percentage points (the effect size). Additionally, alpha & power are set to their conventional levels (0.05 & 0.80 respectively), & we settle on an equal split between the control & test groups.
The resultant calculation indicates that we need a minimum of 131 samples in order to conduct the test.
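Power calculations like this are usually handled by the testing suite, but as a minimal sketch, the minimum sample size could be estimated with statsmodels along these lines (the exact figure depends on the assumed rates, alpha & power, so it won't necessarily match the number quoted above):

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Assumed baseline and target conversion rates from the discussion above
p_control, p_test = 0.17, 0.38

# Cohen's h for the difference between two proportions
effect_size = proportion_effectsize(p_test, p_control)

# Number of observations per group at alpha = 0.05, power = 0.80,
# with an equal (1:1) split between control and test
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="two-sided",
)
print(f"Minimum samples per group: {n_per_group:.0f}")
```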
We run the test by randomly assigning users of the app to either the control or test group, with the button moved to the new location for the test group only. The test runs for a period of 1 month, covering the whole of January.
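In practice the testing suite handles the randomization, but as an illustrative sketch (the function name & experiment key below are hypothetical), a deterministic hash-based assignment could look like this:

```python
import hashlib

def assign_group(user_id: str, experiment: str = "reserve_slot_position") -> str:
    """Deterministically assign a user to 'control' or 'test' (50/50 split).

    Hashing the user id together with an experiment key keeps the assignment
    stable across sessions and independent of other experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < 50 else "control"

# The same user always lands in the same group
print(assign_group("user_42"))
```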
Upon retrieving the data, we see that we’ve captured data for 588 users, well in excess of the minimum 131 samples required.
Perhaps we should revisit our assumptions about the duration of the experiment, as we could have stopped it sooner. Or perhaps the surplus is due to a sudden influx of new sign-ups. Either way, we should remember to investigate this after the analysis, as it'll have a bearing on follow-up experiments.
To account for changes in user behavior across the days of the week, we plot the conversion rates for the two groups by day of the week.
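A sketch of how this plot might be produced with pandas & matplotlib, assuming the experiment data has been exported with hypothetical columns `group` ("control"/"test"), `converted` (0/1), & `date`:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the experiment data, one row per user
df = pd.read_csv("experiment_results.csv")

df["day_of_week"] = pd.to_datetime(df["date"]).dt.day_name()
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Mean of the 0/1 conversion flag = conversion rate, per day and group
daily_rates = (
    df.groupby(["day_of_week", "group"])["converted"]
      .mean()
      .unstack("group")
      .reindex(day_order)
)

daily_rates.plot(kind="bar")
plt.ylabel("Conversion rate")
plt.title("Conversion rate by day of week: control vs test")
plt.tight_layout()
plt.show()

# Overall conversion rate per group, for reference
print(df.groupby("group")["converted"].mean())
```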
Encouragingly, we see a clear difference in conversion rates between the two groups, with the test group having a consistently higher conversion rate than the control group.
The conversion rate of the test group averages about 38%, as opposed to the control group's average of 17%. This is a good sign.
We can rely on the results of this experiment & conclude that moving the button is a worthwhile change, as a logistic regression on the experiment data shows that the effect of the group assignment is statistically significant.
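A minimal sketch of that check with statsmodels, reusing the hypothetical dataframe from above:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("experiment_results.csv")  # hypothetical export, as above

# Regress the 0/1 conversion flag on group membership; the coefficient on
# the test group captures the lift from moving the button
model = smf.logit(
    "converted ~ C(group, Treatment(reference='control'))", data=df
).fit()
print(model.summary())

# A small p-value on the test-group coefficient indicates the observed
# difference is unlikely to have arisen by chance alone
```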