The following is an example of A/B testing using R. This is for training purposes only and the example dataset is taken from https://github.com/etomaa/A-B-Testing/blob/master/data/Website%20Results.csv.
Split testing is another name of A/B testing . It’s used online when one wants to test a new feature or a product. This method outlines a repeatable example to infer whether the new feature is worth using or not.
Variant A is from the control group which tells the existing features or products on a website. Variant B is from the experimental group to check the new version of a feature or product to see if users like it or if it increases the conversions(bookings). Converted is based on the data set given, there are two categories defined by logical value. It’s going to show true when the customer completes bookings and it’s going to show false when the customer visits the sites but not makes a booking. Therefore we are looking for an increase or decrease of conversions with Variant B, to see how the new feature performs.
Both version A and B have the same probability of driving customer booking or conversion. In other words, there is no effect or no difference between version A and B.
Both version A and B have a different probability of driving customer booking or conversion. There is a difference between version A and B. Version B is better than A in driving customer bookings. PExp_B != Pcont_A
library(readr)
library(tidyverse)
library(pwr)
ABTest <- read.csv("https://raw.githubusercontent.com/etomaa/A-B-Testing/master/data/Website%20Results.csv")
First we find the current conversion rates of A and B respectively.
# Let's filter out conversions for variant_A
conversion_subset_A <- ABTest %>%
filter(variant == "A" & converted == "TRUE")
# Total Number of Conversions for variant_A
conversions_A <- nrow(conversion_subset_A)
# Number of Visitors for variant_A
visitors_A <- nrow(ABTest %>%
filter(variant == "A"))
# Conversion_rate_A
conv_rate_A <- (conversions_A/visitors_A)
print(conv_rate_A)
## [1] 0.02773925
# Let's take a subset of conversions for variant_B
conversion_subset_B <- ABTest %>%
filter(variant == "B" & converted == "TRUE")
# Number of Conversions for variant_B
conversions_B <- nrow(conversion_subset_B)
# Number of Visitors for variant_B
visitors_B <- nrow(ABTest %>%
filter(variant == "B"))
# Conversion_rate_B
conv_rate_B <- (conversions_B/visitors_B)
print(conv_rate_B)
## [1] 0.05068493
Therefore, the basic conversion rate for set A is 0.028, and for B it is 0.051. This implies an increase in bookings with the new feature but we can go further.
# Testing the success for both sets
control_results <- prop.test(x = conversions_A,
n = visitors_A,
alternative = 'two.sided', conf.level = 0.975)
control_lower_ci <- control_results$conf.int[1]
control_upper_ci <- control_results$conf.int[2]
alternate_results <- prop.test(x = conversions_B,
n = visitors_B,
alternative = 'two.sided', conf.level = 0.975)
alternate_lower_ci <- alternate_results$conf.int[1]
alternate_upper_ci <- alternate_results$conf.int[2]
conv_rate <- c(conv_rate_A,
conv_rate_B)
# And another one for the lower CI value.
# I'm going to round these to 3 decimal places.
lower_ci <- c(round(control_lower_ci, 3),
round(alternate_lower_ci, 3))
# And one for the upper CI value.
# I'm going to round these to 3 decimal places.
upper_ci <- c(round(control_upper_ci, 3),
round(alternate_upper_ci, 3))
# Then add a name for each.
version <- c('A. Control',
'B. Alternative')
# Finally, create the dataframe from the four vectors.
conv_tbl <- data.frame(version, conv_rate, lower_ci, upper_ci)
# Display the results.
conv_tbl
## version conv_rate lower_ci upper_ci
## 1 A. Control 0.02773925 0.016 0.046
## 2 B. Alternative 0.05068493 0.035 0.073
Here we have the different in confidence intervals between set A and B. The booking conversion is far higher in set B, meaning the new feature is beneficial. We can plot this to get a visual representation of these results.
Now we can create a graph comparing the two. I’m going to use an error bar and not highlight the conversion rates. Instead, the focus will be on the error bars and nothing else.
library(scales)
library(ggthemr)
ggthemr("dust")
conv_tbl %>%
ggplot(aes(x=version, y=conv_rate, group=1)) +
geom_errorbar(width=.1, aes(ymin=lower_ci, ymax=upper_ci), colour="black")+
ggtitle('Conversion Rate - Confidence Intervals') +
ylab('Conversion Rate') +
xlab('Version') +
scale_y_continuous(labels = scales::percent)
We can look more in depth into the uplift and the significance of the new feature.
uplift <- (conv_rate_B - conv_rate_A) / conv_rate_A * 100
uplift
## [1] 82.71918
# Pooled sample proportion for variants A & B
p_pool <- (conversions_A + conversions_B) / (visitors_A +
visitors_B)
print(p_pool) # 0.03928325
## [1] 0.03928325
# Let's compute Standard error for variants A & B (SE_pool)
SE_pool <- sqrt(p_pool * (1 - p_pool) * ((1 / visitors_A) +
(1 / visitors_B)))
print(SE_pool) # 0.01020014
## [1] 0.01020014
# Let's compute the margin of error for the pool
MOE <- SE_pool * qnorm(0.975)
print(MOE) # 0.0199919
## [1] 0.0199919
# Point Estimate or Difference in proportion
d_hat <- conv_rate_B - conv_rate_A
# Compute the Z-score so we
# can determine the p-value
z_score <- d_hat / SE_pool
# Let's compute p_value
# using the z_score value
p_value <- pnorm(q = -z_score,
mean = 0,
sd = 1) * 2
# Let's compute Confidence interval for the
# pool using pre-calculated results
ci <- c(d_hat - MOE, d_hat + MOE)
ci <- c(d_hat - MOE, d_hat + MOE)
ci
## [1] 0.002953777 0.042937584
ci_lower <- d_hat - MOE
ci_lower
## [1] 0.002953777
ci_upper <- d_hat + MOE
ci_upper
## [1] 0.04293758
library(tidyverse)
library(extrafont)
2a. Let’s load the data into a dataframe ABTest
for
the analysis:
ABTest <- read.csv("https://raw.githubusercontent.com/etomaa/A-B-Testing/master/data/Website%20Results.csv")
3a. Let’s take a subset of conversions for test_version_A:
conversion_subset_A <- ABTest %>% filter(variant == "A" & converted == "TRUE")
3b. Number of Conversions for test_version_A:
conversions_A <- nrow(conversion_subset_A)
3c.Number of Visitors for test_version_A:
visitors_A <- nrow(ABTest %>% filter(variant == "A"))
3d. Let’s compute conversion rate for test_version_A:
conv_rate_A <- (conversions_A/visitors_A)
conv_rate_A
## [1] 0.02773925
3e. Let’s take a subset of conversions for test_version_B:
conversion_subset_B <- ABTest %>% filter(variant == "B" & converted == "TRUE")
3f. Number of Conversions for test_version_B:
conversions_B <- nrow(conversion_subset_B)
3g.Number of Visitors for test_version_B:
visitors_B <- nrow(ABTest %>% filter(variant == "B"))
3h. Let’s compute conversion rate for test_version_B:
conv_rate_B <- (conversions_B/visitors_B)
conv_rate_B
## [1] 0.05068493
Relative Uplift
uplift <- (conv_rate_B - conv_rate_A)/ conv_rate_A * 100
uplift
## [1] 82.71918
Deduction: B is better than A by 82.7%. This is high enough to decide a winner but we need more supporting evidence before we can arrive at a final decision.
Pooled Probability for Test Versions A & B
5a. Let’s compute pooled probability:
p_pool <- (conversions_A + conversions_B)/(visitors_A + visitors_B)
p_pool
## [1] 0.03928325
5b. Let’s compute the standard error (SE_pool) for the pool:
SE_pool<- sqrt(p_pool*(1-p_pool) * ((1/visitors_A) + (1/visitors_B)))
SE_pool
## [1] 0.01020014
5c. Let’s compute the margin of error for the pool:
MOE <- SE_pool * qnorm(0.975)
MOE
## [1] 0.0199919
5d. Point Estimate or Difference in proportion:
d_hat <- conv_rate_B - conv_rate_A
d_hat
## [1] 0.02294568
5e. Compute the Z-score so we can determine the p-value:
z_score <- d_hat/SE_pool
z_score
## [1] 2.249546
P-Value for Test Version A & B
5f. Let’s compute p_value using the z_score value:
p_value <- pnorm(q = -z_score, mean = 0, sd = 1) * 2
p_value
## [1] 0.02447777
5g. Let’s compute Confidence interval for the pool
ci <- c(d_hat - MOE, d_hat + MOE)
ci
## [1] 0.002953777 0.042937584
ci_lower <- d_hat - MOE
ci_lower
## [1] 0.002953777
ci_upper <- d_hat + MOE
ci_upper
## [1] 0.04293758
6a. Let’s compute Standard Error and Confidence Intervals for Test version A separately
X_hat_A <- conversions_A/visitors_A
X_hat_A
## [1] 0.02773925
se_hat_A <- sqrt(X_hat_A*(1-X_hat_A)/visitors_A)
print(se_hat_A)
## [1] 0.006116051
ci_A <- c(X_hat_A - qnorm(0.975)*se_hat_A, X_hat_A + qnorm(0.975)*se_hat_A)
ci_A
## [1] 0.01575201 0.03972649
6b. Let’s compute Standard Error and Confidence Intervals for Test version B separately
X_hat_B <- conversions_B/visitors_B
X_hat_B
## [1] 0.05068493
se_hat_B <- sqrt(X_hat_B*(1-X_hat_B)/visitors_B)
print(se_hat_B)
## [1] 0.008118638
ci_B <- c(X_hat_B - qnorm(0.975)*se_hat_B, X_hat_B + qnorm(0.975)*se_hat_B)
ci_B
## [1] 0.03477269 0.06659717
6c. *Let’s extract the lower and upper confidence intervals into
lower
and upper
respectively
ci_lower_A <- X_hat_A - qnorm(0.975)*se_hat_A
ci_lower_A
## [1] 0.01575201
ci_upper_A <- X_hat_A + qnorm(0.975)*se_hat_A
ci_upper_A
## [1] 0.03972649
ci_lower_B <- X_hat_B - qnorm(0.975)*se_hat_B
ci_lower_B
## [1] 0.03477269
7a. Let’s create a dataframe of results for the pool
vis_result_pool <- data.frame(
metric=c(
'Estimated Difference',
'Relative Uplift(%)',
'pooled sample proportion',
'Standard Error of Difference',
'z_score',
'p-value',
'Margin of Error',
'CI-lower',
'CI-upper'),
value=c(
conv_rate_B - conv_rate_A,
uplift,
p_pool,
SE_pool,
z_score,
p_value,
MOE,
ci_lower,
ci_upper
))
vis_result_pool
## metric value
## 1 Estimated Difference 0.022945680
## 2 Relative Uplift(%) 82.719178082
## 3 pooled sample proportion 0.039283253
## 4 Standard Error of Difference 0.010200138
## 5 z_score 2.249546089
## 6 p-value 0.024477774
## 7 Margin of Error 0.019991903
## 8 CI-lower 0.002953777
## 9 CI-upper 0.042937584
7b. Let’s create a dataframe of results per test version
vis_result <- data.frame(variant = c("A","B"), visitors = c(visitors_A, visitors_B),
conversions = c(conversions_A,conversions_B),conversion_rate = round(c(conv_rate_A, conv_rate_B),4),
Standard_error = round(c(se_hat_A,se_hat_B),5), Conf_Interval_A = c(ci_A[1],ci_A[2]))
vis_result
## variant visitors conversions conversion_rate Standard_error Conf_Interval_A
## 1 A 721 20 0.0277 0.00612 0.01575201
## 2 B 730 37 0.0507 0.00812 0.03972649
8a. Let’s create the dataframe
ABTdata = data.frame(
variant = factor(rep(c("A", "B"), each=200)),
conf_interval = c(rnorm(200, 0.02773925, 0.005891128), rnorm(200, 0.05068493, 0.007793573))
)
head(ABTdata, 4)
## variant conf_interval
## 1 A 0.02235077
## 2 A 0.02049335
## 3 A 0.03216455
## 4 A 0.02444700
8b. Let’s create a sub-dataframe we will call in our plot
pe = data.frame(variant = c("A","B"), conversion_rate = round(c(conv_rate_A, conv_rate_B),4), lower_confidence = round(c(ci_lower_A,ci_upper_A),4))
pe
## variant conversion_rate lower_confidence
## 1 A 0.0277 0.0158
## 2 B 0.0507 0.0397
8c. #Let’s plot the distributions
a <- ggplot(ABTdata, aes(x = conf_interval))
a + geom_density(aes(fill = variant), alpha = 0.3) +
geom_vline(aes(xintercept = lower_confidence[1], color = variant),
data = pe, linetype = "dashed") +
geom_vline(aes(xintercept = lower_confidence[2], color = variant),
data = pe, linetype = "dashed") +
geom_vline(aes(xintercept = conversion_rate, color = variant),
data = pe, linetype = "dashed") +
scale_color_manual(values = c("#b8b7b8", "#b8b7b8"))+
scale_fill_manual(values = c("#e9e8e9", "#4b779d")) +
annotate(geom="text", x= 0.0507, y=29,
hjust = 0, # zero left justifies the text annotation horizontally. 0.5 is center, 1 is right justified.
vjust = 0.5, # > +0.1 range pushes the text annotation down vertically
#alpha = 0.2, #alpha reduces the transparency of the text
label="CR B: 5.06%",
fontface = "bold",
colour = "black",
angle = 90,
family="Century Gothic",
size = 4.0) +
annotate(geom="text", x= 0.0277, y=35,
hjust = 0, # zero left justifies the text annotation horizontally. 0.5 is center, 1 is right justified.
vjust = 0.5, # > +0.1 range pushes the text annotation down vertically
#alpha = 0.2, #alpha reduces the transparency of the text
label="CR A: 2.77%",
fontface = "bold",
angle = 90,
colour = "black",
family="Century Gothic",
size = 4.0)+
annotate(geom="text", x= 0.0403, y=24,
hjust = 0, # zero left justifies the text annotation horizontally. 0.5 is center, 1 is right justified.
vjust = 0.5, # > +0.1 range pushes the text annotation down vertically
#alpha = 0.2, #alpha reduces the transparency of the text
label="95%",
#fontface = "bold",
angle = 90,
colour = "black",
family="Century Gothic",
size = 4.0)+
annotate(geom="text", x= 0.0164, y=24,
hjust = 0, # zero left justifies the text annotation horizontally. 0.5 is center, 1 is right justified.
vjust = 0.5, # > +0.1 range pushes the text annotation down vertically
#alpha = 0.2, #alpha reduces the transparency of the text
label="95%",
#fontface = "bold",
angle = 90,
colour = "black",
family="Century Gothic",
size = 4.0)+
labs(title = "Expected Distributions of Variants A and B",
x = "Confidence Interval",
y = "",
caption = "Etoma Egot") +
theme_classic() +
theme(legend.position = "none") +
theme(axis.ticks.y = element_blank(), #remove y axis ticks
axis.title.y = element_blank()) + #remove y axis title
theme(plot.title = ggplot2::element_text(family="Century Gothic",
size=20,
#face="bold",
color="#002849")) +
theme(plot.subtitle = ggplot2::element_text(family="Century Gothic",
size=16,
face="bold",
color = "#4b779d"))+
theme(panel.grid.major.y = ggplot2::element_line(color="#FAFAFA"))
ABTdata_pool = data.frame(Conf_interval = rnorm(n = 10000, mean = 0, sd = SE_pool))
#Let's plot the distributions
b <- ggplot(ABTdata_pool, aes(x = Conf_interval))
b + geom_histogram(aes(y = ..density..),
colour="black", fill="white", binwidth = 0.00029) +
geom_density(alpha = 0.2, fill = "#FF6666")+
theme_classic()