Designing Information Monitored Trials for Binary Outcomes
Source:vignettes/design_binary.Rmd
Planning The Study
Planning an information-monitored study is similar in many respects to planning a study with a fixed sample size. Investigators must decide on the target of statistical inference, also known as an estimand: for a binary outcome, the risk difference is usually of interest, although investigators may also be interested in a risk ratio (relative risk), or odds ratio. Once the estimand is chosen, decisions must be made about what constitutes a meaningful effect size on the scale of the estimand (e.g. a 10% absolute reduction in risk or a relative risk of 0.75). Finally, the characteristics of the testing procedure must be specified, including the desired Type I Error Rate, statistical power, and direction of alternatives of interest:
# Universal Study Design Parameters
minimum_difference <- 0.15 # Effect Size: An absolute risk difference of 0.15
alpha <- 0.05 # Type I Error Rate
power <- 0.9 # Statistical Power
test_sides <- 2 # Direction of Alternatives
The amount of data that must be collected depends on the information in the accruing data, which in turn depends on the patterns of association among variables, the variability of the outcomes of interest, and the degree of missingness. Such information is not always available when studies are being planned in practice.
Fixed sample size designs require investigators to make assumptions about the factors affecting precision: when those assumptions are incorrect, studies can be over- or under-powered. Rather than collecting data until a pre-specified sample size is reached, an information-monitored study continues data collection until the accrued data provide enough precision to identify a meaningful difference with appropriate power and control of Type I Error.
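Concretely, the information about a risk difference is the reciprocal of its squared standard error, so it can be estimated directly from accruing data. A minimal simulated sketch in base R (the data and the outcome probabilities 0.25 and 0.40 are purely illustrative assumptions, not impart functionality):

```r
# Sketch: estimated information = 1 / SE^2 of the risk difference.
# Simulated interim data; probabilities 0.25 and 0.40 are illustrative.
set.seed(2024)
n_interim <- 150
y_control <- rbinom(n_interim, size = 1, prob = 0.25)
y_treated <- rbinom(n_interim, size = 1, prob = 0.40)

p0 <- mean(y_control)
p1 <- mean(y_treated)
se_rd <- sqrt(p0 * (1 - p0) / n_interim + p1 * (1 - p1) / n_interim)
information_observed <- 1 / se_rd^2 # grows as more data accrue
```

As more participants accrue, `se_rd` shrinks and `information_observed` grows toward the target level.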
Determining the Target Information Level
The information (precision) required to achieve power (1 − β) to identify a treatment effect δ with an s-sided test with Type I error rate α at the final analysis is given by:

I = ((z_{1 − α/s} + z_{1 − β}) / δ)^2
# Determine information required to achieve desired power at fixed error rate
information_single_stage <-
impart::required_information_single_stage(
delta = minimum_difference,
alpha = alpha,
power = power
)
information_single_stage
#> [1] 466.9966
For example, detecting a risk difference of 0.15 with 90% power and a Type I Error rate of 0.05 using a 2-sided test requires an information level of 466.9966. Investigators can collect data until the precision (the reciprocal of the squared standard error) reaches this level, and their analysis will have the appropriate power and Type I Error control.
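As a check, the single-stage information target can be reproduced from standard normal quantiles using the formula above (base R only; no impart functions involved):

```r
# Reproduce the single-stage information target from normal quantiles:
# I = ((z_{1 - alpha/2} + z_{1 - beta}) / delta)^2 for a two-sided test
minimum_difference <- 0.15
alpha <- 0.05
power <- 0.9

((qnorm(1 - alpha / 2) + qnorm(power)) / minimum_difference)^2
#> [1] 466.9966
```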
Translating Information into Sample Size
Translating information levels into a sample size requires assumptions about nuisance parameters, such as the probability of the outcome in the control arm. The information_to_n_binary_1_to_1 function takes an information level and values of the nuisance parameters, and gives an approximate sample size. Note that this calculation only takes into account the information contained in the observed outcomes: if some outcomes are missing, or if the analysis makes use of information in baseline covariates and intermediate outcomes, this can change the sample size at which the target information level is reached.
# Assume control probability = 0.25
approximate_n_p0_0.25 <-
impart::information_to_n_binary_1_to_1(
information = information_single_stage,
pi_0 = 0.25,
delta = minimum_difference,
round_up = TRUE
)
approximate_n_p0_0.25
#> n_per_arm n_total
#> 1 200 400
# Compute Power: Fixed Sample Size
pwr::pwr.2p.test(
h = pwr::ES.h(p1 = 0.25 + minimum_difference, p2 = 0.25),
sig.level = alpha,
power = power
)
#>
#> Difference of proportion power calculation for binomial distribution (arcsine transformation)
#>
#> h = 0.3222409
#> n = 202.3787
#> sig.level = 0.05
#> power = 0.9
#> alternative = two.sided
#>
#> NOTE: same sample sizes
# Assume control probability = 0.42
approximate_n_p0_0.42 <-
impart::information_to_n_binary_1_to_1(
information = information_single_stage,
pi_0 = 0.42,
delta = minimum_difference,
round_up = TRUE
)
approximate_n_p0_0.42
#> n_per_arm n_total
#> 1 229 458
# Compute Power: Fixed Sample Size
pwr::pwr.2p.test(
h = pwr::ES.h(p1 = 0.42 + minimum_difference, p2 = 0.42),
sig.level = alpha,
power = power
)
#>
#> Difference of proportion power calculation for binomial distribution (arcsine transformation)
#>
#> h = 0.3011521
#> n = 231.7151
#> sig.level = 0.05
#> power = 0.9
#> alternative = two.sided
#>
#> NOTE: same sample sizes
Note that when the assumptions about the outcome probability in the control arm are met, the sample size requirements from the fixed sample size calculation (pwr::pwr.2p.test) and from the information-monitored calculation (information_to_n_binary_1_to_1) are similar. The slight differences are due to the approximations used (a normal approximation versus the arcsine transformation).
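Under the normal approximation, the translation from information to sample size can be sketched by hand: with 1:1 allocation, the variance of the estimated risk difference is (π0(1 − π0) + π1(1 − π1))/n per arm, so the per-arm sample size is the target information times that variance sum. A base R sketch (rounding up mirrors the round_up = TRUE behavior used above):

```r
# Normal-approximation translation of information to sample size:
# Var(risk difference) ~ (pi_0(1 - pi_0) + pi_1(1 - pi_1)) / n_per_arm,
# so n_per_arm ~ information x (pi_0(1 - pi_0) + pi_1(1 - pi_1)).
n_per_arm <- function(pi_0, pi_1, information) {
  ceiling(information * (pi_0 * (1 - pi_0) + pi_1 * (1 - pi_1)))
}

n_per_arm(pi_0 = 0.25, pi_1 = 0.40, information = 466.9966)
#> [1] 200
n_per_arm(pi_0 = 0.42, pi_1 = 0.57, information = 466.9966)
#> [1] 229
```

These match the per-arm sample sizes returned by information_to_n_binary_1_to_1 for the two control-arm probabilities above.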
The advantage of an information-adaptive design is the sample size adapts to the information in the accruing data: rather than making assumptions about the outcome probability in the control arm, which could result in an over- or under-powered trial, data collection proceeds until the target information level is met, which ensures adequate power and Type I Error control.
Sequential Analyses in Studies
If the true effect of interest is greater than the minimum meaningful effect δ, the study may be overpowered. Conversely, if the true effect is very small, or indicates that the benefits of participating in the study are not commensurate with its risks, it may be futile to continue data collection. In such cases, interim analyses of the data can be used to guide more ethical, cost-effective data collection.
Group-sequential designs allow investigators to control Type I Error rates when performing pre-specified interim assessments of the differences between groups. Studies can also be stopped early for futility if accruing data suggest that a treatment is ineffective or harmful. The number and timing of analyses must be pre-specified, as well as the rules for stopping for efficacy and futility. The stopping rules are specified using spending functions: alpha spending functions define efficacy stopping rules, and beta spending functions define futility stopping rules. For more information on group sequential designs, see the documentation for the RPACT package. This example will use O'Brien-Fleming stopping rules for efficacy and futility.
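The O'Brien-Fleming-type alpha spending can be evaluated directly using the Lan-DeMets spending-function form: for a two-sided test, the cumulative alpha spent by information fraction t is α(t) = 4(1 − Φ(z_{1 − α/4}/√t)). A base R sketch (this standard form is assumed to be what rpact's "asOF" design implements):

```r
# Lan-DeMets O'Brien-Fleming-type alpha spending, two-sided test:
# alpha(t) = 4 * (1 - pnorm(qnorm(1 - alpha/4) / sqrt(t)))
alpha <- 0.05
of_alpha_spent <- function(t) {
  4 * (1 - pnorm(qnorm(1 - alpha / 4) / sqrt(t)))
}

round(of_alpha_spent(c(0.50, 0.75, 1.00)), 6)
#> [1] 0.003051 0.019299 0.050000
```

These cumulative values match the "Cumulative alpha spending" line reported by rpact for the design used in this vignette.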
In contrast to a group sequential design, which performs analyses at pre-specified fractions of the final sample size, an information-monitored study performs analyses when the data collected provide enough precision to identify a treatment effect with the appropriate power and Type I Error control. Analyses are conducted when the precision reaches pre-specified fractions of this target level.
# Group Sequential Design Parameters
information_rates <-
c(0.50, 0.75, 1.00) # Analyses at 50%, 75%, and 100% of the Total Information
type_of_design <- "asOF" # O'Brien-Fleming Alpha Spending
type_beta_spending <- "bsOF" # O'Brien-Fleming Beta Spending
The getDesignGroupSequential function in the rpact library can be used to specify the appropriate study design: for example, a two-sided test of the null hypothesis δ = 0 against the alternative δ ≠ 0.
# Set up group sequential testing procedure
trial_design <-
rpact::getDesignGroupSequential(
alpha = alpha,
beta = 1 - power,
sided = 2,
informationRates = information_rates,
typeOfDesign = type_of_design,
typeBetaSpending = type_beta_spending,
bindingFutility = FALSE
)
For a one-sided test using RPACT, the default boundaries assume users are testing the null hypothesis δ ≤ 0 against the alternative δ > 0 (i.e. higher values indicate benefit):
# One sided test: Higher values = Better
trial_design_one_sided_upper <-
rpact::getDesignGroupSequential(
alpha = alpha,
beta = 1 - power,
sided = 1,
informationRates = information_rates,
typeOfDesign = type_of_design,
typeBetaSpending = type_beta_spending,
bindingFutility = FALSE
)
plot(
trial_design_one_sided_upper,
main = "One-Sided: Higher Z = Superior"
)
To test the null hypothesis δ ≥ 0 against the alternative δ < 0 (lower values indicate benefit), users must manually correct the resulting design boundaries. This is done by calling correct_one_sided_gsd() on the result, which corrects the futility and efficacy boundaries:
trial_design_one_sided_lower <-
correct_one_sided_gsd(
trial_design = trial_design_one_sided_upper,
higher_better = FALSE
)
plot(
trial_design_one_sided_lower,
main = "Corrected One-Sided: Lower Z = Superior"
)
Once this issue is resolved, users can specify the direction using the directionUpper argument in other RPACT functions:
- directionUpper = TRUE tests the null δ ≤ 0 against the alternative δ > 0 (higher values indicate benefit)
- directionUpper = FALSE tests the null δ ≥ 0 against the alternative δ < 0 (lower values indicate benefit)
# One sided test: Higher values = Better
trial_design_one_sided_higher <-
rpact::getDesignGroupSequential(
alpha = alpha,
beta = 1 - power,
sided = 1,
informationRates = information_rates,
typeOfDesign = type_of_design,
typeBetaSpending = type_beta_spending,
bindingFutility = FALSE,
directionUpper = TRUE
)
# One sided test: Lower values = Better
trial_design_one_sided_lower <-
rpact::getDesignGroupSequential(
alpha = alpha,
beta = 1 - power,
sided = 1,
informationRates = information_rates,
typeOfDesign = type_of_design,
typeBetaSpending = type_beta_spending,
bindingFutility = FALSE,
directionUpper = FALSE
)
Adjusting Information for Multiple Analyses
When doing sequential analyses in an information-monitored design, the target level of information must be adjusted:
# Inflate information level to account for multiple testing
information_adaptive <-
impart::required_information_sequential(
information_single_stage = information_single_stage,
trial_design = trial_design
)
information_adaptive
#> [1] 506.4762
The information required under the specified design is 506.4762: the single-stage information scaled up by the inflation factor reported in the design summary (1.0845394). This factor can be retrieved using rpact::getDesignCharacteristics(trial_design).
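The relationship can be checked by hand. In the sketch below, the inflation factor is quoted as a literal from the design summary; interactively it could also be read from the design characteristics:

```r
# Check: sequential target = single-stage target x inflation factor.
# 1.0845394 is quoted from the design summary; it can also be read from
# rpact::getDesignCharacteristics(trial_design)$inflationFactor.
information_single_stage <- 466.9966
inflation_factor <- 1.0845394
information_check <- information_single_stage * inflation_factor
information_check
#> [1] 506.4762

# Interim analyses occur when information crosses these thresholds:
c(0.50, 0.75, 1.00) * information_check
#> [1] 253.2381 379.8572 506.4762
```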
Including Covariate Information
Appropriately including information from covariates has the potential to increase the precision of analyses, meaning that target information levels can be reached at lower sample sizes, resulting in studies with shorter duration.
# Information Only From Final Outcomes
impart::information_to_n_binary_1_to_1(
information = information_rates*information_adaptive,
pi_0 = 0.25,
pi_1 = 0.25 + minimum_difference,
round_up = FALSE
)
#> information pi_0 pi_1 n_per_arm n_total
#> 1 253.2381 0.25 0.4 108.2593 216.5186
#> 2 379.8572 0.25 0.4 162.3889 324.7779
#> 3 506.4762 0.25 0.4 216.5186 433.0372
# 10% Relative Efficiency Increase from Covariates
relative_efficiency <- 1.1
# Information From Final Outcomes + Covariates
impart::information_to_n_binary_1_to_1(
information = information_rates*information_adaptive/relative_efficiency,
pi_0 = 0.25,
pi_1 = 0.25 + minimum_difference,
round_up = TRUE
)
#> information pi_0 pi_1 n_per_arm n_total
#> 1 230.2165 0.25 0.4 99 198
#> 2 345.3247 0.25 0.4 148 296
#> 3 460.4329 0.25 0.4 197 394
# 20% Relative Efficiency Increase from Covariates
relative_efficiency <- 1.2
# Information From Final Outcomes + Covariates
impart::information_to_n_binary_1_to_1(
information = information_rates*information_adaptive/relative_efficiency,
pi_0 = 0.25,
pi_1 = 0.25 + minimum_difference,
round_up = TRUE
)
#> information pi_0 pi_1 n_per_arm n_total
#> 1 211.0318 0.25 0.4 91 182
#> 2 316.5476 0.25 0.4 136 272
#> 3 422.0635 0.25 0.4 181 362
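The relative-efficiency adjustment above is just a rescaling: dividing the target information by the assumed efficiency gain divides the required sample size by the same factor. A base R sketch using the normal approximation for the risk-difference variance (final analysis row, pi_0 = 0.25, pi_1 = 0.40):

```r
# Sketch: covariate adjustment divides the outcome-only sample size by
# the assumed relative efficiency (final analysis; pi_0 = 0.25, pi_1 = 0.40).
information_target <- 506.4762
variance_sum <- 0.25 * 0.75 + 0.40 * 0.60 # pi_0(1 - pi_0) + pi_1(1 - pi_1)

ceiling(information_target * variance_sum)       # outcomes only
#> [1] 217
ceiling(information_target * variance_sum / 1.1) # 10% efficiency gain
#> [1] 197
ceiling(information_target * variance_sum / 1.2) # 20% efficiency gain
#> [1] 181
```

The per-arm sample sizes with covariate gains match the final rows of the information_to_n_binary_1_to_1 output above.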
The increase in precision from covariate adjustment is never known precisely during the planning of a study. Instead of relying on an assumption about the gain in precision from covariates, investigators can use information monitoring to adapt data collection to the accruing information from both covariates and outcomes.
Encapsulating Study Design
A monitored_design object encapsulates all of the information about an information-monitored study that should be fixed at the outset, including the number of analyses, the information fractions at which analyses are conducted, the target level of information, the null value of the estimand of interest, the maximum feasible sample size, and the pseudorandom number generator seed used for analyses.
# Initialize the monitored design
monitored_design <-
initialize_monitored_design(
trial_design = trial_design,
null_value = 0,
maximum_sample_size = 600,
information_target = information_adaptive,
orthogonalize = TRUE,
rng_seed_analysis = 54321
)
monitored_design
#> $original_design
#> $original_design$trial_design
#> *Design parameters and output of group sequential design*
#>
#> *User defined parameters*
#>
#> * *Type of design*: O'Brien & Fleming type alpha spending
#> * *Information rates*: 0.500, 0.750, 1.000
#> * *Significance level*: 0.0500
#> * *Type II error rate*: 0.1000
#> * *Test*: two-sided
#> * *Type of beta spending*: O'Brien & Fleming type beta spending
#>
#> *Derived from user defined parameters*
#>
#> * *Maximum number of stages*: 3
#>
#> *Default parameters*
#>
#> * *Stages*: 1, 2, 3
#> * *Two-sided power*: FALSE
#> * *Binding futility*: FALSE
#> * *Beta adjustment*: TRUE
#> * *Tolerance*: 1e-08
#>
#> *Output*
#>
#> * *Power*: 0.2825, 0.7164, 0.9000
#> * *Futility bounds (non-binding)*: 0.387, 1.282
#> * *Cumulative alpha spending*: 0.003051, 0.019299, 0.050000
#> * *Cumulative beta spending*: 0.02001, 0.05752, 0.10000
#> * *Critical values*: 2.963, 2.359, 2.014
#> * *Stage levels (one-sided)*: 0.001525, 0.009162, 0.022000
#>
#>
#> $original_design$maximum_sample_size
#> [1] 600
#>
#> $original_design$null_value
#> [1] 0
#>
#> $original_design$information_target
#> [1] 506.4762
#>
#> $original_design$orthogonalize
#> [1] TRUE
#>
#> $original_design$rng_seed_analysis
#> [1] 54321
#>
#> $original_design$information_fractions
#> [1] 0.50 0.75 1.00
#>
#> $original_design$information_thresholds
#> [1] 253.2381 379.8572 506.4762
#>
#>
#> attr(,"class")
#> [1] "monitored_design"