Planning The Study

Planning an information-monitored study is similar in many respects to planning a study with a fixed sample size. Investigators must decide on the target of statistical inference, also known as the estimand: for a time-to-event outcome, several estimands may be of interest, including the survival probability (SP), the restricted mean survival time (RMST), and the hazard ratio (HR). Royston and Parmar (2011, 2013) provide an overview of the RMST, with further practical considerations given by Eaton, Therneau, and Le-Rademacher (2020).

Once the estimand is chosen, decisions must be made about what constitutes a meaningful effect size on the scale of the estimand. Next, the characteristics of the testing procedure must be specified, including the desired Type I Error Rate (α), statistical power (1 - β), and the direction of alternatives of interest (an s-sided test: 1- or 2-sided):

# Universal Study Design Parameters
minimum_difference_hr <- 0.74 # Effect Size: Hazard Ratio
minimum_difference_log_hr <- log(minimum_difference_hr) 
minimum_difference_sp <- 0.15 # Effect Size: Difference in Survival Probability
minimum_difference_rmst <- 1 # Effect Size: Difference in RMST
alpha <- 0.05 # Type I Error Rate
power <- 0.9 # Statistical Power
test_sides <- 2 # Direction of Alternatives

Additionally, if the goal is to assess the difference in survival probability or in the average time spent event-free (RMST), investigators need to specify a clinically meaningful time interval known as the time horizon (τ).

# Study Design Parameters: RMST & Survival Probability
time_horizon <- 5

For example:

  • A 26% reduction in the hazard (a hazard ratio of 0.74, or a log hazard ratio of -0.301)
  • A 15% absolute difference in survival probability at τ = 5 years
  • A difference in RMST of 1 year at τ = 5 years (a 20% increase in expected event-free time)

The amount of data that must be collected depends on the information in the accruing data, which in turn depends on the patterns of association between covariates, event times, and censoring times. Such information is not always available when studies are being planned in practice.

Sample size calculations for a continuous or binary outcome require estimates of nuisance parameters, such as the variance of a continuous outcome or the risk of the outcome in the control arm. Power for the logrank test depends on the number of events observed in the treatment (D1) and control (D0) arms (Schoenfeld 1983). Let D = D1 + D0 denote the total number of events in a trial with an allocation ratio of r participants to treatment for every 1 participant allocated to control (r:1 randomization). The power to detect a hazard ratio of θ_HA with Type I error α using an s-sided test after observing D events across both treatment arms is:

(1 - \beta) = \Phi\left( \left| \log(\theta_{H_{A}}) \right| \sqrt{ \frac{rD}{(1 + r)^2} } - Z_{\alpha/s} \right)

Alternatively, achieving power (1 - β) requires observing at least D total events, where:

D = \frac{(1 + r)^2}{r}\left(\frac{Z_{\alpha/s} + Z_{\beta}}{\log(\theta_{H_{A}})}\right)^{2}

Since the number of events observed depends on the distribution of event times and censoring times, investigators must choose a sample size N = N1 + N0 and a minimum follow-up duration so that at least D events are observed across both treatment arms by the end of follow-up. This is known as an event-driven trial: precision and power depend on the number of events observed, with recruitment and follow-up duration adjusted to reach the specified level of precision.
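As a check on the event formula above, it can be evaluated directly in base R using the design parameters already defined (a sketch of the same Schoenfeld calculation, not a replacement for the package functions):

```r
# Direct evaluation of the Schoenfeld event formula:
# D = ((1 + r)^2 / r) * ((Z_{alpha/s} + Z_beta) / log(theta))^2
alpha <- 0.05
power <- 0.9
test_sides <- 2
hazard_ratio <- 0.74
r <- 1 # r:1 allocation ratio

z_alpha <- qnorm(1 - alpha/test_sides)
z_beta <- qnorm(power)

required_events_direct <-
  ((1 + r)^2/r)*((z_alpha + z_beta)/log(hazard_ratio))^2
required_events_direct
#> [1] 463.575
```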

# Number of Events Required:
required_events <- 
  impart::hr_design(
    hazard_ratio = minimum_difference_hr,
    power = power,
    alpha = alpha,
    test_sides = test_sides
  )

required_events
#> [1] 463.575

Investigators would need to adjust the duration of follow-up, total sample size, and recruitment rate in order to observe a total of 464 events during the study period. Similarly, one can determine the power attained once a given number of events has been observed.

impart::hr_design(
  events = required_events*(0.5), # Lower number of events observed
  hazard_ratio = minimum_difference_hr,
  alpha = alpha,
  test_sides = test_sides
)
#> [1] 0.6301058

impart::hr_design(
  events = required_events, # Required events from earlier calculation
  hazard_ratio = minimum_difference_hr + 0.10, # Lower reduction in hazard
  alpha = alpha,
  test_sides = test_sides
)
#> [1] 0.4669334
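Both attained-power results can be reproduced from the power formula above, using the absolute value of the log hazard ratio (a base-R sketch; schoenfeld_power is a helper defined here for illustration):

```r
# Power attained for a given number of events under the Schoenfeld
# approximation, with an s-sided test and r:1 allocation
schoenfeld_power <- function(events, hazard_ratio, alpha, test_sides, r = 1) {
  z_alpha <- qnorm(1 - alpha/test_sides)
  pnorm(abs(log(hazard_ratio))*sqrt(r*events/(1 + r)^2) - z_alpha)
}

# Half of the required events: power is approximately 0.63
schoenfeld_power(events = 463.575/2, hazard_ratio = 0.74,
                 alpha = 0.05, test_sides = 2)

# Smaller reduction in hazard (HR = 0.84): power is approximately 0.47
schoenfeld_power(events = 463.575, hazard_ratio = 0.84,
                 alpha = 0.05, test_sides = 2)
```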

Determining the Target Information Level

The information or precision required to achieve power (1 - β) to identify a treatment effect δ with an s-sided test with Type I error rate α at the final analysis is given by:

\mathcal{I}_{F} = \left(\frac{Z_{\alpha/s} + Z_{\beta}}{\delta}\right)^2 \approx \frac{1}{\left(SE(\hat{\delta})\right)^2} = \frac{1}{Var(\hat{\delta})}

Note that δ may be on different scales, depending on the estimand of interest. For continuous and binary outcomes, information was specified for a difference in means or proportions, while time-to-event outcomes could be assessed on the log hazard ratio, survival probability, or restricted mean survival time scale:

Information: Log Hazard Ratio
# Determine information required to achieve desired power at fixed error rate
information_single_stage_log_hr <-
  impart::required_information_single_stage(
    # Note: Estimand is on log hazard ratio scale
    delta = minimum_difference_log_hr,
    alpha = alpha,
    power = power
  )

information_single_stage_log_hr
#> [1] 115.8938
Information: Survival Probability
# Determine information required to achieve desired power at fixed error rate
information_single_stage_sp <-
  impart::required_information_single_stage(
    # Note: Estimand is on survival probability scale
    delta = minimum_difference_sp,
    alpha = alpha,
    power = power
  )

information_single_stage_sp
#> [1] 466.9966
Information: Restricted Mean Survival Time
information_single_stage_rmst <-
  impart::required_information_single_stage(
    # Note: Estimand is on restricted mean survival time scale
    delta = minimum_difference_rmst,
    alpha = alpha,
    power = power
  )

information_single_stage_rmst
#> [1] 10.50742
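All three targets follow directly from the information formula; as a base-R sketch (required_information is a helper defined here for illustration, using the effect sizes, alpha, and power specified earlier):

```r
# Information target: ((Z_{alpha/s} + Z_beta) / delta)^2
required_information <- function(delta, alpha, power, test_sides = 2) {
  ((qnorm(1 - alpha/test_sides) + qnorm(power))/delta)^2
}

required_information(delta = log(0.74), alpha = 0.05, power = 0.9)
#> [1] 115.8938
required_information(delta = 0.15, alpha = 0.05, power = 0.9)
#> [1] 466.9966
required_information(delta = 1, alpha = 0.05, power = 0.9)
#> [1] 10.50742
```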

For example, 90% power and a Type I Error rate of 0.05 using a 2-sided test would require information exceeding:

  • 115.89 to detect a difference in the log hazard ratio of δ_log(HR) = -0.3 (i.e. a hazard ratio of 0.74)
  • 467 to detect a difference in survival probability of δ_SP = 0.15 at τ = 5 years
  • 10.51 to detect a difference in restricted mean survival time of δ_RMST = 1 year at τ = 5 years

Investigators can collect data until the precision (the reciprocal of the squared standard error) reaches this level, and their analysis will have the appropriate power and Type I error control. This is known as a single-stage design, since data are analyzed at only a single point during the study.

For a binary outcome, there is only one nuisance parameter related to the outcome distribution: the risk of the outcome in controls. For a continuous outcome, there are two nuisance parameters: the variance of the outcome in each treatment arm. Unfortunately, with time-to-event outcomes, the number of events D depends on the distribution of both event times and censoring times in each arm. For survival probability and RMST, the number of events also depends on the time after randomization at which they are evaluated: the time horizon τ.

Sequential Analyses in Studies

If the true effect of interest is greater than the minimum meaningful effect δ, the study may be overpowered. Conversely, if the true effect is very small, or the benefits of participating in the study are not commensurate with its risks, it may be futile to continue data collection. In such cases, interim analyses of the data can be used to guide more ethical, cost-effective data collection. These are known as multi-stage designs, as data are analyzed at multiple points during the study.

Group-sequential designs allow investigators to control Type I error rates when performing pre-specified interim assessments of the differences between groups. Studies can also be stopped early for futility if accruing data suggest that a treatment is ineffective or harmful. The number and timing of analyses must be pre-specified, as must the rules for stopping for efficacy and futility. Stopping rules are specified using spending functions: alpha spending functions define efficacy stopping rules, and beta spending functions define futility stopping rules. For more information on group-sequential designs, see the documentation for the rpact package. This example uses O'Brien-Fleming stopping rules for both efficacy and futility.

In contrast to a group sequential design, which performs analyses at pre-specified fractions of the final sample size, an information-monitored study performs analyses when the data collected provide enough precision to identify a treatment effect with the appropriate power and Type I Error. Analyses are conducted when the precision reaches pre-specified fractions of this level of precision.

# Group Sequential Design Parameters
information_rates <-
  c(0.50, 1.00) # Analyses at 50% and 100% of the Total Information
type_of_design <- "asOF" # O'Brien-Fleming Alpha Spending
type_beta_spending <- "bsOF" # O'Brien-Fleming Beta Spending

The getDesignGroupSequential function in the rpact package can be used to specify the appropriate study design: for example, a two-sided test comparing H0: μT - μC = δ0 vs. HA: μT - μC ≠ δ0.

# Set up group sequential testing procedure
trial_design <-
  rpact::getDesignGroupSequential(
    alpha = alpha,
    beta = 1 - power,
    sided = 2,
    informationRates = information_rates,
    typeOfDesign = type_of_design,
    typeBetaSpending = type_beta_spending,
    bindingFutility = FALSE
  )

Adjusting Information for Multiple Analyses

When doing sequential analyses in an information-monitored design, the target level of information must be adjusted:

# Inflate information level to account for multiple testing
information_adaptive_log_hr <-
  impart::required_information_sequential(
    information_single_stage = information_single_stage_log_hr,
    trial_design = trial_design
  )

information_adaptive_log_hr
#> [1] 119.3129

The information required under the specified design for the log hazard ratio is 119.3129: the single-stage information scaled up by the design's inflation factor (1.0295022). The inflation factor can be retrieved using rpact::getDesignCharacteristics(trial_design).
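As a quick check, multiplying the single-stage information target by the inflation factor reproduces the adjusted target (the factor value below is copied from the design summary rather than recomputed):

```r
# Single-stage information scaled by the design's inflation factor
information_single_stage_log_hr <- 115.8938
inflation_factor <- 1.0295022 # From rpact::getDesignCharacteristics(trial_design)
information_single_stage_log_hr*inflation_factor
#> [1] 119.3129
```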

Including Covariate Information

For the log hazard ratio, if only the number of events D is known, along with the sample sizes in the treatment (N1) and control (N0) arms, the variance of the log hazard ratio can be approximated by (Tierney, Burdett, and Fisher 2025):

var(\theta_{\log(HR)}) = (N_{1} + N_{0})^2/(DN_{1}N_{0}) = (1 + r)^2/(rD)

The information is approximately:

\mathcal{I}_{\log(HR)} = DN_{1}N_{0}/(N_{1} + N_{0})^2 = Dr/(1 + r)^2

This approximation is used in asymptotic_information_logrank, which determines the amount of information from an r:1 randomized trial:

# Information for 1:1 trial with number of events from Schoenfeld formula
impart::asymptotic_information_logrank(
  allocation_ratio = 1,
  total_events = 
    impart::hr_design(
      hazard_ratio = minimum_difference_hr,
      power = power,
      alpha = alpha,
      test_sides = test_sides,
      ratio = 1
    )
)
#> [1] 115.8938

# Information target based on delta, alpha, power for single stage design
information_single_stage_log_hr
#> [1] 115.8938
Covariate adjustment can increase precision: if the adjusted estimator has a given relative efficiency compared to the unadjusted estimator, the information target can be divided by that relative efficiency, and the corresponding number of events computed:

relative_efficiency <- c(1, 1.1, 1.2)

adjusted_events <-
  information_to_events_log_hr(
    information = information_single_stage_log_hr/relative_efficiency,
    round_up = TRUE
  )

data.frame(
  relative_efficiency = relative_efficiency,
  information_adjusted = information_single_stage_log_hr,
  setNames(
    object = adjusted_events,
    nm = c("information_unadjusted", "allocation_ratio", "total_events")
  )
)
#>   relative_efficiency information_adjusted information_unadjusted
#> 1                 1.0             115.8938              115.89375
#> 2                 1.1             115.8938              105.35795
#> 3                 1.2             115.8938               96.57813
#>   allocation_ratio total_events
#> 1                1          464
#> 2                1          422
#> 3                1          387

This pre-trial planning can help investigators determine when information thresholds may be reached under different potential gains from covariate adjustment when inferring about the marginal hazard ratio.
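The event counts in the table above can be reproduced directly: divide the information target by each assumed relative efficiency, then invert the approximation I = Dr/(1 + r)^2 to obtain events (a base-R sketch):

```r
# Events needed to reach an information target under r:1 allocation,
# for different assumed relative efficiencies from covariate adjustment
information_target <- 115.8938
relative_efficiency <- c(1, 1.1, 1.2)
r <- 1

adjusted_information <- information_target/relative_efficiency
total_events <- ceiling(adjusted_information*(1 + r)^2/r)
total_events
#> [1] 464 422 387
```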

Power and sample size software exists for design calculations with the RMST, including the RMSTdesign package (Eaton, Therneau, and Le-Rademacher 2020), and the SSRMST package.

Unlike the log hazard ratio, the variance of the RMST or survival probability cannot be readily approximated from the number of observed events alone, so investigators will have to rely on information monitoring during an ongoing trial.

References

Eaton, Anne, Terry Therneau, and Jennifer Le-Rademacher. 2020. “Designing Clinical Trials with (Restricted) Mean Survival Time Endpoint: Practical Considerations.” Clinical Trials 17 (3): 285–94. https://doi.org/10.1177/1740774520905563.
Royston, Patrick, and Mahesh K. B. Parmar. 2011. “The Use of Restricted Mean Survival Time to Estimate the Treatment Effect in Randomized Clinical Trials When the Proportional Hazards Assumption Is in Doubt.” Statistics in Medicine 30 (19): 2409–21. https://doi.org/10.1002/sim.4274.
Royston, Patrick, and Mahesh K. B. Parmar. 2013. “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-to-Event Outcome.” BMC Medical Research Methodology 13 (1). https://doi.org/10.1186/1471-2288-13-152.
Schoenfeld, David A. 1983. “Sample-Size Formula for the Proportional-Hazards Regression Model.” Biometrics 39 (2): 499. https://doi.org/10.2307/2531021.
Tierney, Jayne F., Sarah Burdett, and David J. Fisher. 2025. “Practical Methods for Incorporating Summary Time-to-Event Data into Meta-Analysis: Updated Guidance.” Systematic Reviews 14 (1). https://doi.org/10.1186/s13643-025-02752-z.