Skip to contents

impart has several built-in methods for covariate adjustment, but is designed to work with methods from other R packages. This vignette will demonstrate how to integrate other software by writing a wrapper function, and test the resulting function to make sure it works with impart.

Notes for Developers

impart was written to minimize dependencies to avoid the potential problems that may arise when dependencies are deprecated or updated. This includes dependence on common, relatively mature packages such as the tidyverse ecosystem.

While some code works identically for tibbles and base R data.frames, these objects behave differently under vector subsetting - my_tbl_df[, 1] or my_tbl_df[, 1:2] return a tbl_df whereas my_df[, 1] returns a vector and my_df[, 1:2] returns a data.frame. Results of dim, length, nrow, ncol, and other functions may differ between a tbl_df and a data.frame: if this is not handled appropriately, this can result in explicit errors (resulting in a stop) or silently returning erroneous results. The connection between error messages and the cause of the error is often unclear without looking at source code.

For this reason, impart::monitored_analysis checks to ensure that data are passed as a data.frame: a tbl_df can be easily coerced to a data.frame using as.data.frame(). If some function depends on having a tbl_df and is unable to accept a data.frame, the wrapper should include a step to coerce the input back to a tbl_df before being passed to the function for analysis.

Implementing speff2trial::speffSurv() in impart

The speff2trial::speffSurv() in the speff2trial package on CRAN computes a covariate-adjusted marginal hazard ratio. The repo for this package is hosted on Github. This vignette shows how to create a wrapper for this function, which is already integrated in the package as impart::speffsurv_impart().

The impart::colon_cancer dataset will be used to test the wrapper, using the outcome of years_to_death: a time-to-event outcome that is subject to censoring.

library(impart)

# Extract two treatment arms: Lev+5FU (Chemotherapy) and Obs (Observation)
colon_cancer_5fu_vs_obs <-
  subset(
    x = colon_cancer,
    arm %in% c("Lev+5FU", "Obs")
  ) |>
  droplevels()

head(colon_cancer_5fu_vs_obs)
#>   .id     arm age       sex obstruction perforation organ_adherence
#> 1   1 Lev+5FU  43   1. Male       0. No       0. No           0. No
#> 2   2 Lev+5FU  63   1. Male       0. No       0. No           0. No
#> 3   3     Obs  71 0. Female       0. No       0. No          1. Yes
#> 4   4 Lev+5FU  66 0. Female      1. Yes       0. No           0. No
#> 5   5     Obs  69   1. Male       0. No       0. No           0. No
#> 6   6 Lev+5FU  57 0. Female       0. No       0. No           0. No
#>   positive_nodes differentiation local_spread time_surgery_registration
#> 1              5     2. Moderate    3. Serosa                  0. Short
#> 2              1     2. Moderate    3. Serosa                  0. Short
#> 3              7     2. Moderate    2. Muscle                  0. Short
#> 4              6     2. Moderate    3. Serosa                   1. Long
#> 5             22     2. Moderate    3. Serosa                   1. Long
#> 6              9     2. Moderate    3. Serosa                  0. Short
#>   event_death event_recurrence years_to_death years_to_recurrence
#> 1           1                1      4.1642710           2.6502396
#> 2           0                0      8.4517454           8.4517454
#> 3           1                1      2.6365503           1.4839151
#> 4           1                1      0.8021903           0.6707734
#> 5           1                1      1.8042437           1.4318960
#> 6           1                1      4.8377823           2.4750171

The wrapper function should take in all of the necessary arguments to address missing data, compute the estimate, and return the result. The result should either be a numeric vector named estimate, or a list containing a numeric vector named estimate. The value of estimate is extracted by impart::calculate_estimate() to obtain estimates and bootstrap confidence intervals.

First, run the function that requires implementation: see the documentation of the function (e.g. ?speff2trial::speffSurv) for details:

library(speff2trial)
#> Loading required package: leaps
#> Loading required package: survival

speffsurv_example <-
  speff2trial::speffSurv(
    formula =
      survival::Surv(time = years_to_death, event = event_death) ~
      age + sex + obstruction + perforation + organ_adherence + positive_nodes +
      differentiation +local_spread,
    data = colon_cancer_5fu_vs_obs,
    trt.id = "arm",
    fixed = TRUE
  )

The summary or print methods can be helpful in identifying the appropriate value to extract:

summary(speffsurv_example)
#> 
#> Treatment effect
#>             Log HR       SE   LowerCI   UpperCI          p
#> Prop Haz  -0.37281  0.11879  -0.60563  -0.13999  0.0016986
#> Speff     -0.34457  0.10997  -0.56011  -0.12904  0.0017282

Next, inspect the output:

names(speffsurv_example)
#> [1] "beta"       "varbeta"    "fixed"      "conf.level" "method"    
#> [6] "n"

str(speffsurv_example)
#> List of 6
#>  $ beta      : Named num [1:2] -0.373 -0.345
#>   ..- attr(*, "names")= chr [1:2] "Prop Haz" "Speff"
#>  $ varbeta   : Named num [1:2] 0.0141 0.0121
#>   ..- attr(*, "names")= chr [1:2] "Prop Haz" "Speff"
#>  $ fixed     : logi TRUE
#>  $ conf.level: num 0.95
#>  $ method    : chr "exhaustive"
#>  $ n         : 'table' int [1:2(1d)] 315 304
#>   ..- attr(*, "dimnames")=List of 1
#>   .. ..$ : chr [1:2] "Obs" "Lev+5FU"
#>  - attr(*, "class")= chr "speffSurv"

This function returns a list, and the marginal hazard ratio estimates are in a named numeric vector called beta.

An Example Wrapper

A wrapper is needed to take in the appropriate arguments, compute the estimate, and then return a vector named estimate or a list containing that vector.

speffsurv_impart_demo <-
  function(
    data,
    estimand = "log_hazard_ratio",
    formula,
    treatment_column = NULL,
    alpha = 0.05,
    ci = FALSE
  ){
    if(!(estimand %in% c("log_hazard_ratio"))){
      stop("`estimand` must be \"log_hazard_ratio\".")
    }

    if(!(treatment_column %in% names(data))){
      stop("`treatment_column` (", treatment_column, ") must be in `data`.")
    }

    # Check for event indicators that are missing times, vice versa
    outcome_cols <-
      all.vars(update(old = formula, new = . ~ 0))
    
    if(length(outcome_cols == 2)){
      miss_rows <- which(rowSums(is.na(data[, outcome_cols])) == 1)
      if(length(miss_rows) > 0){
        stop("Indicators missing an event time or event times missing an",
             "outcome indicator: rows ", paste0(miss_rows, collapse = ", "))
      }
    }
    
    # Get baseline covariates from formula
    baseline_covariates <-
      all.vars(update(old = formula, new = 0 ~ .))

    # Impute any missing values using mean/mode imputation
    data <-
      impute_covariates_mean_mode(
        data = data,
        baseline_covariates = baseline_covariates
      )

    # Subset to individuals whose outcomes have been assessed:
    speffsurv_result <-
      speff2trial::speffSurv(
        formula = formula,
        data = data,
        trt.id = treatment_column,
        conf.level = 1 - alpha,
        fixed = TRUE
      )

    # Return CI if specified
    if(ci){
      speffsurv_summary <-
        data.frame(summary(speffsurv_result)$tab)
      lcl <- speffsurv_summary["Speff", "LowerCI"]
      ucl <- speffsurv_summary["Speff", "UpperCI"]
    } else {
      lcl = NULL
      ucl = NULL
    }

    return(
      list(
        estimate = as.numeric(speffsurv_result$beta["Speff"]),
        se = sqrt(as.numeric(speffsurv_result$varbeta["Speff"])),
        lcl = lcl,
        ucl = ucl,
        alpha = alpha,
        estimand = estimand
      )
    )
  }

Compare the results of the wrapper to the summary:

summary(speffsurv_example)
#> 
#> Treatment effect
#>             Log HR       SE   LowerCI   UpperCI          p
#> Prop Haz  -0.37281  0.11879  -0.60563  -0.13999  0.0016986
#> Speff     -0.34457  0.10997  -0.56011  -0.12904  0.0017282

speffsurv_impart_demo(
  data = colon_cancer_5fu_vs_obs,
  estimand = "log_hazard_ratio",
  formula = 
    survival::Surv(time = years_to_death, event = event_death) ~
    age + sex + obstruction + perforation + organ_adherence + positive_nodes +
    differentiation + local_spread,
  treatment_column = "arm",
  alpha = 0.05,
  ci = FALSE
)
#> $estimate
#> [1] -0.3445742
#> 
#> $se
#> [1] 0.1099697
#> 
#> $lcl
#> NULL
#> 
#> $ucl
#> NULL
#> 
#> $alpha
#> [1] 0.05
#> 
#> $estimand
#> [1] "log_hazard_ratio"

speffsurv_impart_demo(
  data = colon_cancer_5fu_vs_obs,
  estimand = "log_hazard_ratio",
  formula = 
    survival::Surv(time = years_to_death, event = event_death) ~
    age + sex + obstruction + perforation + organ_adherence + positive_nodes +
    differentiation + local_spread,
  treatment_column = "arm",
  alpha = 0.05,
  ci = TRUE
)
#> $estimate
#> [1] -0.3445742
#> 
#> $se
#> [1] 0.1099697
#> 
#> $lcl
#> [1] -0.5601109
#> 
#> $ucl
#> [1] -0.1290376
#> 
#> $alpha
#> [1] 0.05
#> 
#> $estimand
#> [1] "log_hazard_ratio"

The wrapper should also be tested on missing data, making sure they are addressed successfully:

colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs

# Set some categorical/continuous variables to NA
set.seed(seed = 12345)
miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
colon_cancer_5fu_vs_obs_mcar$age[miss_rows] <- NA
colon_cancer_5fu_vs_obs_mcar$sex[miss_rows] <- NA
colon_cancer_5fu_vs_obs_mcar$positive_nodes[miss_rows] <- NA

speffsurv_impart_demo(
  data = colon_cancer_5fu_vs_obs_mcar,
  estimand = "log_hazard_ratio",
  formula = 
    survival::Surv(time = years_to_death, event = event_death) ~
    age + sex + obstruction + perforation + organ_adherence + positive_nodes +
    differentiation + local_spread,
  treatment_column = "arm",
  alpha = 0.05,
  ci = TRUE
)
#> $estimate
#> [1] -0.3425506
#> 
#> $se
#> [1] 0.1103383
#> 
#> $lcl
#> [1] -0.5588097
#> 
#> $ucl
#> [1] -0.1262915
#> 
#> $alpha
#> [1] 0.05
#> 
#> $estimand
#> [1] "log_hazard_ratio"

Testing the Wrapper

All wrappers should be appropriately tested: the testthat package provides a suite of functions that simplify specifying and executing a testing workflow.

A suggested test workflow includes, but is not limited to:

  • Testing for missing covariate data
  • Testing for missing outcome data
    • For time-to-event, checking both event indicators and event times
  • Checking for compatibility with input that contains:
    • numeric, integer, logical, factor, string
  • Checking for labelled data from haven

1. Complete Data

test_that(
  desc = "Complete Data",
  code =
    {
      expect_no_condition(
        object =
          speffsurv_impart_demo(
            data = colon_cancer_5fu_vs_obs,
            estimand = "log_hazard_ratio",
            formula = 
              survival::Surv(time = years_to_death, event = event_death) ~
              age + sex + obstruction + perforation + organ_adherence + positive_nodes +
              differentiation + local_spread,
            treatment_column = "arm",
            alpha = 0.05,
            ci = FALSE
          )
      )
    }
)
#> Test passed 🎊

2. Missing Covariates

test_that(
  desc = "Missing Covariates",
  code =
    {
      set.seed(seed = 12345)
      colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
      miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
      colon_cancer_5fu_vs_obs_mcar$age[miss_rows] <- NA
      colon_cancer_5fu_vs_obs_mcar$sex[miss_rows] <- NA
      colon_cancer_5fu_vs_obs_mcar$positive_nodes[miss_rows] <- NA
      
      expect_no_condition(
        object =
          speffsurv_impart_demo(
            data = colon_cancer_5fu_vs_obs_mcar,
            estimand = "log_hazard_ratio",
            formula = 
              survival::Surv(time = years_to_death, event = event_death) ~
              age + sex + obstruction + perforation + organ_adherence + positive_nodes +
              differentiation + local_spread,
            treatment_column = "arm",
            alpha = 0.05,
            ci = TRUE
          )
      )
    }
)
#> Test passed 🎊

3. Missing Outcome Indicator

test_that(
  desc = "Missing Outcome Indicator",
  code =
    {
      set.seed(seed = 23456)
      colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
      miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
      colon_cancer_5fu_vs_obs_mcar$event_death[miss_rows] <- NA

      expect_error(
        object =
          speffsurv_impart_demo(
            data = colon_cancer_5fu_vs_obs_mcar,
            estimand = "log_hazard_ratio",
            formula = 
              survival::Surv(time = years_to_death, event = event_death) ~
              age + sex + obstruction + perforation + organ_adherence + positive_nodes +
              differentiation + local_spread,
            treatment_column = "arm",
            alpha = 0.05,
            ci = TRUE
          ),
        regexp = "Indicators missing an event time"
      )
    }
)
#> Test passed 🥇

4. Missing Outcome Times

test_that(
  desc = "Missing Outcome Times",
  code =
    {
      set.seed(seed = 23456)
      colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
      miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
      colon_cancer_5fu_vs_obs_mcar$years_to_death[miss_rows] <- NA

      expect_error(
        object =
          speffsurv_impart_demo(
            data = colon_cancer_5fu_vs_obs_mcar,
            estimand = "log_hazard_ratio",
            formula = 
              survival::Surv(time = years_to_death, event = event_death) ~
              age + sex + obstruction + perforation + organ_adherence + positive_nodes +
              differentiation + local_spread,
            treatment_column = "arm",
            alpha = 0.05,
            ci = TRUE
          ),
        regexp = "Indicators missing an event time"
      )
    }
)
#> Test passed 🥇