Implementing New Methods in `impart`
Source:vignettes/new_methods_in_impart.Rmd
new_methods_in_impart.Rmd
impart
has several built-in methods for covariate
adjustment, but is designed to work with methods from other R packages.
This vignette will demonstrate how to integrate other software by
writing a wrapper
function, and test the resulting function to make sure it works with
impart
.
Notes for Developers
impart
was written to minimize dependencies to avoid the
potential problems that may arise when dependencies are deprecated or
updated. This includes dependence on common, relatively mature packages
such as the tidyverse
ecosystem.
While some code works identically for tibble
s and base R
data.frame
s, these objects behave differently under vector subsetting -
my_tbl_df[, 1]
or my_tbl_df[, 1:2]
return a
tbl_df
whereas my_df[, 1]
returns a vector and
my_df[, 1:2]
returns a data.frame
. Results of
dim
, length
, nrow
,
ncol
, and other functions may differ between a
tbl_df
and a data.frame
: if this is not
handled appropriately, this can result in explicit errors (resulting in
a stop
) or silently returning erroneous results. The
connection between error messages and the cause of the error is often
unclear without looking at source code.
For this reason, impart::monitored_analysis
checks to
ensure that data are passed as a data.frame
: a
tbl_df
can be easily coerced to a data.frame
using as.data.frame()
. If some function depends on having a
tbl_df
and is unable to accept a data.frame
,
the wrapper should include a step to coerce the input back to a
tbl_df
before being passed to the function for
analysis.
Implementing speff2trial::speffSurv()
in
impart
The speff2trial::speffSurv()
in the speff2trial
package on CRAN computes a covariate-adjusted marginal hazard
ratio. The repo
for this package is hosted on Github. This vignette shows how to
create a wrapper for this function, which is already integrated in the
package as impart::speffsurv_impart()
.
The impart::colon_cancer
dataset will be used to test
the wrapper, using the outcome of years_to_death
: a
time-to-event outcome that is subject to censoring.
library(impart)
# Extract two treatment arms: Lev+5FU (Chemotherapy) and Obs (Observation)
colon_cancer_5fu_vs_obs <-
subset(
x = colon_cancer,
arm %in% c("Lev+5FU", "Obs")
) |>
droplevels()
head(colon_cancer_5fu_vs_obs)
#> .id arm age sex obstruction perforation organ_adherence
#> 1 1 Lev+5FU 43 1. Male 0. No 0. No 0. No
#> 2 2 Lev+5FU 63 1. Male 0. No 0. No 0. No
#> 3 3 Obs 71 0. Female 0. No 0. No 1. Yes
#> 4 4 Lev+5FU 66 0. Female 1. Yes 0. No 0. No
#> 5 5 Obs 69 1. Male 0. No 0. No 0. No
#> 6 6 Lev+5FU 57 0. Female 0. No 0. No 0. No
#> positive_nodes differentiation local_spread time_surgery_registration
#> 1 5 2. Moderate 3. Serosa 0. Short
#> 2 1 2. Moderate 3. Serosa 0. Short
#> 3 7 2. Moderate 2. Muscle 0. Short
#> 4 6 2. Moderate 3. Serosa 1. Long
#> 5 22 2. Moderate 3. Serosa 1. Long
#> 6 9 2. Moderate 3. Serosa 0. Short
#> event_death event_recurrence years_to_death years_to_recurrence
#> 1 1 1 4.1642710 2.6502396
#> 2 0 0 8.4517454 8.4517454
#> 3 1 1 2.6365503 1.4839151
#> 4 1 1 0.8021903 0.6707734
#> 5 1 1 1.8042437 1.4318960
#> 6 1 1 4.8377823 2.4750171
The wrapper function should take in all of the necessary arguments to
address missing data, compute the estimate, and return the result. The
result should either be a numeric
vector named
estimate
, or a list
containing a numeric
vector named estimate
. The value of estimate
is extracted by impart::calculate_estimate()
to obtain
estimates and bootstrap confidence intervals.
First, run the function that requires implementation: see the
documentation of the function
(e.g. ?speff2trial::speffSurv
) for details:
library(speff2trial)
#> Loading required package: leaps
#> Loading required package: survival
speffsurv_example <-
speff2trial::speffSurv(
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation +local_spread,
data = colon_cancer_5fu_vs_obs,
trt.id = "arm",
fixed = TRUE
)
The summary
or print
methods can be helpful
in identifying the appropriate value to extract:
summary(speffsurv_example)
#>
#> Treatment effect
#> Log HR SE LowerCI UpperCI p
#> Prop Haz -0.37281 0.11879 -0.60563 -0.13999 0.0016986
#> Speff -0.34457 0.10997 -0.56011 -0.12904 0.0017282
Next, inspect the output:
names(speffsurv_example)
#> [1] "beta" "varbeta" "fixed" "conf.level" "method"
#> [6] "n"
str(speffsurv_example)
#> List of 6
#> $ beta : Named num [1:2] -0.373 -0.345
#> ..- attr(*, "names")= chr [1:2] "Prop Haz" "Speff"
#> $ varbeta : Named num [1:2] 0.0141 0.0121
#> ..- attr(*, "names")= chr [1:2] "Prop Haz" "Speff"
#> $ fixed : logi TRUE
#> $ conf.level: num 0.95
#> $ method : chr "exhaustive"
#> $ n : 'table' int [1:2(1d)] 315 304
#> ..- attr(*, "dimnames")=List of 1
#> .. ..$ : chr [1:2] "Obs" "Lev+5FU"
#> - attr(*, "class")= chr "speffSurv"
This function returns a list
, and the marginal hazard
ratio estimates are in a named numeric vector called
beta
.
An Example Wrapper
A wrapper is needed to take in the appropriate arguments, compute the
estimate, and then return a vector named estimate
or a list
containing that vector.
speffsurv_impart_demo <-
function(
data,
estimand = "log_hazard_ratio",
formula,
treatment_column = NULL,
alpha = 0.05,
ci = FALSE
){
if(!(estimand %in% c("log_hazard_ratio"))){
stop("`estimand` must be \"log_hazard_ratio\".")
}
if(!(treatment_column %in% names(data))){
stop("`treatment_column` (", treatment_column, ") must be in `data`.")
}
# Check for event indicators that are missing times, vice versa
outcome_cols <-
all.vars(update(old = formula, new = . ~ 0))
if(length(outcome_cols == 2)){
miss_rows <- which(rowSums(is.na(data[, outcome_cols])) == 1)
if(length(miss_rows) > 0){
stop("Indicators missing an event time or event times missing an",
"outcome indicator: rows ", paste0(miss_rows, collapse = ", "))
}
}
# Get baseline covariates from formula
baseline_covariates <-
all.vars(update(old = formula, new = 0 ~ .))
# Impute any missing values using mean/mode imputation
data <-
impute_covariates_mean_mode(
data = data,
baseline_covariates = baseline_covariates
)
# Subset to individuals whose outcomes have been assessed:
speffsurv_result <-
speff2trial::speffSurv(
formula = formula,
data = data,
trt.id = treatment_column,
conf.level = 1 - alpha,
fixed = TRUE
)
# Return CI if specified
if(ci){
speffsurv_summary <-
data.frame(summary(speffsurv_result)$tab)
lcl <- speffsurv_summary["Speff", "LowerCI"]
ucl <- speffsurv_summary["Speff", "UpperCI"]
} else {
lcl = NULL
ucl = NULL
}
return(
list(
estimate = as.numeric(speffsurv_result$beta["Speff"]),
se = sqrt(as.numeric(speffsurv_result$varbeta["Speff"])),
lcl = lcl,
ucl = ucl,
alpha = alpha,
estimand = estimand
)
)
}
Compare the results of the wrapper to the summary:
summary(speffsurv_example)
#>
#> Treatment effect
#> Log HR SE LowerCI UpperCI p
#> Prop Haz -0.37281 0.11879 -0.60563 -0.13999 0.0016986
#> Speff -0.34457 0.10997 -0.56011 -0.12904 0.0017282
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = FALSE
)
#> $estimate
#> [1] -0.3445742
#>
#> $se
#> [1] 0.1099697
#>
#> $lcl
#> NULL
#>
#> $ucl
#> NULL
#>
#> $alpha
#> [1] 0.05
#>
#> $estimand
#> [1] "log_hazard_ratio"
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = TRUE
)
#> $estimate
#> [1] -0.3445742
#>
#> $se
#> [1] 0.1099697
#>
#> $lcl
#> [1] -0.5601109
#>
#> $ucl
#> [1] -0.1290376
#>
#> $alpha
#> [1] 0.05
#>
#> $estimand
#> [1] "log_hazard_ratio"
The wrapper should also be tested on missing data, making sure they are addressed successfully:
colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
# Set some categorical/continuous variables to NA
set.seed(seed = 12345)
miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
colon_cancer_5fu_vs_obs_mcar$age[miss_rows] <- NA
colon_cancer_5fu_vs_obs_mcar$sex[miss_rows] <- NA
colon_cancer_5fu_vs_obs_mcar$positive_nodes[miss_rows] <- NA
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs_mcar,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = TRUE
)
#> $estimate
#> [1] -0.3425506
#>
#> $se
#> [1] 0.1103383
#>
#> $lcl
#> [1] -0.5588097
#>
#> $ucl
#> [1] -0.1262915
#>
#> $alpha
#> [1] 0.05
#>
#> $estimand
#> [1] "log_hazard_ratio"
Testing the Wrapper
All wrappers should be appropriately tested: the testthat package provides a suite of functions that simplify specifying and executing a testing workflow.
A suggested test workflow includes, but is not limited to:
- Testing for missing covariate data
- Testing for missing outcome data
- For time-to-event, checking both event indicators and event times
- Checking for compatibility with input that contains:
-
numeric
,integer
,logical
,factor
,string
-
- Checking for
labelled
data fromhaven
1. Complete Data
test_that(
desc = "Complete Data",
code =
{
expect_no_condition(
object =
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = FALSE
)
)
}
)
#> Test passed 🎊
2. Missing Covariates
test_that(
desc = "Missing Covariates",
code =
{
set.seed(seed = 12345)
colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
colon_cancer_5fu_vs_obs_mcar$age[miss_rows] <- NA
colon_cancer_5fu_vs_obs_mcar$sex[miss_rows] <- NA
colon_cancer_5fu_vs_obs_mcar$positive_nodes[miss_rows] <- NA
expect_no_condition(
object =
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs_mcar,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = TRUE
)
)
}
)
#> Test passed 🎊
3. Missing Outcome Indicator
test_that(
desc = "Missing Outcome Indicator",
code =
{
set.seed(seed = 23456)
colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
colon_cancer_5fu_vs_obs_mcar$event_death[miss_rows] <- NA
expect_error(
object =
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs_mcar,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = TRUE
),
regexp = "Indicators missing an event time"
)
}
)
#> Test passed 🥇
4. Missing Outcome Times
test_that(
desc = "Missing Outcome Times",
code =
{
set.seed(seed = 23456)
colon_cancer_5fu_vs_obs_mcar <- colon_cancer_5fu_vs_obs
miss_rows <- sample(x = 1:nrow(colon_cancer_5fu_vs_obs), size = 10)
colon_cancer_5fu_vs_obs_mcar$years_to_death[miss_rows] <- NA
expect_error(
object =
speffsurv_impart_demo(
data = colon_cancer_5fu_vs_obs_mcar,
estimand = "log_hazard_ratio",
formula =
survival::Surv(time = years_to_death, event = event_death) ~
age + sex + obstruction + perforation + organ_adherence + positive_nodes +
differentiation + local_spread,
treatment_column = "arm",
alpha = 0.05,
ci = TRUE
),
regexp = "Indicators missing an event time"
)
}
)
#> Test passed 🥇