Example 4: Two arm | Fixed design | Correlated continuous & time-to-event endpoints | t-test + survival analysis
example-4.Rmd
# Core package that provides Timer, Population, Trial, gen_timepoints, add_timepoints, ...
library(rxsim)
# Utilities
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(MASS) # for mvrnorm (simulate correlated latent variables)
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#>
#> select
library(survival) # for Surv, survdiff, coxph
Oncology and cardiovascular trials often pair a continuous biomarker with a time-to-event (TTE) endpoint such as progression-free survival or overall survival, sometimes as co-primary endpoints. Correlation between a biomarker and TTE arises naturally through shared biological mechanisms. This example demonstrates simulating such a joint endpoint structure and applying both a t-test (for the continuous component) and a log-rank test plus Cox proportional hazards model (for TTE) at a single final analysis. See Example 3 for the analogous two-correlated-continuous setup.
Scenario
We consider a two-arm fixed design with a continuous endpoint (Y) and a time-to-event (TTE) endpoint per subject. The endpoints are correlated via a shared Gaussian latent structure. Enrollment and dropout times are stochastic (exponential draws); analysis happens once at the final time.
# Total target N (across both arms)
sample_size <- 1200
# Arms and allocation (balanced)
arms <- c("pbo", "trt")
allocation <- c(1, 1)
enrollment_fn <- function(n) rexp(n, rate = 1)
dropout_fn <- function(n) rexp(n, rate = 0.01)
# Correlation between continuous endpoint driver (Z1) and frailty for TTE (Z2)
rho <- 0.50
# Continuous endpoint parameters
mu_pbo <- 0.0
delta <- 0.3 # true treatment - placebo mean difference
sd_y <- 1.0
# TTE model parameters (Exponential PH with subject frailty exp(eta))
lambda0 <- 0.08 # baseline hazard (per time unit)
HR_trt <- 0.70 # treatment hazard ratio (<1 favors treatment)
# Frailty scale (eta = sigma_frailty * Z2)
sigma_frailty <- 0.6
# Covariance of (Z1, Z2)
SigmaZ <- matrix(c(1, rho, rho, 1), nrow = 2)
scenario <- tidyr::expand_grid(
sample_size = sample_size,
allocation = list(allocation),
delta = delta,
rho = rho,
hr_trt_true = HR_trt
)
HR_trt = 0.70 encodes a 30% hazard reduction in the treatment arm, a clinically meaningful effect in many oncology settings. lambda0 = 0.08 is the baseline event rate per time unit; combined with n = 1200, this ensures a substantial number of TTE events even with only 3 replicates. sigma_frailty = 0.6 controls the strength of the subject-level frailty term: larger values introduce more between-subject heterogeneity in TTE. The frailty is the shared biological driver linking the continuous and TTE endpoints through the bivariate latent pair (Z1, Z2) with rho = 0.50.
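These choices can be sanity-checked with base R alone. Ignoring the frailty term (a deliberate simplification for a back-of-envelope check), the marginal probability of an event within a follow-up window t under an exponential hazard is 1 - exp(-lambda * t), and power.t.test() approximates the power of the continuous-endpoint comparison:

```r
# Marginal event probability over a 100-time-unit follow-up window,
# ignoring the frailty term (simplification for a quick check)
lambda0 <- 0.08
HR_trt  <- 0.70
p_event_pbo <- 1 - exp(-lambda0 * 100)           # placebo hazard 0.08
p_event_trt <- 1 - exp(-lambda0 * HR_trt * 100)  # treatment hazard 0.056
round(c(pbo = p_event_pbo, trt = p_event_trt), 4)

# Approximate power of the two-sample t-test: 600 per arm, delta = 0.3, sd = 1
power.t.test(n = 600, delta = 0.3, sd = 1, sig.level = 0.05)$power
```

Over a 100-unit window both arms approach near-certain event rates, which is consistent with the high event counts in the results below.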
Time points
Build the trial plan. Enrollment and dropout are supplied as stochastic functions here, so per-subject timepoints are generated afresh for each replicate rather than pre-built into a Timer.
plan <- gen_plan(
sample_size = sample_size,
arms = arms,
allocation = allocation,
enrollment = enrollment_fn,
dropout = dropout_fn
)
gen_plan() is called with enrollment_fn as a stochastic function, so each invocation by replicate_trial() generates a fresh per-replicate enrollment schedule. This differs from the piecewise-constant list approach in Example 3: here no deterministic timer is pre-built, and enrollment timing varies across replicates, which is the more realistic simulation regime for large trials.
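The per-replicate variation is easy to see by calling enrollment_fn twice: each call yields a different draw. If gen_plan() treats these draws as inter-arrival gaps (an assumption about its contract, not confirmed here), cumulating them gives Poisson-process arrival times:

```r
set.seed(101)  # for a reproducible illustration
enrollment_fn <- function(n) rexp(n, rate = 1)

draw1 <- enrollment_fn(5)
draw2 <- enrollment_fn(5)
identical(draw1, draw2)  # FALSE: every replicate sees a fresh schedule

# If the draws are inter-arrival gaps, cumulative sums give arrival times
arrival_times <- cumsum(draw1)
all(diff(arrival_times) > 0)  # TRUE: arrival times are strictly increasing
```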
Arms (Populations)
Per arm we simulate subject-level latent bivariate normals
(Z1, Z2) with correlation rho. - The
continuous endpoint is
Y = mu_arm + sd_y * Z1. - The TTE endpoint
uses a proportional-hazards exponential model with subject-specific
frailty: hazard_i = lambda_arm * exp(eta_i), where
eta_i = sigma_frailty * Z2 and
lambda_trt = lambda0 * HR_trt. Event time
T_i ~ Exponential(rate = hazard_i).
# Create generator function for arm-specific data
mk_population_generator <- function(mu_y, lambda_arm) {
function(n) {
Z <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = SigmaZ)
Z1 <- Z[, 1]
Z2 <- Z[, 2]
y <- mu_y + sd_y * Z1
eta <- sigma_frailty * Z2
rate_i <- lambda_arm * exp(eta)
tte_true <- rexp(n, rate = rate_i)
data.frame(
id = seq_len(n),
y = y,
tte_true = tte_true,
readout_time = 1
)
}
}
lambda_pbo <- lambda0
lambda_trt <- lambda0 * HR_trt
population_generators <- list(
pbo = mk_population_generator(mu_y = mu_pbo, lambda_arm = lambda_pbo),
trt = mk_population_generator(mu_y = mu_pbo + delta, lambda_arm = lambda_trt)
)
Each subject gets a latent pair (Z1, Z2) drawn from a bivariate standard normal with correlation rho. Z1 drives the continuous endpoint: Y = mu_arm + sd_y * Z1. Z2 drives TTE through a multiplicative frailty: eta = sigma_frailty * Z2, so hazard_i = lambda_arm * exp(eta_i) and T_i ~ Exponential(hazard_i). The shared latent structure means subjects with higher Z2 (higher frailty, shorter TTE) also tend to have higher Z1 (higher Y), modelling a biological scenario where the biomarker and TTE share a common underlying driver.
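A standalone simulation check (a sketch, separate from the trial machinery) confirms the intended dependence: the latent correlation is recovered, and Y is negatively correlated with log event time because higher frailty shortens TTE:

```r
library(MASS)
set.seed(7)
rho <- 0.50; sd_y <- 1.0; sigma_frailty <- 0.6; lambda0 <- 0.08
SigmaZ <- matrix(c(1, rho, rho, 1), nrow = 2)

Z   <- mvrnorm(1e5, mu = c(0, 0), Sigma = SigmaZ)
y   <- 0 + sd_y * Z[, 1]                                   # placebo-arm Y
tte <- rexp(1e5, rate = lambda0 * exp(sigma_frailty * Z[, 2]))

cor(Z[, 1], Z[, 2])  # close to 0.50
cor(y, log(tte))     # negative: higher Y goes with shorter time-to-event
```

The correlation between Y and log(TTE) is weaker than rho because the exponential event-time draw adds noise on top of the shared latent structure.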
Trigger & Analysis
This is a fixed design: analyze once at the final timepoint. We run a two-sample t-test for the continuous endpoint and a log-rank test plus Cox PH for the TTE endpoint.
analysis_generators <- list(
final = list(
trigger = rlang::exprs(
sum(!is.na(enroll_time)) >= !!sample_size
),
analysis = function(df, timer) {
df_e <- df[!is.na(df$enroll_time), , drop = FALSE]
if (nrow(df_e) == 0) {
return(data.frame(scenario, note = "no enrolled subjects", stringsAsFactors = FALSE))
}
# Build observed TTE with censoring
censor_drop <- ifelse(is.na(df_e$drop_time), Inf, pmax(0, df_e$drop_time - df_e$enroll_time))
censor_admin <- pmax(0, max(df_e$enroll_time) + 100 - df_e$enroll_time)
tte_true <- df_e$tte_true
t_obs <- pmin(tte_true, censor_drop, censor_admin)
status <- as.integer(tte_true <= pmin(censor_drop, censor_admin))
# Continuous endpoint t-test
p_t <- tryCatch(
t.test(y ~ arm, data = df_e)$p.value,
error = function(e) NA_real_
)
# Survival analysis: log-rank and Cox
surv_p <- NA_real_
hr <- hr_lo <- hr_hi <- NA_real_
events <- sum(status)
if (length(unique(df_e$arm)) == 2 && events > 0 && all(t_obs >= 0)) {
S <- Surv(time = t_obs, event = status)
surv_p <- tryCatch({
sd <- survdiff(S ~ arm, data = df_e)
1 - pchisq(sd$chisq, df = length(sd$n) - 1)
}, error = function(e) NA_real_)
cox <- tryCatch(coxph(S ~ arm, data = df_e), error = function(e) NULL)
if (!is.null(cox)) {
s <- summary(cox)
hr <- unname(s$coef[1, "exp(coef)"])
hr_lo <- unname(s$conf.int[1, "lower .95"])
hr_hi <- unname(s$conf.int[1, "upper .95"])
}
}
data.frame(
scenario,
n_total = nrow(df_e),
n_pbo = sum(df_e$arm == "pbo"),
n_trt = sum(df_e$arm == "trt"),
p_ttest_y = p_t,
logrank_p = surv_p,
hr_cox = hr,
hr_ci_lo = hr_lo,
hr_ci_hi = hr_hi,
events = events,
stringsAsFactors = FALSE
)
}
)
)
censor_drop censors a subject at the time elapsed between enrollment and dropout (or infinity if no dropout). censor_admin provides administrative censoring: each subject is followed for 100 time units after the last enrollment, a common end-of-study rule. t_obs and status capture what would actually be observed. The Surv() object is then passed to survdiff() for the log-rank p-value and coxph() for the Cox HR and 95% CI. tryCatch guards against edge cases (e.g., one arm with no events) so the simulation continues rather than stopping on an error.
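The censoring arithmetic can be traced on a toy data frame (three hypothetical subjects, not trial output): subject 1 is administratively censored, subject 2 is censored by dropout, and subject 3 has an observed event.

```r
# Three hypothetical subjects for illustration
df_e <- data.frame(
  enroll_time = c(0, 10, 50),
  drop_time   = c(NA, 15, NA),  # subject 2 drops out 5 units after enrolling
  tte_true    = c(200, 30, 20)
)

censor_drop  <- ifelse(is.na(df_e$drop_time), Inf,
                       pmax(0, df_e$drop_time - df_e$enroll_time))
# Administrative cutoff: 100 units after the last enrollment (time 50 + 100)
censor_admin <- pmax(0, max(df_e$enroll_time) + 100 - df_e$enroll_time)

t_obs  <- pmin(df_e$tte_true, censor_drop, censor_admin)
status <- as.integer(df_e$tte_true <= pmin(censor_drop, censor_admin))

t_obs   # 150   5  20
status  #   0   0   1
```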
Trial
Create and run multiple trial replicates.
trials <- replicate_trial(
trial_name = "two_endpoints_continuous_tte_fixed",
sample_size = sample_size,
arms = arms,
allocation = allocation,
enrollment = enrollment_fn,
dropout = dropout_fn,
analysis_generators = analysis_generators,
population_generators = population_generators,
n = 3
)
Simulate
run_trials(trials)
#> [[1]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: two_endpoints_continuous_tte_fixed_1
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> [[2]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: two_endpoints_continuous_tte_fixed_2
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> [[3]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: two_endpoints_continuous_tte_fixed_3
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
Results
One row per replicate, row-bound across all replicates:
replicate_results <- collect_results(trials)
replicate_results
#> replicate timepoint analysis sample_size allocation delta rho hr_trt_true
#> 1 1 1141.764 final 1200 1, 1 0.3 0.5 0.7
#> 2 2 1199.569 final 1200 1, 1 0.3 0.5 0.7
#> 3 3 1214.035 final 1200 1, 1 0.3 0.5 0.7
#> n_total n_pbo n_trt p_ttest_y logrank_p hr_cox hr_ci_lo hr_ci_hi
#> 1 1200 600 600 1.437333e-07 1.352989e-05 0.7768063 0.6930741 0.8706544
#> 2 1200 600 600 9.000125e-08 3.702888e-10 0.6950138 0.6198860 0.7792467
#> 3 1200 600 600 9.572157e-06 3.198372e-07 0.7430600 0.6628171 0.8330175
#> events
#> 1 1198
#> 2 1200
#> 3 1199
collect_results() prepends replicate, timepoint, and analysis columns to the result data. The events column shows how many TTE events were observed; with n = 1200 and lambda0 = 0.08, nearly every subject has an event. logrank_p tests the null of equal survival curves; hr_cox is the estimated hazard ratio from the Cox model (values < 1 favour treatment), and hr_ci_lo/hr_ci_hi are its 95% CI bounds.
Because the continuous endpoint Y and TTE share the same latent drivers,
replicates with a small p_ttest_y tend to also have a small
logrank_p, though the correspondence is imperfect due to
the different statistical tests and the censoring mechanism.
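With a larger n in replicate_trial(), operating characteristics follow directly from the collected results. A dplyr sketch (rebuilding the three rows above from their rounded printed values; with only 3 replicates these proportions are illustrative, not power estimates):

```r
library(dplyr)

# Rounded values copied from the printed replicate_results above
replicate_results <- data.frame(
  p_ttest_y = c(1.44e-07, 9.00e-08, 9.57e-06),
  logrank_p = c(1.35e-05, 3.70e-10, 3.20e-07),
  hr_cox    = c(0.777, 0.695, 0.743),
  events    = c(1198, 1200, 1199)
)

summ <- replicate_results %>%
  summarise(
    reject_ttest   = mean(p_ttest_y < 0.05),  # empirical rejection rate, t-test
    reject_logrank = mean(logrank_p < 0.05),  # empirical rejection rate, log-rank
    mean_hr        = mean(hr_cox),            # average estimated HR
    mean_events    = mean(events)
  )
summ
```

Rejection rates estimate power for each endpoint, and mean_hr can be compared against the true HR_trt = 0.70 to gauge estimator bias.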