Getting Started
getting-started.RmdThis vignette walks through a complete rxsim simulation from start to finish: defining a trial scenario, building arm populations, registering an analysis trigger, running replicates, and collecting results. By the end you will have seen the full rxsim workflow in one place and will be ready to adapt it to your own design. For deeper explanations of each building block, see the Core Concepts vignette.
Step 1 — Define the scenario
The scenario tibble captures the design parameters you
want to track in your results — sample size, allocation ratios, or any
other factors you might vary across simulation runs. Wrapping them in
tidyr::expand_grid() produces a one-row data frame that
gets embedded in every analysis result. When you later stack results
across many scenarios or sensitivity analyses, this metadata keeps each
row traceable back to its design.
sample_size <- 30
arms <- c("control", "treatment")
allocation <- c(1, 1)
true_delta <- 0.5
scenario <- tidyr::expand_grid(
sample_size = sample_size,
allocation = list(allocation),
true_delta = true_delta
)Step 2 — Define arm populations
Each trial arm is represented by a generator
function — a plain R function that takes n (the
number of subjects planned for that arm) and returns a data frame of
subject-level endpoint data. rxsim calls these functions once per
replicate to draw a fresh population, so every replicate gets
independent data.
The data frame must contain at least three columns:
-
id— a unique integer identifier per subject -
readout_time— the time (in trial-time units) after enrollment when the endpoint is observed (use0for baseline or immediately-available data) - at least one endpoint column
vector_to_dataframe() is a convenience helper that wraps
a numeric vector into this standard format with id,
data, and readout_time = 0.
Here we simulate a simple continuous endpoint: the control arm is
standard-normal, the treatment arm has a mean shift of
true_delta.
population_generators <- list(
control = function(n) vector_to_dataframe(rnorm(n)),
treatment = function(n) vector_to_dataframe(rnorm(n, mean = true_delta))
)Step 3 — Define enrollment and dropout
Enrollment and dropout are modelled as random processes. Each
function takes n and returns a vector of
inter-event times — the waiting times between
successive enrollments (or dropouts). These times are drawn
independently for every replicate, giving each trial its own enrollment
trajectory.
rexp(n, rate = 1) generates exponentially-distributed
inter-arrival times with a mean gap of 1 time unit between enrolments —
a common approximation for Poisson arrivals. A lower dropout rate
(0.05) reflects a trial where most subjects complete the
study.
Step 4 — Define analysis triggers
Analysis triggers are the heart of rxsim. Each trigger pairs a condition (when to fire) with an analysis function (what to compute when it fires).
The condition is written as a dplyr-style boolean
expression inside rlang::exprs(). It is evaluated against a
snapshot of all currently enrolled subjects at each timepoint. Here the
condition fires once full enrollment is reached, i.e.,
sample_size subjects have accumulated a non-NA
enroll_time.
The !! operator (pronounced “bang-bang”) injects
the current value of sample_size into the
expression at definition time, rather than looking it up at evaluation
time. This is necessary because the expression is stored and evaluated
later inside the simulation loop.
When the condition is met, rxsim calls the analysis function with two arguments:
-
df: a data frame snapshot of all enrolled subjects at the triggering timepoint, with columns from the population data plusenroll_time,drop_time, andarm -
time: the current trial clock time at which the trigger fired
The function should return a data frame (one row per trigger event is the conventional pattern).
analysis_generators <- list(
final = list(
trigger = rlang::exprs(
sum(!is.na(enroll_time)) >= !!sample_size
),
analysis = function(df, timer) {
enrolled <- subset(df, !is.na(enroll_time))
data.frame(
scenario,
n_enrolled = nrow(enrolled),
mean_ctrl = mean(enrolled$data[enrolled$arm == "control"]),
mean_trt = mean(enrolled$data[enrolled$arm == "treatment"]),
stringsAsFactors = FALSE
)
}
)
)Step 5 — Create replicates and run
replicate_trial() generates n fully
independent Trial objects. For each replicate it:
- Samples a fresh enrollment/dropout timeline from your functions
- Calls each population generator to draw new subject-level data
- Builds
Conditionobjects from youranalysis_generatorsand attaches them to each trial
run_trials() then executes every replicate’s simulation
loop in sequence.
set.seed(42)
trials <- replicate_trial(
trial_name = "getting_started",
sample_size = sample_size,
arms = arms,
allocation = allocation,
enrollment = enrollment,
dropout = dropout,
analysis_generators = analysis_generators,
population_generators = population_generators,
n = 5
)
run_trials(trials)
#> [[1]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: getting_started_1
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> [[2]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: getting_started_2
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> [[3]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: getting_started_3
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> [[4]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: getting_started_4
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> [[5]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: getting_started_5
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6Step 6 — Collect and interpret results
Analysis results
After running, each Trial object exposes a
results list. It is indexed first by the
timepoint at which an analysis fired (e.g.,
"time_30.4"), then by the analysis name
you gave it (here "final"). The value is whatever your
analysis function returned.
The collect_results() helper gathers every analysis from
every timepoint across all replicates into one tidy data frame:
replicate_results <- collect_results(trials)
replicate_results
#> replicate timepoint analysis sample_size allocation true_delta n_enrolled
#> 1 1 29.71727 final 30 1, 1 0.5 30
#> 2 2 27.31408 final 30 1, 1 0.5 30
#> 3 3 35.29848 final 30 1, 1 0.5 30
#> 4 4 23.61616 final 30 1, 1 0.5 30
#> 5 5 25.83584 final 30 1, 1 0.5 30
#> mean_ctrl mean_trt
#> 1 -0.1657292 0.61748974
#> 2 -0.0978443 0.64303730
#> 3 -0.2573593 0.07050019
#> 4 -0.4199802 0.25120329
#> 5 -0.4170091 0.59879107Each row is one replicate. The mean_ctrl and
mean_trt columns show the arm-level sample means at the
moment the final analysis fired. The timepoint and
analysis columns identify when and which analysis produced
each row — essential when a trial has both interim and final analyses.
Variation across replicates reflects both the stochastic endpoint data
and the randomness in enrollment timing; exactly what operating
characteristic simulations are designed to characterise.
Locked data snapshots
Every time an analysis fires, rxsim also saves a locked data snapshot, the full subject-level data frame at that timepoint. You can inspect it directly to debug your analysis function, audit which subjects were enrolled, or compute additional statistics post-hoc.
# One snapshot per timepoint that fired in replicate 1
names(trials[[1]]$locked_data)
#> [1] "time_29.7172711929598"
# First six rows of the snapshot
head(trials[[1]]$locked_data[[1]])
#> id data readout_time arm enroll_time drop_time subject_id
#> 1 1 -2.6882473 0 control 0.8592321 NA 1
#> 2 2 0.8666502 0 control 11.6939492 NA 2
#> 3 3 0.1687397 0 control 26.6106187 NA 3
#> 4 4 -1.0908241 0 control 1.6540916 NA 4
#> 5 5 -0.3803481 0 control 9.5016932 NA 5
#> 6 6 -0.9480918 0 control 13.9319792 NA 6
#> measurement_time time
#> 1 0.8592321 29.71727
#> 2 11.6939492 29.71727
#> 3 26.6106187 29.71727
#> 4 1.6540916 29.71727
#> 5 9.5016932 29.71727
#> 6 13.9319792 29.71727The locked data contains the population columns (id,
data, arm, readout_time) plus
three columns added by rxsim:
| Column | Meaning |
|---|---|
enroll_time |
Calendar time the subject was enrolled (NA if not yet
enrolled) |
drop_time |
Calendar time the subject dropped out (NA if still
active) |
time |
The trial clock time at which this snapshot was taken |
Next steps
Now that you have seen the full workflow, here is where to go next:
-
Core Concepts —
understand how
Population,Timer,Condition, andTrialcompose, and how to write more advanced trigger expressions -
Enrollment & Dropout
Modeling — choose between stochastic
(
gen_plan) and piecewise-constant (gen_timepoints) schedules - Example 1 through Example 7 — progressively complex designs: correlated endpoints, time-to-event, multi-arm dose-finding, subgroup analyses, and Bayesian Go/No-Go rules