Core Concepts
Overview
rxsim organises a clinical trial simulation around four collaborating
objects. A Population owns the subject-level data and
tracks each subject’s enrollment and dropout times. A Timer
drives the trial clock: it stores discrete timepoints per arm and
defines when the simulation clock advances. A Condition
pairs a filter expression with an optional analysis function and manages
its own trigger state — it fires when the snapshot data meets a
criterion. A Trial orchestrates the simulation by iterating
over timepoints, updating populations, snapshotting the enrolled cohort,
and collecting results.
graph LR
P1(Control) --> TR(Trial)
P2(Treatment) --> TR
TI(Timer) --> TR
CO(Condition) --> TR
TR --> LD(Locked Data)
TR --> RS(Results)
The run() loop that Trial executes at each
timepoint follows this sequence:
graph TD
A([Next timepoint]) --> B[Enroll and drop subjects]
B --> C[Snapshot enrolled subjects]
C --> D{Condition triggered?}
D -- Yes --> E[Run analysis]
D -- No --> F[Advance clock]
E --> F
F --> G{More timepoints?}
G -- Yes --> A
G -- No --> H([Done])
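The sequence in the diagram can be sketched in plain R. This is a simplified illustration, not rxsim's implementation: `schedule`, `pool`, and the `fired` flag are hypothetical stand-ins for the Timer, a single-arm Population, and a Condition that is allowed to trigger once.

```r
# Hypothetical stand-ins; none of these names come from rxsim.
schedule <- data.frame(time = c(1, 2, 10), enroller = c(5, 3, 0))
pool     <- data.frame(id = 1:8, enroll_time = NA_real_)
results  <- list()
fired    <- FALSE

for (now in sort(unique(schedule$time))) {
  # 1. Enroll: stamp the next k unenrolled subjects with the current time
  k    <- schedule$enroller[schedule$time == now]
  open <- which(is.na(pool$enroll_time))
  pool$enroll_time[head(open, k)] <- now

  # 2. Snapshot the subjects enrolled so far
  snapshot <- pool[!is.na(pool$enroll_time), ]

  # 3. Condition: fire once, when all 8 subjects are enrolled
  if (!fired && nrow(snapshot) >= 8) {
    results[[paste0("time_", now)]] <- data.frame(n = nrow(snapshot), fired_at = now)
    fired <- TRUE
  }
  # 4. The for loop advances the clock to the next timepoint
}

names(results)
#> [1] "time_2"
```

The condition is met at time 2, when the second enrollment wave completes the pool; the timepoint at 10 is processed but produces no new result because the sketch allows a single trigger.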
In most workflows you will never construct these objects by hand.
Instead you use the high-level entry point
replicate_trial() + run_trials(), which build
and execute n independent Trial objects from
your generator functions. Understanding the four classes directly is
useful when you want to:
- inspect the locked snapshot mid-simulation for debugging
- write custom multi-timepoint designs that gen_plan() cannot express
- use Trial$new() directly for a one-off single-run simulation (as in Example 8)
The rest of this vignette unpacks each building block in turn.
Population
A Population object represents a single arm’s worth of
subjects. It stores the endpoint data alongside two tracking vectors
(enrolled, dropped) that record the calendar
time each subject was enrolled or dropped — or NA if
neither event has occurred yet.
Data structure
The data field must be a data frame with at least four
columns:
| Column | Type | Meaning |
|---|---|---|
| id | integer | Unique subject identifier within the arm |
| arm | character | Arm label (auto-filled from name if absent) |
| readout_time | numeric | Lag from enrollment to endpoint observation |
| (endpoint) | numeric | At least one endpoint column (e.g., data, y, tte) |
The readout_time column is not a calendar time — it is
the lag between a subject’s enrollment date and the date their endpoint
is observed. Use 0 for immediately available data (e.g., a
baseline measure) and a positive value for delayed readouts (e.g.,
readout_time = 12 means the endpoint is read 12 weeks after
enrollment).
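A small worked example of the lag arithmetic, using made-up enrollment times. The sum computed here corresponds to the measurement_time column that Trial appends to the snapshot (enroll_time + readout_time).

```r
enroll_time  <- c(1, 3, 7)    # calendar weeks at which three subjects enroll
readout_time <- c(0, 12, 12)  # lag from enrollment to endpoint observation

# Calendar time at which each endpoint becomes observable
measurement_time <- enroll_time + readout_time
measurement_time
#> [1]  1 15 19
```

The first subject's endpoint is available on the enrollment date itself, while the other two are read 12 weeks after their respective enrollment dates.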
vector_to_dataframe()
For the common case of a single continuous endpoint,
vector_to_dataframe() wraps a numeric vector into the
required format with columns id, data, and
readout_time = 0:
ep <- rnorm(6, mean = 1, sd = 0.5)
df <- vector_to_dataframe(ep)
df
#> id data readout_time
#> 1 1 0.2999782 0
#> 2 2 1.1276585 0
#> 3 3 -0.2186318 0
#> 4 4 0.9972144 0
#> 5 5 1.3107764 0
#> 6 6 1.5742058 0
You can also build the data frame yourself, which is useful when you have covariates, time-varying endpoints, or different readout lags per subject.
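As a sketch of a hand-built data frame, the following uses only base R. The covariate `age`, the endpoint name `y`, and the alternating readout lags are illustrative choices, not requirements of rxsim.

```r
set.seed(1)
n <- 6
custom_df <- data.frame(
  id           = seq_len(n),
  age          = round(rnorm(n, mean = 60, sd = 8)),  # hypothetical covariate
  readout_time = rep(c(4, 12), length.out = n),       # per-subject readout lag
  y            = rnorm(n)                             # endpoint column
)
str(custom_df)
```

A data frame of this shape can then be supplied as the data argument of Population$new(), in the same way as the data frames in the other examples of this section.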
Repeated measurements and n_readouts
When a subject contributes multiple measurements (a longitudinal or
pharmacokinetic design), each measurement appears as its own row with a
distinct readout_time. Population detects this
automatically and stores the number of rows per subject in
n_readouts:
# Two timepoints per subject: baseline (0) and week 12 (12)
long_df <- data.frame(
id = rep(1:4, each = 2),
readout_time = rep(c(0, 12), times = 4),
response = rnorm(8)
)
pop_long <- Population$new(name = "treatment", data = long_df)
pop_long$n # 4 unique subjects
#> [1] 4
pop_long$n_readouts # 2 rows per subject
#> [1] 2
Enrollment and dropout vectors
enrolled and dropped are numeric vectors of
length n, initialised to NA for every subject.
NA means “not yet acted on”: the subject exists in the pool
but has not been enrolled or dropped. The Trial assigns
calendar times to these vectors as the simulation clock advances.
set_enrolled(n, time) marks n randomly
chosen currently unenrolled subjects as enrolled at time.
set_dropped(n, time) similarly marks n
randomly chosen enrolled-and-not-yet-dropped subjects as dropped at
time.
set.seed(1)
pop <- Population$new(name = "control", data = vector_to_dataframe(rnorm(8)))
pop$enrolled # all NA to start
#> [1] NA NA NA NA NA NA NA NA
pop$set_enrolled(5, time = 2)
pop$enrolled # 5 subjects enrolled at t=2, 3 still NA
#> [1] 2 2 2 NA NA 2 NA 2
pop$set_dropped(2, time = 7)
pop$dropped # 2 of the enrolled subjects dropped at t=7
#> [1] 7 NA NA NA NA NA NA 7
The Trial’s run() method calls these
setters automatically based on the Timer’s schedule — you
rarely need to call them directly.
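The selection rule described above can be mimicked with two plain vectors. This is a base R sketch of the semantics only; `pick()` is a hypothetical helper, and rxsim's own sampling may differ in detail.

```r
set.seed(1)
enrolled <- rep(NA_real_, 8)
dropped  <- rep(NA_real_, 8)

# Draw k indices at random from `idx` (safe even when length(idx) == 1)
pick <- function(idx, k) idx[sample.int(length(idx), k)]

# Enroll 5 of the currently unenrolled subjects at time 2
enrolled[pick(which(is.na(enrolled)), 5)] <- 2

# Drop 2 of the subjects who are enrolled and not yet dropped, at time 7
eligible <- which(!is.na(enrolled) & is.na(dropped))
dropped[pick(eligible, 2)] <- 7

sum(!is.na(enrolled))  # number of enrolled subjects
sum(!is.na(dropped))   # number of dropped subjects, all drawn from the enrolled ones
```

Restricting the dropout draw to `eligible` is what guarantees that a subject can only drop after having been enrolled.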
Timer
A Timer drives the trial clock. It holds a
timelist that specifies how many subjects to enroll or drop
in each arm at each time unit. At each unique time in the
timelist, Trial$run() processes enrollment and
dropout events, then evaluates analysis triggers.
Analysis triggers are now managed by the separate
Condition class (see Conditions
below). Condition objects live in
trial$conditions, not inside Timer.
Timepoints
Each entry in the timelist has four fields:
| Field | Type | Meaning |
|---|---|---|
| time | numeric | Calendar time of the event |
| arm | character | Which arm the event applies to |
| enroller | integer | Number of subjects to enroll at this time |
| dropper | integer | Number of subjects to drop at this time |
Timepoints are per-arm because the two arms in a trial may enroll subjects on different schedules, and enrollment events in one arm do not affect the other.
t <- Timer$new(name = "my_timer")
t$add_timepoint(time = 1, arm = "control", enroller = 5L, dropper = 0L)
t$add_timepoint(time = 1, arm = "treatment", enroller = 5L, dropper = 0L)
t$add_timepoint(time = 2, arm = "control", enroller = 3L, dropper = 1L)
t$add_timepoint(time = 2, arm = "treatment", enroller = 4L, dropper = 0L)
t$add_timepoint(time = 10, arm = "control", enroller = 0L, dropper = 0L)
t$add_timepoint(time = 10, arm = "treatment", enroller = 0L, dropper = 0L)
t$get_unique_times() # c(1, 2, 10)
#> [1] 1 2 10
t$get_end_timepoint() # 10
#> [1] 10
t$get_n_arms() # 2
#> [1] 2
Rather than adding timepoints one by one,
add_timepoints() accepts a data frame with the four columns
above — exactly what gen_plan() and
gen_timepoints() return. See Enrollment & Dropout Modeling for
details.
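For illustration, the six add_timepoint() calls above correspond to the following data frame. Only the column layout is taken from the text; passing it to add_timepoints() as a single argument is an assumption, so that call is left as a comment.

```r
schedule <- data.frame(
  time     = rep(c(1, 2, 10), each = 2),
  arm      = rep(c("control", "treatment"), times = 3),
  enroller = c(5L, 5L, 3L, 4L, 0L, 0L),
  dropper  = c(0L, 0L, 1L, 0L, 0L, 0L)
)
# t$add_timepoints(schedule)  # assumed call shape, not verified here

sort(unique(schedule$time))
#> [1]  1  2 10
```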
Conditions
A Condition pairs a filter expression with an optional
analysis function and manages its own trigger state. It is a separate R6
object that lives in trial$conditions.
When Trial$run() reaches each timepoint, it iterates
over every Condition in trial$conditions and
calls cond$check_conditions(snapshot, current_time). Each
condition applies a three-gate check:
- Filter gate: dplyr::filter() is applied to the snapshot using the condition’s where quosures. If the result is empty (no rows match), the condition does not fire.
- Max-triggers gate: if trigger_count >= max_triggers, the condition does not fire again.
- Cooldown gate: if current_time - last_trigger_time < cooldown, the condition does not fire yet.
If all three gates pass, the analysis function is called on the
filtered data and the result is stored. Trigger state
(trigger_count, last_trigger_time) is updated
automatically.
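The interaction of the three gates is easier to see in a toy loop. The function below is a hypothetical restatement of the rules in base R, not the package's code; in particular, treating a missing last_trigger_time as "no cooldown applies" is an assumption.

```r
# Hypothetical restatement of the three gates; rxsim's internals may differ.
should_fire <- function(n_matching, trigger_count, last_trigger_time,
                        current_time, max_triggers, cooldown) {
  if (n_matching == 0) return(FALSE)                  # filter gate
  if (trigger_count >= max_triggers) return(FALSE)    # max-triggers gate
  if (!is.na(last_trigger_time) &&
      current_time - last_trigger_time < cooldown) {
    return(FALSE)                                     # cooldown gate
  }
  TRUE
}

trigger_count     <- 0
last_trigger_time <- NA_real_
fired_at          <- numeric(0)

for (now in 1:6) {
  n_matching <- if (now >= 2) 3 else 0   # the filter matches rows from time 2 on
  if (should_fire(n_matching, trigger_count, last_trigger_time, now,
                  max_triggers = 2, cooldown = 2)) {
    trigger_count     <- trigger_count + 1
    last_trigger_time <- now
    fired_at          <- c(fired_at, now)
  }
}
fired_at
#> [1] 2 4
```

Time 1 is blocked by the filter gate, time 3 by the cooldown, and times 5 and 6 by the max-triggers gate.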
Construct a Condition with rlang::quos() to
capture the filter predicates:
# A toy snapshot data frame
snapshot <- data.frame(
id = 1:8,
arm = rep(c("control", "treatment"), 4),
enroll_time = c(1, 1, 2, 2, NA, NA, NA, NA),
data = rnorm(8)
)
# Fire when at least 4 subjects are enrolled
cond_interim <- Condition$new(
where = rlang::quos(sum(!is.na(enroll_time)) >= 4),
analysis = function(df, current_time) {
data.frame(n_enrolled = sum(!is.na(df$enroll_time)),
fired_at = current_time)
},
name = "interim"
)
res <- cond_interim$check_conditions(locked_data = snapshot, current_time = 5)
res[["interim"]]
#> n_enrolled fired_at
#> 1 4 5
If no analysis function is provided, check_conditions()
returns the filtered subset as-is with a warning — convenient for
inspection or debugging.
Trial
A Trial wires one or more Populations and a
Timer together and runs the simulation.
Constructor
Trial$new(
name = "my_trial",
seed = 42, # optional; set for reproducibility
timer = my_timer,
population = list(pop_a, pop_b),
conditions = list(cond1, cond2) # list of Condition objects
)
The population argument is a list of
Population objects, one per arm. The arm labels in each
Population’s name field must match the
arm identifiers in the Timer’s timelist.
What run() does
Calling trial$run() executes the following loop:
- Iterate over each unique time t in the Timer’s timelist, in ascending order.
- For each arm, apply set_enrolled() and set_dropped() according to that arm’s timepoint entry.
- Build a combined snapshot by row-binding all arms’ enrolled subjects. Four columns are appended by Trial:
| Column | Meaning |
|---|---|
| enroll_time | Calendar time the subject was enrolled (NA if not yet enrolled) |
| drop_time | Calendar time the subject dropped out (NA if still active) |
| measurement_time | enroll_time + readout_time |
| time | The current clock time t |
- Evaluate each Condition in trial$conditions by calling cond$check_conditions(snapshot, t). Each condition applies its filter, cooldown, and max-trigger guards independently.
- If any condition fired, store the snapshot in locked_data[["time_t"]] and the analysis outputs in results[["time_t"]].
locked_data and results
locked_data is a named list of snapshots, one per unique
timepoint at which at least one analysis fired. Each snapshot is the
full subject-level data frame at that moment — useful for auditing which
subjects were enrolled, computing post-hoc statistics, or debugging an
analysis function.
results has the same time-indexed structure, but each
element is itself a named list — one entry per condition that fired at
that timepoint. The value is whatever your analysis function
returned.
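To make the nesting concrete, here is a hand-built list with the same shape. The values are invented, and the key format ("time_" followed by the clock time) is an assumption based on the time_t placeholder used above.

```r
# Invented values; only the two-level structure mirrors the description.
results <- list(
  time_12 = list(interim = data.frame(n = 20)),
  time_24 = list(interim = data.frame(n = 40), final = data.frame(n = 40))
)

names(results)               # timepoints at which something fired
names(results[["time_24"]])  # conditions that fired at that timepoint
results[["time_24"]][["final"]]$n
#> [1] 40
```

The outer level is indexed by timepoint and the inner level by condition name, so one analysis output is addressed by two keys.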
A minimal end-to-end example
set.seed(7)
# Two small populations
pop_a <- Population$new("A", data = vector_to_dataframe(rnorm(10)))
pop_b <- Population$new("B", data = vector_to_dataframe(rnorm(10, mean = 0.5)))
# Timer: enroll in two waves, final readout at time 15
tm <- Timer$new("tm")
tm$add_timepoint(time = 1, arm = "A", enroller = 5L, dropper = 0L)
tm$add_timepoint(time = 1, arm = "B", enroller = 5L, dropper = 0L)
tm$add_timepoint(time = 3, arm = "A", enroller = 5L, dropper = 0L)
tm$add_timepoint(time = 3, arm = "B", enroller = 5L, dropper = 0L)
tm$add_timepoint(time = 15, arm = "A", enroller = 0L, dropper = 0L)
tm$add_timepoint(time = 15, arm = "B", enroller = 0L, dropper = 0L)
# Trigger: fire at calendar time 15
cond_final <- Condition$new(
where = rlang::quos(.data$time %in% 15),
analysis = function(df, current_time) {
enrolled <- subset(df, !is.na(enroll_time))
data.frame(
n = nrow(enrolled),
mean_A = mean(enrolled$data[enrolled$arm == "A"]),
mean_B = mean(enrolled$data[enrolled$arm == "B"])
)
},
name = "final"
)
trial <- Trial$new(name = "demo", seed = 7, timer = tm,
population = list(pop_a, pop_b),
conditions = list(cond_final))
trial$run()
# Inspect results
prettify_results(trial$results)
#> time final.n final.mean_A final.mean_B
#> 1 15 20 0.1039757 1.282517
Analysis triggers in depth
Triggers are the mechanism by which rxsim knows when to analyse the data and what to compute. Understanding how they are stored and evaluated is key to writing correct and flexible simulation code.
Why rlang::exprs()?
A trigger condition needs to be a stored expression, not an
immediately evaluated one. When you write
sum(!is.na(enroll_time)) >= 20, R would evaluate it at
the point of definition — before any data exists.
rlang::exprs() wraps the expression in a quoted form so it
is only evaluated later, inside check_conditions(), against
the actual snapshot.
Condition$new() uses rlang::quos() instead
of rlang::exprs(). The difference is that
quos() also captures the calling environment, so
values from the surrounding scope (e.g., target_n) are
available inside the expression when it is evaluated. Use
rlang::quos() in Condition$new(where = ...)
and rlang::exprs() when supplying triggers to
replicate_trial()’s analysis_generators.
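The idea of storing an expression and evaluating it later can be seen with base R alone. The sketch below uses quote() and eval() as stand-ins for the rlang functions; it illustrates deferred evaluation and is not how check_conditions() is implemented.

```r
# Store the expression without evaluating it (no enroll_time exists yet)
stored <- quote(sum(!is.na(enroll_time)) >= 4)

early <- data.frame(enroll_time = c(1, 2, NA, NA, NA))
late  <- data.frame(enroll_time = c(1, 2, 3, 3, NA))

# Evaluate later, with the data frame's columns in scope
eval(stored, early)
#> [1] FALSE
eval(stored, late)
#> [1] TRUE
```

The same stored expression gives different answers because it is resolved against whichever snapshot is supplied at evaluation time.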
The !! (bang-bang) operator
When you want to inject a value from the current R environment into a
stored expression, use !!. Without it, the variable name is
treated as a column name in the snapshot data frame — almost certainly
not what you want:
target_n <- 40
# CORRECT: !! injects the value 40 at definition time
trigger <- rlang::exprs(sum(!is.na(enroll_time)) >= !!target_n)
# WRONG: "target_n" would be looked for as a column name in the snapshot
trigger_bad <- rlang::exprs(sum(!is.na(enroll_time)) >= target_n)
This distinction matters when you loop over scenarios with different
sample sizes: each iteration should bake in its own
target_n value via !!.
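A base R analogue of this injection is bquote(), where .(x) plays the role of !!x. The sketch below bakes a different target into each stored expression. It uses base R only to stay self-contained; in rxsim code you would use rlang::exprs() with !! as shown above.

```r
targets  <- c(20, 40)
triggers <- lapply(targets, function(target_n) {
  bquote(sum(!is.na(enroll_time)) >= .(target_n))  # value substituted now
})

snapshot <- data.frame(enroll_time = rep(1, 25))   # 25 enrolled subjects
vapply(triggers, function(tr) eval(tr, snapshot), logical(1))
#> [1]  TRUE FALSE
```

Each stored expression carries its own threshold, so the result no longer depends on what target_n happens to be when the expression is finally evaluated.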
Columns available in a trigger expression
The trigger expression is evaluated against the snapshot data frame,
which contains all columns from the Population’s
data plus the four columns appended by
Trial:
- enroll_time: calendar enrollment time (NA if not enrolled)
- drop_time: calendar dropout time (NA if not dropped)
- measurement_time: enroll_time + readout_time
- time: current clock time
The arm column (arm label), which comes from the Population’s data, is available as well.
Any user-defined endpoint column (e.g., data,
response, tte) is also available.
trigger_by_calendar() and trigger_by_fraction()
Note: These helper functions are being refactored in an upcoming release to accept a Trial object directly. In the meantime, use Condition$new() (shown below) to build triggers.
Condition$new() with a calendar-time filter is the
recommended approach for pre-planned analyses:
# Fire at calendar time 24
cond_final <- Condition$new(
where = rlang::quos(.data$time %in% 24),
analysis = function(df, current_time) {
data.frame(n_enrolled = sum(!is.na(df$enroll_time)))
},
name = "final_analysis"
)
trial$conditions <- append(trial$conditions, list(cond_final))
For information-fraction-based interims, filter on enrolled count:
# Interim at 50% enrollment (target = 100)
cond_interim <- Condition$new(
where = rlang::quos(sum(!is.na(enroll_time)) >= 50),
analysis = function(df, current_time) {
enrolled <- subset(df, !is.na(enroll_time))
data.frame(n = nrow(enrolled), time = current_time)
},
name = "interim_50pct",
max_triggers = 1L
)
trial$conditions <- append(trial$conditions, list(cond_interim))
Custom conditions
You can condition on any expression involving the snapshot columns. For time-to-event endpoints this commonly means waiting for a target number of events rather than a target enrollment count:
# Fire when 30 events have been observed (event = 1, censored = 0)
n_events <- 30
cond_events <- Condition$new(
  where = rlang::quos(sum(event == 1 & !is.na(enroll_time)) >= !!n_events),
  analysis = my_tte_analysis,  # user-supplied analysis function
  name = "event_driven_interim"
)
trial$conditions <- append(trial$conditions, list(cond_events))
Multiple predicates in where are ANDed together, exactly
as in dplyr::filter().
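One way to picture the AND rule is to evaluate each predicate against the snapshot and require all of them to hold. The base R sketch below does that with quote() and eval(); the snapshot, the thresholds, and the `passes()` helper are made up for illustration, and this is not the package's own evaluation code.

```r
snapshot <- data.frame(
  enroll_time = c(1, 1, 2, NA),
  event       = c(1, 0, 1, 0)
)

predicates <- list(
  quote(sum(!is.na(enroll_time)) >= 3),  # at least 3 enrolled
  quote(sum(event == 1) >= 2)            # at least 2 events
)

# The condition passes only if every predicate is TRUE
passes <- function(preds, data) {
  all(vapply(preds, function(p) isTRUE(eval(p, data)), logical(1)))
}

passes(predicates, snapshot)
#> [1] TRUE
passes(c(predicates, list(quote(sum(event == 1) >= 3))), snapshot)
#> [1] FALSE
```

Adding a third predicate that the snapshot does not satisfy is enough to stop the condition from passing, even though the first two still hold.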
The analysis function signature
Every analysis function receives two arguments:
-
df: the full snapshot — all arms, all currently enrolled subjects. The condition filter determined whether to fire; the analysis function decides what to compute from the full data. -
current_time: the numeric clock time at which the condition fired.
my_analysis <- function(df, current_time) {
enrolled <- subset(df, !is.na(enroll_time))
data.frame(
fired_at = current_time,
n_ctrl = sum(enrolled$arm == "control"),
n_trt = sum(enrolled$arm == "treatment"),
mean_diff = mean(enrolled$data[enrolled$arm == "treatment"]) -
mean(enrolled$data[enrolled$arm == "control"])
)
}
Return values
The analysis function’s return value is stored verbatim in
results. collect_results() handles three
cases:
- data.frame (recommended): the standard pattern; one row per trigger event is the convention.
- named list: coerced to a single-row data frame.
- NULL: silently skipped by collect_results().
Returning NULL is useful for side-effects-only analyses
(e.g., writing a log entry) or for marking that a trigger condition was
not yet satisfied.
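The three cases can be mimicked with a few lines of base R. This is only a sketch of the behaviour described above; `to_row()` is a hypothetical helper and collect_results() may handle these cases differently internally.

```r
outputs <- list(
  data.frame(n = 10, p_value = 0.04),  # data frame: kept as-is
  list(n = 12, p_value = 0.20),        # named list: coerced to one row
  NULL                                 # NULL: skipped
)

# Leave data frames and NULL untouched; turn named lists into one-row data frames
to_row <- function(x) if (is.null(x) || is.data.frame(x)) x else as.data.frame(x)

rows     <- Filter(Negate(is.null), lapply(outputs, to_row))
combined <- do.call(rbind, rows)
nrow(combined)
#> [1] 2
```

Because the named list and the data frame share the same column names, they stack cleanly into one table, which is a practical reason to keep the return shape consistent across analyses.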
Putting it together: replicate_trial() and run_trials()
Building a single Trial by hand is useful for
exploration, but the purpose of rxsim is operating-characteristic
simulation across many replicates. replicate_trial() +
run_trials() handle this ergonomically.
How replicate_trial() works
For each of the n replicates,
replicate_trial():
- Calls gen_plan() with your enrollment and dropout functions to generate a fresh, stochastic enrollment/dropout schedule.
- Builds a new Timer and loads the schedule via add_timepoints().
- Creates a Condition object for each entry in analysis_generators and attaches them to the trial’s conditions list.
- Calls each population generator function to draw fresh subject-level endpoint data.
- Constructs a Trial$new() wiring the Timer, Population objects, and Condition objects together.
Each replicate therefore has independent endpoint data and independent enrollment timing — both sources of variability that operating characteristic simulations are designed to characterise.
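As a rough picture of what a stochastic schedule looks like, the sketch below turns inter-arrival draws into calendar enrollment times with cumsum(). That accumulation step is an assumption about how such draws are typically used; the exact logic inside gen_plan() is not described here.

```r
set.seed(42)
enrollment <- function(n) rexp(n, rate = 2)  # inter-arrival generator

gaps          <- enrollment(30)   # waiting times between consecutive enrollments
calendar_time <- cumsum(gaps)     # assumed: arrivals accumulate along the clock
range(calendar_time)
```

With rate = 2 the expected time to enroll 30 subjects is 30 / 2 = 15 time units, and a different seed gives a different schedule, which is the replicate-to-replicate variability in enrollment timing described above.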
A complete six-step example
set.seed(42)
# Step 1 — design parameters
sample_size <- 30
arms <- c("control", "treatment")
allocation <- c(1, 1)
true_delta <- 0.5
scenario <- data.frame(sample_size = sample_size, true_delta = true_delta)
# Step 2 — population generators (called fresh per replicate)
population_generators <- list(
control = function(n) vector_to_dataframe(rnorm(n)),
treatment = function(n) vector_to_dataframe(rnorm(n, mean = true_delta))
)
# Step 3 — enrollment and dropout (inter-arrival functions)
enrollment <- function(n) rexp(n, rate = 2)
dropout <- function(n) rexp(n, rate = 0.02)
# Step 4 — analysis trigger: fire when full enrollment is reached
analysis_generators <- list(
final = list(
trigger = rlang::exprs(
sum(!is.na(enroll_time)) >= !!sample_size
),
analysis = function(df, current_time) {
enrolled <- subset(df, !is.na(enroll_time))
data.frame(
scenario,
fired_at = current_time,
n_ctrl = sum(enrolled$arm == "control"),
n_trt = sum(enrolled$arm == "treatment"),
mean_ctrl = mean(enrolled$data[enrolled$arm == "control"]),
mean_trt = mean(enrolled$data[enrolled$arm == "treatment"])
)
}
)
)
# Step 5 — create and run replicates
trials <- replicate_trial(
trial_name = "concepts_demo",
sample_size = sample_size,
arms = arms,
allocation = allocation,
enrollment = enrollment,
dropout = dropout,
analysis_generators = analysis_generators,
population_generators = population_generators,
n = 10
)
run_trials(trials)
#> [[1]]
#> <Trial>
#> Public:
#> clone: function (deep = FALSE)
#> conditions: list
#> initialize: function (name, seed = NULL, timer = NULL, population = list(),
#> locked_data: list
#> name: concepts_demo_1
#> population: list
#> results: list
#> run: function ()
#> seed: NULL
#> timer: Timer, R6
#>
#> ... (replicates 2 through 10 print the same structure; only name differs)
# Step 6 — collect and inspect
results <- collect_results(trials)
results
#> replicate timepoint analysis sample_size true_delta fired_at n_ctrl n_trt
#> 1 1 14.85864 final 30 0.5 14.85864 15 15
#> 2 2 13.65704 final 30 0.5 13.65704 15 15
#> 3 3 17.64924 final 30 0.5 17.64924 15 15
#> 4 4 11.80808 final 30 0.5 11.80808 15 15
#> 5 5 12.91792 final 30 0.5 12.91792 15 15
#> 6 6 15.97475 final 30 0.5 15.97475 15 15
#> 7 7 14.76681 final 30 0.5 14.76681 15 15
#> 8 8 13.66482 final 30 0.5 13.66482 15 15
#> 9 9 18.81521 final 30 0.5 18.81521 15 15
#> 10 10 11.84402 final 30 0.5 11.84402 15 15
#> mean_ctrl mean_trt
#> 1 -0.2426125 0.73924372
#> 2 0.1855949 0.77785489
#> 3 0.2729714 0.52646154
#> 4 -0.2590078 0.50676828
#> 5 0.1494857 0.90866496
#> 6 0.4779952 -0.03207007
#> 7 0.1935956 0.05437164
#> 8 0.2760705 0.16748723
#> 9 -0.5170736 0.59219044
#> 10 0.4187062 0.42788315
collect_results() and the analysis= filter
When a trial has multiple named analyses (e.g., an interim and a
final), collect_results() returns rows for all of them. Use
the analysis argument to restrict to a specific name:
# Only the final analysis rows
collect_results(trials, analysis = "final")
Each row in the output corresponds to one replicate firing one named
analysis. The replicate, timepoint, and
analysis columns identify the provenance of every result
row, making it straightforward to compare interim and final results
within the same replicate or aggregate across replicates.
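Once the results are in one data frame, aggregation is ordinary data-frame work. The sketch below rebuilds the first three rows of the output above by hand (only the columns needed) and summarises the estimated treatment effect across replicates; `diff` is a column name introduced here for illustration.

```r
results <- data.frame(
  replicate = 1:3,
  analysis  = "final",
  mean_ctrl = c(-0.2426125, 0.1855949, 0.2729714),
  mean_trt  = c(0.73924372, 0.77785489, 0.52646154)
)

final      <- subset(results, analysis == "final")
final$diff <- final$mean_trt - final$mean_ctrl

mean(final$diff)       # average estimated effect across replicates
mean(final$diff > 0)   # share of replicates favouring treatment
```

The same pattern extends to any operating characteristic that can be written as a per-replicate indicator, for example the share of replicates in which a test statistic crosses a threshold.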
Next steps
- Getting Started: end-to-end walkthrough of the six-step pattern shown above, with commentary on each design choice.
- Enrollment & Dropout Modeling: choose between stochastic (gen_plan) and piecewise-constant (gen_timepoints) schedules.
- Example 1 through Example 8: progressively complex designs (correlated endpoints, time-to-event, multi-arm dose-finding, subgroup analyses, and Bayesian Go/No-Go rules).