Summarize simulated data from a makeDataSim object — summary.makeDataSim • endpoints

summary.makeDataSim() computes arm-specific diagnostic summaries for a "makeDataSim" object returned by makeData().

Usage

# S3 method for class 'makeDataSim'
summary(object, ...)

Arguments

object: An object of class "makeDataSim", typically created by makeData().
...: Currently unused. Included for S3 method compatibility.

Value

A list of class "summary.makeDataSim" with components:

target_correlation: The target correlation matrix supplied to makeData(), or NULL in single-endpoint mode.
estimated_correlation_by_arm: A named list of empirical arm-specific correlation matrices.
continuous: A data frame of continuous-endpoint summaries, or NULL if no continuous endpoints were simulated.
binary: A data frame of binary-endpoint summaries, or NULL if no binary endpoints were simulated.
count: A data frame of count-endpoint summaries, or NULL if no count endpoints were simulated.
tte: A data frame of time-to-event summaries, or NULL if no TTE endpoints were simulated.
n_arms: The total number of treatment arms represented in the simulated dataset.

Details

The summary includes:

the target correlation matrix supplied to makeData(),
empirical endpoint correlations within each treatment arm, and
endpoint-specific marginal summaries for continuous, binary, count, and time-to-event outcomes.

These summaries are intended as a quick validation tool for checking that the simulated dataset is broadly consistent with the requested data-generating parameters.

General behavior

The summary method extracts the simulated dataset and metadata stored in the "makeDataSim" object, then computes:

the requested target correlation matrix, as stored in object$meta$correlation_matrix,
the observed Pearson correlation matrix among endpoint columns within each treatment arm, and
endpoint-type-specific summaries based on simple fitted models or direct descriptive estimators.

Arm-specific empirical correlation

For each study arm, the method computes the Pearson correlation matrix of the simulated endpoint columns listed in object$meta$endpoint_names. These are returned as a named list in $estimated_correlation_by_arm, with elements named "arm_0", "arm_1", and so on.

Correlations are computed using cor() and rounded to 3 decimal places.
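
The per-arm correlation step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the toy data frame dat, its arm column trt, and the way the endpoints are generated are all hypothetical stand-ins for the dataset stored in a "makeDataSim" object.

```r
# Toy data standing in for the simulated dataset (column names are assumptions).
set.seed(1)
n <- 200
dat <- data.frame(
  trt    = rep(c(0, 1), each = n),
  Cont_1 = rnorm(2 * n, mean = 10, sd = 2)
)
# A binary endpoint loosely associated with the continuous one.
dat$Bin_1 <- rbinom(2 * n, size = 1,
                    prob = plogis(-0.8 + 0.2 * (dat$Cont_1 - 10)))

endpoint_names <- c("Cont_1", "Bin_1")

# One Pearson correlation matrix per arm, rounded to 3 decimal places
# and named arm_0, arm_1, ...
by_arm  <- split(dat[, endpoint_names], dat$trt)
est_cor <- lapply(by_arm, function(d) round(cor(d), 3))
names(est_cor) <- paste0("arm_", names(by_arm))
est_cor
```

Splitting first and calling cor() on each piece keeps the arms separate, so a treatment effect on the means does not inflate or mask the within-arm association.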

Continuous endpoints

For each continuous endpoint:

a linear model is fit, lm(y ~ trt) in multi-arm settings and lm(y ~ 1) otherwise,
the control-group intercept is used as the estimated baseline mean,
treatment coefficients are reported as estimated mean shifts, and
arm-specific residual SDs are computed from the residuals of the fitted model.

The returned table includes:

endpoint: Endpoint name, for example Cont_1.
arm: Arm index.
input_baseline_mean: Requested control-group mean.
input_sd: Requested SD for that arm.
input_trt_effect: Requested treatment effect (mean shift versus control) for that arm.
est_baseline_mean: Estimated control-group mean from the fitted model.
est_trt_effect: Estimated mean shift for that arm versus control.
est_resid_sd: Residual SD within that arm.
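
A minimal sketch of how the estimated columns could be derived for a two-arm continuous endpoint is shown below. The toy vectors y and trt are hypothetical, and using sd() on the per-arm residuals is an assumption about how "residual SD within that arm" is computed; the package may differ in detail.

```r
# Toy two-arm continuous endpoint (names and values are assumptions).
set.seed(1)
n   <- 500
trt <- factor(rep(c(0, 1), each = n))
y   <- rnorm(2 * n, mean = 10 + ifelse(trt == "1", -1, 0), sd = 2)

fit <- lm(y ~ trt)

# Intercept = control-group mean; remaining coefficients = mean shifts.
est_baseline_mean <- unname(coef(fit)[1])
est_trt_effect    <- c(0, unname(coef(fit)[-1]))

# Residual SD computed separately within each arm.
est_resid_sd <- as.numeric(tapply(residuals(fit), trt, sd))

data.frame(arm = levels(trt), est_baseline_mean,
           est_trt_effect, est_resid_sd)
```

With only an arm indicator in the model, the intercept equals the control-arm sample mean, so est_baseline_mean repeats across rows exactly as in the table layout described above.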

Binary endpoints

For each binary endpoint:

a logistic regression model is fit using glm(..., family = binomial()),
the control-group intercept, which is on the logit scale, is back-transformed to give the estimated baseline probability,
treatment coefficients are reported as estimated log-odds ratios, and
observed arm-specific event probabilities are computed directly as sample means.

The returned table includes:

endpoint: Endpoint name, for example Bin_1.
arm: Arm index.
input_baseline_prob: Requested control-group event probability.
input_trt_logOR: Requested treatment effect on the log-odds scale.
input_trt_prob: Requested arm-specific event probability.
est_baseline_prob: Estimated control-group probability from the fitted model.
est_trt_logOR: Estimated treatment log-odds ratio for that arm versus control.
est_prob: Observed arm-specific event proportion.
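
The relationship between the probability and log-odds columns, and the model-based estimates, can be illustrated with a small sketch. The toy data and variable names are hypothetical; the two requested probabilities (0.30 and 0.45) are the ones used in the Examples section.

```r
# Toy two-arm binary endpoint (names are assumptions).
set.seed(1)
n   <- 500
trt <- factor(rep(c(0, 1), each = n))
p0  <- 0.30   # requested control probability
p1  <- 0.45   # requested treatment-arm probability
y   <- rbinom(2 * n, size = 1, prob = ifelse(trt == "1", p1, p0))

# Requested log-odds ratio implied by the two probabilities (about 0.6466).
input_trt_logOR <- qlogis(p1) - qlogis(p0)

fit <- glm(y ~ trt, family = binomial())

# Intercept is on the logit scale; plogis() maps it back to a probability.
est_baseline_prob <- plogis(unname(coef(fit)[1]))
est_trt_logOR     <- unname(coef(fit)[2])

# Observed event proportion in each arm.
est_prob <- as.numeric(tapply(y, trt, mean))
```

Because the model contains only the arm indicator, est_baseline_prob coincides with the observed control-arm proportion, and est_trt_logOR equals the difference of the observed logits.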

Count endpoints

For each count endpoint:

a negative binomial model is fit using MASS::glm.nb(),
the control-group intercept is exponentiated to obtain the estimated baseline mean,
treatment coefficients are reported as estimated log rate-ratios,
the fitted dispersion parameter is reported, and
observed arm-specific means and observed zero proportions (structural and sampling zeros combined) are computed directly.

The returned table includes:

endpoint: Endpoint name, for example Int_1.
arm: Arm index.
input_baseline_mean: Specified control-group mean.
input_trt_logRR: Specified treatment effect on the log rate-ratio scale.
input_trt_mean: Specified arm-specific mean count.
input_size: Specified negative-binomial size parameter.
input_p_zero: Specified structural zero probability.
est_baseline_mean: Estimated control-group mean from the fitted model.
est_trt_logRR: Estimated treatment log rate-ratio for that arm versus control.
est_size: Estimated negative-binomial size parameter.
obs_mean: Observed arm-specific mean count.
obs_p0: Observed proportion of zeros in that arm.
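
The count-endpoint estimates can be sketched as below. The toy data are hypothetical and are drawn from a plain negative binomial with no zero inflation, so obs_p0 here reflects sampling zeros only; MASS::glm.nb() is the fitting function named above, and fit$theta is the element of its return value that holds the estimated size parameter.

```r
# Toy two-arm count endpoint without structural zeros (names are assumptions).
set.seed(1)
n   <- 500
trt <- factor(rep(c(0, 1), each = n))
mu  <- ifelse(trt == "1", 3, 4)
y   <- rnbinom(2 * n, size = 2, mu = mu)

fit <- MASS::glm.nb(y ~ trt)

# Log link: exponentiate the intercept to get the control-group mean.
est_baseline_mean <- exp(unname(coef(fit)[1]))
est_trt_logRR     <- unname(coef(fit)[2])
est_size          <- fit$theta

# Direct descriptive summaries by arm.
obs_mean <- as.numeric(tapply(y, trt, mean))
obs_p0   <- as.numeric(tapply(y == 0, trt, mean))
```

When structural zeros are present, a plain negative binomial fit absorbs them as extra dispersion, which is one reason est_size can drift from input_size.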

Time-to-event endpoints

For each time-to-event endpoint:

a Cox proportional hazards model is fit using survival::coxph(),
treatment coefficients are reported as estimated log hazard-ratios,
observed arm-specific event proportions are reported, and
an arm-specific exponential maximum likelihood estimate of the event rate is computed as $\hat\lambda = \sum_i \delta_i / \sum_i t_i$, where $t_i$ is observed follow-up time and $\delta_i$ is the event indicator.

The returned table includes:

endpoint: Endpoint name, for example TTE_1.
arm: Arm index.
censor_col: Corresponding event-indicator column name, for example Status_1.
input_baseline_rate: Requested control-group exponential event rate.
input_trt_logHR: Requested treatment effect on the log hazard-ratio scale.
est_trt_logHR: Estimated log hazard-ratio from the Cox model.
input_trt_HR: Requested treatment effect on the hazard-ratio scale.
est_trt_HR: Estimated hazard-ratio from the Cox model.
obs_event_rate: Observed event proportion in that arm (the fraction of subjects with an event, not a rate per unit time).
exp_rate: Arm-specific exponential MLE event-rate estimate.
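
The Cox fit and the exponential rate estimate can be sketched together. The toy data below are hypothetical: exponential event times with independent exponential censoring, loosely mirroring the rates in the Examples section but without administrative censoring or fatal-event logic.

```r
library(survival)

# Toy two-arm time-to-event data (names and design are assumptions).
set.seed(2)
n   <- 1000
trt <- factor(rep(c(0, 1), each = n))
event_time  <- rexp(2 * n, rate = (1 / 24) * ifelse(trt == "1", 0.8, 1))
censor_time <- rexp(2 * n, rate = 1 / 216)
time   <- pmin(event_time, censor_time)
status <- as.integer(event_time <= censor_time)

# Cox model: the coefficient is the log hazard ratio versus control.
fit <- coxph(Surv(time, status) ~ trt)
est_trt_logHR <- unname(coef(fit))
est_trt_HR    <- exp(est_trt_logHR)

# Proportion of subjects with an observed event, by arm.
obs_event_rate <- as.numeric(tapply(status, trt, mean))

# Exponential MLE by arm: total events divided by total follow-up time.
exp_rate <- as.numeric(tapply(status, trt, sum) / tapply(time, trt, sum))
```

The ratio exp_rate[2] / exp_rate[1] gives a second, fully parametric hazard-ratio estimate that can be compared with est_trt_HR from the Cox model.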

Interpretation

Because simulations are finite-sample and may involve nonlinear marginal transformations, random and administrative censoring, and fatal/non-fatal event logic, the estimated summaries will generally not match the requested inputs exactly. The summary output is therefore best viewed as a diagnostic check rather than an exact recovery target.
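
One simple way to use the tables diagnostically is to tabulate the gap between estimated and requested values. The sketch below rebuilds two columns of the continuous table by hand, with values copied from the output in the Examples section; the column name diff_trt_effect is hypothetical and is not part of the returned object.

```r
# Hand-built excerpt of the continuous summary table (values from Examples).
cont <- data.frame(
  endpoint         = "Cont_1",
  arm              = c(0, 1),
  input_trt_effect = c(0, -1),
  est_trt_effect   = c(0, -1.341728)
)

# Estimated minus requested treatment effect, per arm.
cont$diff_trt_effect <- cont$est_trt_effect - cont$input_trt_effect
cont
```

Whether a given gap is acceptable is a judgment call that depends on the sample size per arm and on how strongly correlation, censoring, or other features of the simulation are expected to distort the marginal estimates.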

Examples

## Continuous + binary example
ep1 <- list(
  endpoint_type = "continuous",
  baseline_mean = 10,
  sd            = 2,
  trt_effect    = -1
)

ep2 <- list(
  endpoint_type = "binary",
  baseline_prob = 0.30,
  trt_prob      = 0.45
)

R2 <- corr_make(
  num_endpoints = 2,
  values = rbind(c(1, 2, 0.2))
)

sim_obj <- makeData(
  correlation_matrix    = R2,
  sample_size_per_group = 500,
  SEED                  = 1,
  endpoint_details      = list(ep1, ep2)
)

ss <- summary(sim_obj)

## Display top-level structure
ss$n_arms
#> [1] 2
ss$target_correlation
#>      [,1] [,2]
#> [1,]  1.0  0.2
#> [2,]  0.2  1.0
ss$estimated_correlation_by_arm
#> $arm_0
#>        Cont_1 Bin_1
#> Cont_1  1.000 0.149
#> Bin_1   0.149 1.000
#> 
#> $arm_1
#>        Cont_1 Bin_1
#> Cont_1  1.000 0.143
#> Bin_1   0.143 1.000
#> 

## Endpoint-specific summaries
ss$continuous
#>   endpoint arm input_baseline_mean input_sd input_trt_effect est_baseline_mean
#> 1   Cont_1   0                  10        2                0           10.2236
#> 2   Cont_1   1                  10        2               -1           10.2236
#>   est_trt_effect est_resid_sd
#> 1       0.000000     1.945490
#> 2      -1.341728     1.975195
ss$binary
#>   endpoint arm input_baseline_prob input_trt_logOR input_trt_prob
#> 1    Bin_1   0                 0.3       0.0000000           0.30
#> 2    Bin_1   1                 0.3       0.6466272           0.45
#>   est_baseline_prob est_trt_logOR est_prob
#> 1             0.332     0.0000000    0.332
#> 2             0.332     0.4823075    0.446

## Time-to-event example
ep_tte <- list(
  endpoint_type  = "tte",
  baseline_rate  = 1 / 24,
  trt_effect     = log(0.8),
  censoring_rate = 1 / 216,
  fatal_event    = TRUE
)

sim_tte <- makeData(
  correlation_matrix    = NULL,
  sample_size_per_group = 1000,
  SEED                  = 2,
  endpoint_details      = list(ep_tte)
)

summary(sim_tte)$tte
#>   endpoint arm censor_col input_baseline_rate input_trt_logHR input_trt_HR
#> 1    TTE_1   0   Status_1          0.04166667       0.0000000          1.0
#> 2    TTE_1   1   Status_1          0.04166667      -0.2231436          0.8
#>   est_trt_logHR est_trt_HR obs_event_rate   exp_rate
#> 1     0.0000000  1.0000000          0.884 0.04030938
#> 2    -0.1968895  0.8212814          0.871 0.03303419
