
Four of the main functions of the decider package come with built-in parallelization via the foreach package, among them the simulation function sim_jointBLRM().

Leveraging this capability requires the user to register a parallel backend. This vignette provides some basic examples of how such a backend can be registered. A similar vignette is provided by the bhmbasket package, which inspired the present one.

Simple parallelization

First, the following packages are loaded (at a minimum, the examples below require decider and doFuture):
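#load the packages used below
library(decider)
#doFuture also attaches the foreach and future packages
library(doFuture)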

With the doFuture package, we register a simple parallel backend with a given number of cores, set via the variable n.cores. For the purposes of this vignette, we use a single core for technical reasons, but when running the code locally, one can easily increase this depending on the number of cores available.

#specify number of cores -- here, only 1 for a sequential execution
n.cores <- 1

#register backend for parallel execution and plan multisession
doFuture::registerDoFuture()
future::plan(future::multisession, workers = n.cores)

To find out how many cores are available on the current machine, one can use, e.g.:

parallel::detectCores()
#> [1] 8
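Alternatively, the future package provides future::availableCores(), which also respects resource limits set by, e.g., HPC schedulers, and future::nbrOfWorkers(), which reports the number of workers of the currently registered plan:

#number of cores available, respecting scheduler and option settings
future::availableCores()
#number of workers used by the currently registered plan
future::nbrOfWorkers()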

Following the initialization of the parallel backend, you can simply run one of the four parallelized functions from the decider package. They will automatically make use of the registered workers.

As an example, a simple simulation could be accomplished by:

sim_result <- sim_jointBLRM(
  #simulate one monotherapy arm and one combination arm
  active.mono1.a = TRUE,
  active.combi.a = TRUE,

  #provisional dose levels; combination doses are a matrix with one row per compound
  doses.mono1.a = c(2, 4, 8),
  doses.combi.a = rbind(
    c(2, 4, 8,  2, 4, 8),
    c(1, 1, 1,  2, 2, 2)
  ),
  dose.ref1 = 8,
  dose.ref2 = 2,
  start.dose.mono1.a = 2,
  start.dose.combi.a1 = 2,
  start.dose.combi.a2 = 1,

  #true DLT probabilities per dose level used to simulate the trials
  tox.mono1.a = c(0.2, 0.3, 0.5),
  tox.combi.a = c(0.3, 0.4, 0.6,  0.4, 0.5, 0.7),
  #maximum sample sizes, cohort size, enrollment order, and number of simulated trials
  max.n.mono1.a = 12,
  max.n.combi.a = 24,
  cohort.size = 3,
  cohort.queue = c(1, 1, 1, rep(c(1, 3), times = 100)),
  n.studies = 3
)

print(sim_result, quote = FALSE)
#> $`results combi.a`
#>                              underdose target dose overdose max n reached before MTD all doses too toxic
#> Number of trials             0         0           0        0                        3                  
#> Percentage                   0         0           0        0                        100                
#> Percentage not all too toxic                                                                            
#> 
#> $`summary combi.a`
#>                     Median      Mean      Min. Max. 2.5%     97.5%
#> #Pat underdose   0.0000000 0.0000000 0.0000000    0 0.00 0.0000000
#> #Pat target dose 3.0000000 3.0000000 3.0000000    3 3.00 3.0000000
#> #Pat overdose    0.0000000 0.0000000 0.0000000    0 0.00 0.0000000
#> #Pat (all)       3.0000000 3.0000000 3.0000000    3 3.00 3.0000000
#> #DLT underdose   0.0000000 0.0000000 0.0000000    0 0.00 0.0000000
#> #DLT target dose 2.0000000 2.0000000 1.0000000    3 1.05 2.9500000
#> #DLT overdose    0.0000000 0.0000000 0.0000000    0 0.00 0.0000000
#> #DLT (all)       2.0000000 2.0000000 1.0000000    3 1.05 2.9500000
#> % overdose       0.0000000 0.0000000 0.0000000    0 0.00 0.0000000
#> % DLT            0.6666667 0.6666667 0.3333333    1 0.35 0.9833333
#> 
#> $`MTDs combi.a`
#>                  2+1         4+1      8+1      2+2      4+2      8+2     
#> Dose             2+1         4+1      8+1      2+2      4+2      8+2     
#> True P(DLT)      0.3         0.4      0.6      0.4      0.5      0.7     
#> True category    target dose overdose overdose overdose overdose overdose
#> MTD declared (n) 0           0        0        0        0        0       
#> 
#> $`#pat. combi.a`
#>             2+1 4+1 8+1 2+2 4+2 8+2
#> Dose        2+1 4+1 8+1 2+2 4+2 8+2
#> mean #pat   3   0   0   0   0   0  
#> median #pat 3   0   0   0   0   0  
#> min #pat    3   0   0   0   0   0  
#> max #pat    3   0   0   0   0   0  
#> 
#> $`#DLT combi.a`
#>             2+1 4+1 8+1 2+2 4+2 8+2
#> Dose        2+1 4+1 8+1 2+2 4+2 8+2
#> mean #DLT   2   0   0   0   0   0  
#> median #DLT 2   0   0   0   0   0  
#> min #DLT    1   0   0   0   0   0  
#> max #DLT    3   0   0   0   0   0  
#> 
#> $`results mono1.a`
#>                              underdose target dose overdose max n reached before MTD all doses too toxic
#> Number of trials                     0     1.00000        0                        0             2.00000
#> Percentage                           0    33.33333        0                        0            66.66667
#> Percentage not all too toxic         0   100.00000        0                        0                  NA
#> 
#> $`summary mono1.a`
#>                     Median      Mean Min.       Max.      2.5%      97.5%
#> #Pat underdose   0.0000000 0.0000000 0.00  0.0000000 0.0000000  0.0000000
#> #Pat target dose 9.0000000 8.0000000 3.00 12.0000000 3.3000000 11.8500000
#> #Pat overdose    0.0000000 0.0000000 0.00  0.0000000 0.0000000  0.0000000
#> #Pat (all)       9.0000000 8.0000000 3.00 12.0000000 3.3000000 11.8500000
#> #DLT underdose   0.0000000 0.0000000 0.00  0.0000000 0.0000000  0.0000000
#> #DLT target dose 3.0000000 2.3333333 1.00  3.0000000 1.1000000  3.0000000
#> #DLT overdose    0.0000000 0.0000000 0.00  0.0000000 0.0000000  0.0000000
#> #DLT (all)       3.0000000 2.3333333 1.00  3.0000000 1.1000000  3.0000000
#> % overdose       0.0000000 0.0000000 0.00  0.0000000 0.0000000  0.0000000
#> % DLT            0.3333333 0.3055556 0.25  0.3333333 0.2541667  0.3333333
#> 
#> $`MTDs mono1.a`
#>                  2           4           8       
#> Dose             2           4           8       
#> True P(DLT)      0.2         0.3         0.5     
#> True category    target dose target dose overdose
#> MTD declared (n) 1           0           0       
#> 
#> $`#pat. mono1.a`
#>             2 4 8
#> Dose        2 4 8
#> mean #pat   5 3 0
#> median #pat 6 3 0
#> min #pat    3 0 0
#> max #pat    6 6 0
#> 
#> $`#DLT mono1.a`
#>             2                 4                8
#> Dose        2                 4                8
#> mean #DLT   0.666666666666667 1.66666666666667 0
#> median #DLT 1                 2                0
#> min #DLT    0                 0                0
#> max #DLT    1                 3                0
#> 
#> $prior
#>               mean       SD
#> mu_a1   -0.7081851 2.000000
#> mu_b1    0.0000000 1.000000
#> mu_a2   -0.7081851 2.000000
#> mu_b2    0.0000000 1.000000
#> mu_eta   0.0000000 1.121000
#> tau_a1  -1.3862944 0.707293
#> tau_b1  -2.0794415 0.707293
#> tau_a2  -1.3862944 0.707293
#> tau_b2  -2.0794415 0.707293
#> tau_eta -2.0794415 0.707293
#> 
#> $specifications
#>                                  Value     -   
#> seed                             830451341 -   
#> dosing.intervals                 0.16      0.33
#> esc.rule                         ewoc      -   
#> esc.comp.max                     1         -   
#> dose.ref1                        8         -   
#> dose.ref2                        2         -   
#> saturating                       FALSE     -   
#> ewoc.threshold                   0.25      -   
#> start.dose.mono1.a               2         -   
#> esc.step.mono1.a                 2         -   
#> esc.constrain.mono1.a            FALSE     -   
#> max.n.mono1.a                    12        -   
#> cohort.size.mono1.a              3         -   
#> cohort.prob.mono1.a              1         -   
#> mtd.decision.mono1.a$target.prob 0.5       -   
#> mtd.decision.mono1.a$pat.at.mtd  6         -   
#> mtd.decision.mono1.a$min.pat     12        -   
#> mtd.decision.mono1.a$min.dlt     1         -   
#> mtd.decision.mono1.a$rule        2         -   
#> mtd.enforce.mono1.a              FALSE     -   
#> backfill.mono1.a                 FALSE     -   
#> backfill.size.mono1.a            3         -   
#> backfill.prob.mono1.a            1         -   
#> backfill.start.mono1.a           2         -   
#> start.dose.combi.a1              2         -   
#> start.dose.combi.a2              1         -   
#> esc.step.combi.a1                2         -   
#> esc.step.combi.a2                2         -   
#> esc.constrain.combi.a1           FALSE     -   
#> esc.constrain.combi.a2           FALSE     -   
#> max.n.combi.a                    24        -   
#> cohort.size.combi.a              3         -   
#> cohort.prob.combi.a              1         -   
#> mtd.decision.combi.a$target.prob 0.5       -   
#> mtd.decision.combi.a$pat.at.mtd  6         -   
#> mtd.decision.combi.a$min.pat     12        -   
#> mtd.decision.combi.a$min.dlt     1         -   
#> mtd.decision.combi.a$rule        2         -   
#> mtd.enforce.combi.a              FALSE     -   
#> backfill.combi.a                 FALSE     -   
#> backfill.size.combi.a            3         -   
#> backfill.prob.combi.a            1         -   
#> backfill.start.combi.a1          2         -   
#> backfill.start.combi.a2          1         -   
#> 
#> $`Stan options`
#>                 Value
#> chains            4.0
#> iter          13500.0
#> warmup         1000.0
#> adapt_delta       0.8
#> max_treedepth    15.0

Here, only 3 studies are simulated for illustration purposes. Note that, for simulations, one can also enable storing and re-loading of MCMC results to speed up simulations further when parallelizing on a moderate number of cores (e.g., when not using a cluster). For details, refer to the argument working.path of the function sim_jointBLRM().
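As a minimal sketch (the cache directory below is an arbitrary choice for illustration), one could create a writable directory and pass it to sim_jointBLRM() via working.path:

#directory in which MCMC results are stored and re-loaded
cache.path <- file.path(tempdir(), "decider_mcmc")
dir.create(cache.path, showWarnings = FALSE)
#add working.path = cache.path to the sim_jointBLRM() call shown above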

Parallelization on a cluster

All of the functions of the decider package that allow for parallelization feature a nested foreach loop with two stages. With this, it is possible to leverage the capabilities of multiple cluster nodes that each have multiple cores. Simulations of joint BLRM trials with multiple arms usually have relatively long run times (on the order of hours) when executed on a regular computer with a moderate number of cores, so running them on a cluster may be desirable to achieve manageable run times. Please note that sim_jointBLRM() parallelizes across trials, so, when simulating e.g. 1000 trials, several hundred cores can be used to substantially reduce the run time.

In the following, we illustrate how a parallel backend could be registered on a typical SLURM cluster. We will use the future.batchtools package for this and attach it together with doFuture, so that the functions used below are available:
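#load the additional packages for the cluster backend
library(doFuture)
library(future.batchtools)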

Let us assume we have a cluster available on which we want to request 10 nodes with 32 cores per node (320 cores in total). We specify this, together with further resources such as the requested job time and memory, in the following variables:

#number of processor nodes
n_nodes <- 10
#number of cores per node
n_cpus  <- 32
#job time in hours
walltime_h <- 2
#memory requested in GB
memory_gb  <- 4

With this, we can register the parallel backend with the given specifications:

slurm <- tweak(batchtools_slurm,
               template  = system.file('templates/slurm-simple.tmpl', package = 'batchtools'),
               workers   = n_nodes,
               resources = list(
                 walltime  = 60 * 60 * walltime_h,
                 ncpus     = n_cpus,
                 memory    = 1000 * memory_gb))

#register parallel backend on cluster
registerDoFuture()
plan(list(slurm, multisession))

With this specification, when e.g. sim_jointBLRM() is called, the trials to be simulated are first divided evenly across the available nodes, and the trials assigned to each node are then distributed across that node's CPUs. In the above example with 10 nodes of 32 cores each, simulating e.g. 6400 trials means that each node receives 640 trials, which are split across its 32 cores, so that each core needs to simulate only 20 of the trials.
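The split can be verified quickly with the variables defined above (n_trials is introduced here only for illustration):

#trials per node and per core in the example above
n_trials <- 6400
n_trials / n_nodes             #640 trials per node
n_trials / (n_nodes * n_cpus)  #20 trials per core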

After this setup, the sim_jointBLRM() function can be called in the same way as illustrated previously.

Please note that there are some tradeoffs involved depending on the system; e.g., jobs requesting a very large number of nodes or cores may spend more time waiting in the cluster queue.

Finally, please note that the function sim_jointBLRM() also supports the working.path argument mentioned earlier, which enables saving and re-loading of MCMC results. When executing on a parallel backend, the path must point to shared storage that all workers can access. Synchronization across workers is carried out using file locks, which may introduce some overhead. It has been observed that this feature can still provide significant run-time benefits (both on a cluster and on a standard computer), but please be aware that it is highly experimental.