Simulate eDNA data

sim_eDNA_lm(
  formula,
  variable_list,
  betas,
  sigma_ln_eDNA,
  std_curve_alpha,
  std_curve_beta,
  n_sim = 1L,
  upper_Cq = 40,
  prob_zero = 0.08,
  X = expand.grid(variable_list),
  verbose = FALSE,
  cache_dir = tools::R_user_dir("artemis", "cache")
)

sim_eDNA_lmer(
  formula,
  variable_list,
  betas,
  sigma_ln_eDNA,
  sigma_rand,
  std_curve_alpha,
  std_curve_beta,
  n_sim = 1L,
  upper_Cq = 40,
  prob_zero = 0.08,
  X = expand.grid(variable_list),
  verbose = FALSE,
  cache_dir = tools::R_user_dir("artemis", "cache")
)

Arguments

formula: a model formula, e.g. y ~ x1 + x2. For sim_eDNA_lmer, random intercepts can also be provided, e.g. (1 | rep).
variable_list: a named list, with the levels that each variable can take. Please note that the variables listed in the formula, including the response variable, must be present in the variable_list or in the X design matrix. Extra variables, i.e. variables which do not occur in the formula, are ignored.
betas: numeric vector, the beta for each variable in the design matrix
sigma_ln_eDNA: numeric, the measurement error on ln[eDNA].
std_curve_alpha: numeric, the alpha value in the standard curve formula that converts between log(eDNA concentration) and Cq value.
std_curve_beta: numeric, the beta value in the standard curve formula that converts between log(eDNA concentration) and Cq value.
n_sim: integer, the number of cases to simulate
upper_Cq: numeric, the upper limit on Cq detection. Any value of log(concentration) that would produce a Cq value greater than this limit is instead recorded as the limit.
prob_zero: numeric, between 0 and 1. The probability of seeing a non-detection (i.e., a "zero") via the zero-inflated mechanism. Defaults to 0.08.
X: optional, a design matrix. By default, this is created from the variable_list using expand.grid(), which creates a balanced design matrix. However, the user can provide their own X as well, in which case the variable_list is ignored. This allows users to provide an unbalanced design matrix.
verbose: logical, when TRUE output from rstan::sampling is written to the console.
cache_dir: the cache directory where pre-compiled models are stored. Defaults to the output of tools::R_user_dir("artemis", "cache").
sigma_rand: numeric vector, the standard deviations of the random effects. There must be one sigma per random effect specified.
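
To make the roles of std_curve_alpha, std_curve_beta, and upper_Cq concrete, the following is a minimal base R sketch. It assumes the standard curve is linear, Cq = alpha + beta * ln(concentration); the argument descriptions above do not state the functional form, so treat this as an assumption. The helper ln_conc_to_Cq is hypothetical and is not part of artemis.

```r
# Hypothetical helper, not part of artemis.
# Assumes a linear standard curve: Cq = alpha + beta * ln(concentration),
# with any Cq above upper_Cq recorded as upper_Cq.
ln_conc_to_Cq <- function(ln_conc, std_curve_alpha, std_curve_beta,
                          upper_Cq = 40) {
  Cq <- std_curve_alpha + std_curve_beta * ln_conc
  pmin(Cq, upper_Cq)  # censor at the detection limit
}

ln_conc <- c(-15, -10.6, -5)
Cq <- ln_conc_to_Cq(ln_conc, std_curve_alpha = 21.2, std_curve_beta = -1.5)
print(Cq)
```

Under this assumption, a negative std_curve_beta means that lower concentrations give higher Cq values, so the lowest concentrations are the ones that reach the upper_Cq limit.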

Value

S4 object of class "eDNA_simulation_{lm/lmer}" with the following slots:

ln_conc matrix: the simulated log(concentration)
Cq_star matrix: the simulated Cq values, including the measurement error
formula: the formula for the simulation
variable_levels: named list, the variable levels used for the simulation
betas: numeric vector, the betas for the simulation
x: data.frame, the design matrix
std_curve_alpha numeric: the alpha for the std curve conversion
std_curve_beta numeric: the beta for the std curve conversion
upper_Cq: the upper limit for Cq

Details

These functions provide computationally efficient simulation of Cq values from a hypothetical eDNA sampling experiment, given a set of effect sizes (betas) applied to a set of predictor levels (variable_list). The mechanism for this model is described in detail in the artemis "Getting Started" vignette.
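
As an illustration of how the variable levels become a design, the sketch below uses base R only. It assumes the betas line up with the columns of a standard model.matrix() built from the formula; this page does not confirm that artemis orders them this way, so check the column names before relying on it. The variable names mirror the examples on this page.

```r
# Levels for each variable, in the form expected by variable_list.
# The response (Cq) is included because the formula refers to it.
vars <- list(Cq = 1, distance = c(0, 15, 50), volume = c(25, 50))

# Balanced design: one row per combination of levels (1 x 3 x 2 = 6 rows)
X <- expand.grid(vars)

# Columns of the model matrix; assumed here to be the order of the betas
mm <- model.matrix(Cq ~ distance + volume, data = X)
print(colnames(mm))

# An unbalanced alternative: drop one combination and pass the result as X
X_unbalanced <- X[!(X$distance == 50 & X$volume == 25), ]
print(nrow(X_unbalanced))
```

Passing a data frame such as X_unbalanced through the X argument is the route described above for unbalanced designs; variable_list is then ignored.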

The simulation functions call specialized functions that are written in Stan and compiled for speed. This also allows the simulation functions and the modeling functions to reflect the same process at the code level.

Diagnosing "unrealistic" simulations

Users will sometimes find that the simulated response (i.e. the Cq values) produced by these functions does not resemble the data expected from a sampling experiment. This suggests a mismatch between the assumptions of the model and the data-generating process in the field. In these circumstances, we suggest:

Check that the betas provided are the effect sizes of the predictors on log[eDNA concentration], and not on the Cq values.
Check that the variable levels provided are representative of real-world circumstances. For example, a sample volume of 0 ml is not possible.
Verify the values for the standard curve alpha and beta. These are specific to each lab's calibration, so it is important that you use the same conversion between Cq values and log[eDNA concentration] as was used for the comparison data.
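
One way to run the checks above before simulating is to compute the Cq values implied by the betas and the standard curve, without noise, and compare their range to real data. The sketch below uses base R only and rests on two assumptions that this page does not confirm: that the expected log concentration is the design matrix multiplied by the betas, and that the standard curve is linear (Cq = alpha + beta * ln_conc). The values are taken from the examples on this page.

```r
vars  <- list(distance = c(0, 15, 50), volume = c(25, 50))
betas <- c(intercept = -10.6, distance = -0.05, volume = 0.1)

# Expected ln[eDNA concentration] for each design row (no measurement error)
X  <- expand.grid(vars)
mm <- model.matrix(~ distance + volume, data = X)
ln_conc <- drop(mm %*% betas)

# Implied Cq under the assumed linear standard curve
std_curve_alpha <- 21.2
std_curve_beta  <- -1.5
Cq_expected <- std_curve_alpha + std_curve_beta * ln_conc

# Compare this range, and the share above the limit, to the comparison data
Cq_range <- range(Cq_expected)
share_censored <- mean(Cq_expected > 40)
print(Cq_range)
print(share_censored)
```

If the implied range sits far from the Cq values seen in the lab, the betas may be on the wrong scale or the standard curve values may come from a different calibration.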

Author

Matt Espe

Examples

# \donttest{
## Includes extra variables
vars = list(Intercept = -10.6,
            distance = c(0, 15, 50),
            volume = c(25, 50),
            biomass = 100,
            alive = 1,
            tech_rep = 1:10,
            rep = 1:3, Cq = 1)

## Intercept only
ans = sim_eDNA_lm(Cq ~ 1, vars,
                      betas = c(intercept = -15),
                      sigma_ln_eDNA = 1e-5,
                      std_curve_alpha = 21.2, std_curve_beta = -1.5)
#> Error: There are no sampler diagnostics when fixed_param = TRUE.

print(ans)
#> Error: object 'ans' not found

ans = sim_eDNA_lm(Cq ~ distance + volume, vars,
                  betas = c(intercept = -10.6, distance = -0.05, volume = 0.1),
                  sigma_ln_eDNA = 1, std_curve_alpha = 21.2, std_curve_beta = -1.5)
#> Error: There are no sampler diagnostics when fixed_param = TRUE.
# }