artemis
modeling.Rmd
library(artemis)
(Adapted from manuscript Espe et al, in review)
A primary purpose of the artemis
package is to
facilitate modeling of qPCR data from eDNA samples. It does this via two
modeling functions: eDNA_lm()
for fixed effects models and
eDNA_lmer()
for mixed effects models. These functions
mirror the semantics of R’s built in lm()
and lme4’s
lmer()
.
Both modeling functions require the following inputs from the data:
A vector of numeric Cq values (quantification cycles), one for each qPCR replicate. Cq values corresponding to non-detections for your assay should be recorded as the threshold value (the default is 40.0 cycles).
The intercept value \(\alpha\) and the slope value \(\beta\) from a standard curve equation associated with the qPCR analysis. This is used to convert the observed Cq values to the corresponding log concentration of eDNA. This conversion occurs internally.
A threshold value of the most cycles which are attempted in qPCR (defaults to 40 cycles).
An example of qPCR data in the correct format for modeling with
artemis
can be viewed by calling eDNA_data
,
which is a data.frame
with Cq values from live car
experiments completed in the California Sacramento-San Joaquin Delta
with Delta Smelt:
head(eDNA_data)
#> Date FilterID TechRep Cq Distance_m Volume_mL Biomass_N
#> 1 2017-08-02 cvp-1-1 1 40.00 50 50 100
#> 2 2017-08-02 cvp-1-1 2 38.13 50 50 100
#> 3 2017-08-02 cvp-1-1 3 37.38 50 50 100
#> 4 2017-08-02 cvp-1-10 1 36.24 40 200 100
#> 5 2017-08-02 cvp-1-10 2 40.00 40 200 100
#> 6 2017-08-02 cvp-1-10 3 40.00 40 200 100
#> StdCrvAlpha_lnForm StdCrvBeta_lnForm
#> 1 21.168 -1.529
#> 2 21.168 -1.529
#> 3 21.168 -1.529
#> 4 21.168 -1.529
#> 5 21.168 -1.529
#> 6 21.168 -1.529
str(eDNA_data)
#> 'data.frame': 180 obs. of 9 variables:
#> $ Date : Date, format: "2017-08-02" "2017-08-02" ...
#> $ FilterID : chr "cvp-1-1" "cvp-1-1" "cvp-1-1" "cvp-1-10" ...
#> $ TechRep : num 1 2 3 1 2 3 1 2 3 1 ...
#> $ Cq : num 40 38.1 37.4 36.2 40 ...
#> $ Distance_m : num 50 50 50 40 40 40 40 40 40 40 ...
#> $ Volume_mL : num 50 50 50 200 200 200 200 200 200 200 ...
#> $ Biomass_N : num 100 100 100 100 100 100 100 100 100 100 ...
#> $ StdCrvAlpha_lnForm: num 21.2 21.2 21.2 21.2 21.2 ...
#> $ StdCrvBeta_lnForm : num -1.53 -1.53 -1.53 -1.53 -1.53 ...
Note that there are no variable levels with missing or
NA
values in these example data. However if there were
NA
values in the input data set, any rows with
NA
s in the data will be dropped when the data is prepped
for modeling. This is because Stan models cannot not take
NA
values. Although NA
values will be
automatically dropped from the data prior to modeling, we recommend
removing NA
values as a separate step prior to modeling.
This allows inspection and potentially correction of the rows with
NA
values. For example,
na_vals = !complete.cases(eDNA_data)
eDNA_data[na_vals,] # visual inspection
eDNA_lm()
Fixed effects models are primarily used with completely randomized experiments without blocking variables. For most observational data or blocked experimental data, mixed effects models are likely more appropriate.
To fit a fixed effects model to the sample eDNA_data
where Distance_m
is the only predictor, we give the
function a model formula and the input data listed above:
model_fit = eDNA_lm(Cq ~ Distance_m,
data = eDNA_data,
std_curve_alpha = 21.2, std_curve_beta = -1.5)
Notice that we provide the standard curve parameters
(std_curve_alpha
and std_curve_beta
as
separate arguments to the function. In cases where there are multiple
standard curve parameters in use in the same dataset (e.g. using data
from multiple labs or experiments), the standard curve parameters can
each be given as vectors. These vectors must be the same length as the
number of rows in the data.
The model functions, similar to lm()
in base R, will
automatically add an intercept term. You can explicitly omit the
intercept if you have a good reason for doing so. Please see
?lm
for a more full description of how to specify linear
models in R.
Full control of the MCMC algorithm can be accomplished by adding
these control arguments to the end of the eDNA_lm*()
call,
which then passes them on to rstan::sampling()
. Available
arguments for MCMC control can be found in the help for
rstan::sampling
.
For example,
model_fit = eDNA_lm(Cq ~ Distance_m,
data = eDNA_data,
std_curve_alpha = 21.2, std_curve_beta = -1.5,
seed = 1234,
chains = 1) # we don't recommend sampling just 1 chain; the default is 4
eDNA_lmer()
Random or mixed effects models are typically used when there are grouping factors which need to be accounted for in the model (e.g. blocking variables, subsamplings from a single filter, etc.).
To fit a model with one or more random effect(s), use the
eDNA_lmer()
function. Random effects are specified using
the same syntax as the lme4
package,
e.g. (1|random effect)
.
For example, to specify a random effect for “Year”,
As with the simulation objects, the model results can be summarized
or plotted with default methods using summary()
and
plot()
, or converted to a data.frame
object
for further manipulation.
Additional arguments can be provided to the plot method, which are
passed to rstan::plot
methods for stanfit
objects. More details are available via ?rstan::plot
.
Matching lme4
convention, random effects are not
included in the default summary()
output. You can view a
summary of the random effects with ranef()
,
ranef(model_fit2)
or by subsetting the stanfit
slot of the model object
with @
, and specifying the random_betas
parameters with the pars
argument:
rstan::summary(model_fit2@stanfit, pars = "rand_betas", probs = c(0.50, 0.025, 0.975))$summary
plot(model_fit2, pars = "rand_betas")
Because the models implemented in artemis
are Bayesian,
you will get the most out of their results when you can work with and
summarize posterior probabilities. Some helpful resources for this are
the Stan
User’s Guide, and the stanfit
objects vignette from the rstan
package.
This is a collection of advice for modeling eDNA data with the
artemis
package.
Center and scale your predictor values: artemis
uses
MCMC to estimate values, and this will be more efficient if the
predictor values are not on vastly different scales. In general, the
MCMC will be the most efficient when the predictors are roughly centered
at 0, and have stdev of 1.
Use priors: The default priors in artemis
follow the
conventions of the rstanarm
package, and are weakly
informative. When the data do not strongly inform the parameter
estimates, the model fit can be improved by specifying stronger
priors.