R/glmerNULISAseq_predict.R
glmerNULISAseq_predict.RdFits generalized linear mixed effects model to each target in the NULISAseq data set,
using univariate targets as predictors in the model, supporting various response distributions through the family parameter.
Outputs coefficients, odds ratios, z-statistics, unadjusted and adjusted p-values. Uses lme4 package.
glmerNULISAseq_predict(
data,
sampleInfo,
sampleName_var,
response_var,
modelFormula_fixed,
modelFormula_random,
reduced_modelFormula_fixed = NULL,
reduced_modelFormula_random = NULL,
exclude_targets = NULL,
exclude_samples = NULL,
target_subset = NULL,
sample_subset = NULL,
family = binomial(),
return_model_fits = FALSE,
control = lme4::glmerControl(optimizer = "bobyqa")
)A matrix of normalized NULISAseq data
with targets in rows, samples in columns.
Row names should be the target names, and column names are the sample names.
It is assumed that data has already been transformed
using log2(x + 1) for each NULISAseq normalized count value x.
A data frame with sample metadata including the response
variable and covariates. Rows are samples,
columns are sample metadata variables. Generalized linear mixed effects model will
only be done on the samples in sampleInfo, or a subset of those samples as
specified using arguments exclude_samples or sample_subset.
sampleInfo should have a column for each
variable included in the generalized linear mixed effect models. String variables will
be automatically treated as factors, and numeric variables will be
treated as numeric.
The name of the column of sampleInfo that matches
the column names of data. This variable will be used to merge the
target expression data with the sample metadata.
The name of the column of sampleInfo specifying the response variable.
A string that represents the fixed effects part of the model
formula used for the generalized linear mixed effects model.
The main effect of target expression will be automatically added as a predictor.
Any interactions need to be specified in the model formula as "covariate * target".
For example, when response_var = disease, family = binomial(), modelFormula_fixed =
"age + sex + plate" tests for differences in the log-odds of the binary outcome
(disease status) as a function of age, sex, and plate. modelFormula_fixed =
"sex * target + age + plate" includes both main and interaction
effects for sex and target expression. See ?glmer().
A string that represents the random effects part of the model
formula on the used for the generalized linear mixed effects model.
For example modelFormula_random = "(1|participant_ID)"
creates a subject specific random intercept, where the variable
participant_ID (a column in sampleInfo data frame) denotes
repeated measures on the same subject. For subject-specific random intercept and
slopes (not recommended when time is categorical),
use modelFormula_random = "(1 + time|participant_ID)". For random
subject nested within plate (which may be useful when analyzing
a large number of plates together), use modelFormula_random =
"(1|plate_ID:participant_ID)".
See ?glmer().
Optional reduced model formula
for generalized linear mixed effects that contains only a subset of the terms in modelFormula.
This could be an empty string if the full model contains only one term.
The reduced model serves as null model for a likelihood ratio test
(LRT, which is a Chi-square test) using anova().
This could be useful for testing the overall significance of factor
variables with more than 2 levels, for example, testing the overall significance
of a categorical time effect. The reduced model uses the same random effects
as specified in modelFormula_random.
Optional reduced random effects formula.
If not specified, the reduced model will use the same random effects structure
as the full model. Specifying this allows testing the significance of random effects
components. For example, to test if participant random effects are needed, you could
specify reduced_modelFormula_random = "(1|plate_ID)" when the full model has
modelFormula_random = "(1|plate_ID:participant_ID)".
A vector of target names for targets that will be excluded from the generalized linear mixed effects models as predictors. Internal control targets, for example, should probably always be excluded.
A vector of sample names for samples that will be excluded from the generalized linear mixed effects models. External control wells (IPCs, NCs, SC,) should usually be excluded.
Overrides exclude_targets. A vector of target names for targets that will be included in the generalized linear mixed effects models as predictors.
Overrides exclude_samples. A vector of sample names for samples that will be included in the generalized linear mixed effects models.
A family object for the glm() function specifying the distribution and link function.
binomial(link = "logit")For binary response data (default). Reports odds ratios (OR).
gaussian(link = "identity")For continuous normally-distributed data. Reports coefficients.
poisson(link = "log")For count data. Reports rate ratios (RR).
Gamma(link = "inverse")For positive continuous data with constant coefficient of variation. Reports inverse ratios (IR).
inverse.gaussian(link = "1/mu^2")For positive continuous data with variance increasing with mean^3. Reports inverse ratios (IR).
quasibinomial(link = "logit")For overdispersed binary data. Reports odds ratios (OR).
quasipoisson(link = "log")For overdispersed count data. Reports rate ratios (RR).
quasi(link = "identity", variance = "constant")For custom quasi-likelihood models.
The function automatically selects appropriate test statistics (z/t-values) and effect size measures based on the family.
(see ?family())
Logical TRUE or FALSE (default).
Should a list of the model fits be returned? Might be useful for more
detailed analyses and plotting. However, also requires using more memory.
A list of control parameters for glmer model fitting,
created by lme4::glmerControl(). Defaults to glmerControl(optimizer = "bobyqa")
which often helps with convergence issues. Other useful optimizers include "nloptwrap".
Additional control parameters can help with convergence:
optCtrl = list(maxfun = 100000) - Increase maximum number of function evaluations
calc.derivs = FALSE - Skip derivative calculations if having convergence issues
check.conv.grad = FALSE - Skip gradient convergence checks if needed
See ?lme4::glmerControl for all available options.
A list including the following:
A data frame with rows corresponding to targets and columns
corresponding to estimated model coefficients, effect sizes, standard errors, test statistics, unadjusted p-values,
Bonferroni adjusted p-values, and Benjamini-Hochberg false discovery rate
adjusted p-values (see ?p.adjust()).
A data frame with rows corresponding to targets and columns with likelihood ratio
test statistics including Chi-square statistic, degrees of freedom, unadjusted p-values,
Bonferroni adjusted p-values, and Benjamini-Hochberg false discovery rate
adjusted p-values. (Only included when reduced_modelFormula is specified.)
A list of length equal to number of targets containing
the model fit output from glm(). Only returned when
return_model_fits=TRUE.
A list of length equal to number of targets containing
the reduced model fit output from anova(). Only returned when
return_model_fits=TRUE and reduced_modelFormula is specified.