Linear Mixed Effect Model for NULISAseq Data - targets as predictor

Fits a linear mixed effects model for each target in the NULISAseq data set, with the target entered as a univariate predictor in a model of the form Outcome ~ target + .... Outputs coefficients, t-statistics, and unadjusted and adjusted p-values. This approach tests whether each target's expression is associated with the outcome variable, while adjusting for any specified fixed effect covariates and accounting for random effects.

lmerNULISAseq_predict(
  data,
  sampleInfo,
  sampleName_var,
  response_var,
  modelFormula_fixed,
  modelFormula_random,
  reduced_modelFormula_fixed = NULL,
  reduced_modelFormula_random = NULL,
  exclude_targets = NULL,
  exclude_samples = NULL,
  target_subset = NULL,
  sample_subset = NULL,
  return_model_fits = FALSE,
  control = lme4::lmerControl(optimizer = "bobyqa")
)

Arguments

data

A matrix of normalized NULISAseq data with targets used as predictors in rows, samples in columns. Row names should be the target names, and column names are the sample names. It is assumed that data has already been transformed using log2(x + 1) for each NULISAseq normalized count value x.
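As a minimal sketch of the expected input layout, the following builds a small matrix and applies the log2(x + 1) transform described above. The target names, sample names, and count values are made up for illustration and do not come from a real NULISAseq run.

```r
# Hypothetical normalized counts: 3 targets (rows) x 4 samples (columns).
norm_counts <- matrix(
  c(120,   0,  35, 980,
     15,  42,   7,   0,
    300, 310, 295, 305),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("IL6", "TNF", "CRP"),      # row names = target names (illustrative)
    c("S1", "S2", "S3", "S4")    # column names = sample names (illustrative)
  )
)

# log2(x + 1) transform; the + 1 keeps zero counts finite, since log2(0 + 1) = 0.
data <- log2(norm_counts + 1)
```

The column names of this matrix are what sampleName_var is expected to match in sampleInfo.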

sampleInfo

A data frame with sample metadata including the response variable and covariates. Rows are samples, columns are sample metadata variables. Linear mixed effect models will only be fit to the samples in sampleInfo, or to a subset of those samples as specified using arguments exclude_samples or sample_subset.

sampleName_var

The name of the column of sampleInfo that matches the column names of data. This variable will be used to merge the target expression data with the sample metadata.

response_var

The name of the column of sampleInfo specifying the continuous numeric response variable.

modelFormula_fixed

A string that represents the fixed effects part of the model formula used for the linear mixed effects model. The main effect of target expression will be automatically added as a predictor. Any interactions need to be specified in the model formula as "covariate * target". For example, modelFormula_fixed = "disease + age + sex + plate" tests for variation in the outcome explained by the target predictor, adjusted for disease group, age, sex, and plate. modelFormula_fixed = "disease * target + age + sex + plate" includes both main and interaction effects for disease and target expression. See ?lme4::lmer.
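To make the role of the formula strings concrete, here is a rough sketch of how a per-target formula could be assembled from them in base R. This is an assumption about the general idea, not the function's actual internals; the column names outcome, disease, age, sex, plate, and participant_ID are hypothetical.

```r
response_var        <- "outcome"   # hypothetical column of sampleInfo
modelFormula_fixed  <- "disease * target + age + sex + plate"
modelFormula_random <- "(1|participant_ID)"

# Assumed assembly: target main effect first, then the user-supplied terms.
full_formula <- as.formula(
  paste(response_var, "~ target +", modelFormula_fixed, "+", modelFormula_random)
)

# Inspect the fixed-effect terms implied by the strings (random part left out).
fixed_only  <- as.formula(paste(response_var, "~ target +", modelFormula_fixed))
fixed_terms <- attr(terms(fixed_only), "term.labels")
print(fixed_terms)
```

Because "disease * target" expands to both main effects plus their interaction, the target main effect appears only once in the expanded terms even though it is also added automatically.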

modelFormula_random

A string that represents the random effects part of the model formula used for the linear mixed effects model. For example, modelFormula_random = "(1|participant_ID)" creates a subject-specific random intercept, where the variable participant_ID (a column in the sampleInfo data frame) denotes repeated measures on the same subject. For subject-specific random intercepts and slopes (not recommended when time is categorical), use modelFormula_random = "(1 + time|participant_ID)". For a random intercept for subject nested within plate (which may be useful when analyzing a large number of plates together), use modelFormula_random = "(1|plate_ID:participant_ID)". See ?lme4::lmer.

reduced_modelFormula_fixed

Optional reduced model formula for fixed effects that contains only a subset of the terms in modelFormula_fixed. This could be an empty string if the full model contains only one term. The reduced model serves as the null model for a likelihood ratio test (LRT, which is a chi-square test) using anova(). This could be useful for testing the overall significance of factor variables with more than 2 levels, for example, testing the overall significance of a categorical time effect. Unless reduced_modelFormula_random is specified, the reduced model uses the same random effects as specified in modelFormula_random.

reduced_modelFormula_random

Optional reduced random effects formula. If not specified, the reduced model will use the same random effects structure as the full model. Specifying this allows testing the significance of random effects components. For example, to test if participant random effects are needed, you could specify reduced_modelFormula_random = "(1|plate_ID)" when the full model has modelFormula_random = "(1|plate_ID:participant_ID)".
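The full-versus-reduced comparison can be illustrated directly with lme4. The sketch below uses the sleepstudy data set that ships with lme4 as a stand-in (it is not NULISAseq data), with Days playing the role of the term being tested. Both models are fit by maximum likelihood so that the likelihood ratio test on fixed effects is valid.

```r
if (requireNamespace("lme4", quietly = TRUE)) {
  sleep <- lme4::sleepstudy

  # Full model: fixed effect of Days plus a subject-specific random intercept.
  full <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleep, REML = FALSE)

  # Reduced (null) model: same random effects, Days removed.
  reduced <- lme4::lmer(Reaction ~ 1 + (1 | Subject), data = sleep, REML = FALSE)

  # Likelihood ratio (chi-square) test of the full model against the reduced one.
  lrt   <- anova(reduced, full)
  lrt_p <- lrt[["Pr(>Chisq)"]][2]
  print(lrt)
}
```

The same pattern applies when the reduced model differs in its random effects instead, although p-values for variance components tested at the boundary tend to be conservative.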

exclude_targets

A vector of target names for targets that will be excluded from the linear mixed effect models as predictors. Internal control targets, for example, should probably always be excluded.

exclude_samples

A vector of sample names for samples that will be excluded from the linear mixed effect models. External control wells (IPCs, NCs, SCs) should usually be excluded.

target_subset

Overrides exclude_targets. A vector of target names for targets that will be included in the linear mixed effect models as predictors.

sample_subset

Overrides exclude_samples. A vector of sample names for samples that will be included in the linear mixed effect models.
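The override behavior of the subset arguments can be sketched with a small helper. select_names is a hypothetical function written for this illustration; it is not part of the package and only mirrors the documented rule that a subset, when given, takes precedence over an exclusion list.

```r
# Hypothetical helper: returns the names to keep.
select_names <- function(all_names, exclude = NULL, subset = NULL) {
  if (!is.null(subset)) {
    # A subset overrides any exclusion list.
    return(intersect(all_names, subset))
  }
  # Otherwise drop the excluded names (no-op when exclude is NULL).
  setdiff(all_names, exclude)
}

targets <- c("IL6", "TNF", "CRP", "mCherry")   # illustrative target names

kept_by_exclusion <- select_names(targets, exclude = "mCherry")
kept_by_subset    <- select_names(targets, exclude = "mCherry",
                                  subset = c("mCherry", "IL6"))
```

In the second call the excluded name is kept because it appears in the subset, which is the behavior described for target_subset and sample_subset.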

return_model_fits

Logical TRUE or FALSE (default). Should a list of the model fits be returned? This might be useful for more detailed analyses and plotting, but it requires more memory.

control

A list of control parameters for lmer model fitting, created by lme4::lmerControl(). Defaults to lmerControl(optimizer = "bobyqa"), which often helps with convergence issues. Another useful optimizer is "nloptwrap". Additional control parameters can help with convergence:

optCtrl = list(maxfun = 100000) - Increase maximum number of function evaluations
calc.derivs = FALSE - Skip derivative calculations if having convergence issues
check.conv.grad = FALSE - Skip gradient convergence checks if needed

See ?lme4::lmerControl for all available options.
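A short sketch of building a control object for the control argument, using only the optimizer, optCtrl, and calc.derivs options of lme4::lmerControl(). The specific values are illustrative starting points, not recommendations.

```r
if (requireNamespace("lme4", quietly = TRUE)) {
  # bobyqa with a larger evaluation budget and derivative calculations skipped.
  ctrl <- lme4::lmerControl(
    optimizer   = "bobyqa",
    optCtrl     = list(maxfun = 100000),
    calc.derivs = FALSE
  )

  # Alternative optimizer to try when bobyqa does not converge.
  ctrl_alt <- lme4::lmerControl(optimizer = "nloptwrap")
}
```

Either object could then be passed as control = ctrl in the function call.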

Value

A list including the following:

modelStats: A data frame with rows corresponding to targets and columns corresponding to estimated model coefficients, unadjusted p-values, Bonferroni adjusted p-values, and Benjamini-Hochberg false discovery rate adjusted p-values (see ?p.adjust()).
modelFits: A list of length equal to number of targets containing the model fit output from lmer(). Only returned when return_model_fits=TRUE.
LRTstats: A data frame with rows corresponding to targets and columns containing the likelihood ratio test results from comparing the full and reduced models.
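The two multiple-testing adjustments reported in modelStats can be reproduced with stats::p.adjust(). The p-values below are invented for illustration, one per hypothetical target.

```r
# Hypothetical unadjusted p-values, one per target.
p_unadj <- c(target_A = 0.001, target_B = 0.01, target_C = 0.02,
             target_D = 0.04,  target_E = 0.5)

# Bonferroni: multiply by the number of tests, capped at 1.
p_bonf <- p.adjust(p_unadj, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate.
p_bh <- p.adjust(p_unadj, method = "BH")

print(data.frame(p_unadj, p_bonf, p_bh))
```

Bonferroni-adjusted values are never smaller than the Benjamini-Hochberg values for the same input, which is why the FDR column typically flags more targets.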

Details

Uses lme4 and lmerTest packages.
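The following sketch shows what a call might look like on simulated data. All values, column names (sample_ID, participant_ID, age, outcome), and target names are invented; the call is guarded so that it only runs when lmerNULISAseq_predict is available in the session, and the argument names follow the usage shown at the top of this page.

```r
set.seed(1)
n_subj <- 20   # 20 hypothetical participants, 2 samples each

sampleInfo <- data.frame(
  sample_ID      = paste0("S", seq_len(2 * n_subj)),
  participant_ID = factor(rep(seq_len(n_subj), each = 2)),
  age            = rep(round(runif(n_subj, 30, 70)), each = 2),
  outcome        = rnorm(2 * n_subj),
  stringsAsFactors = FALSE
)

# Simulated log2-scale expression: 3 targets (rows) x 40 samples (columns).
data <- matrix(
  rnorm(3 * nrow(sampleInfo), mean = 10),
  nrow = 3,
  dimnames = list(c("target_A", "target_B", "target_C"), sampleInfo$sample_ID)
)

if (exists("lmerNULISAseq_predict")) {
  res <- lmerNULISAseq_predict(
    data                = data,
    sampleInfo          = sampleInfo,
    sampleName_var      = "sample_ID",
    response_var        = "outcome",
    modelFormula_fixed  = "age",
    modelFormula_random = "(1|participant_ID)"
  )
  print(head(res$modelStats))
}
```

With purely random data such as this, no target is expected to show a real association, so the example is only useful for checking that inputs are shaped correctly.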