Linear Regression Model for NULISAseq Data - targets as outcome

Fits a separate linear regression model to each target in the NULISAseq data set, with that target's expression as the outcome. Outputs coefficients, t-statistics, and unadjusted and adjusted p-values.

lmNULISAseq(
  data,
  sampleInfo,
  sampleName_var,
  modelFormula,
  reduced_modelFormula = NULL,
  exclude_targets = NULL,
  exclude_samples = NULL,
  target_subset = NULL,
  sample_subset = NULL,
  return_model_fits = FALSE,
  analysis_context = NULL
)

Arguments

data: A matrix of normalized NULISAseq data with targets in rows and samples in columns. Row names should be the target names, and column names should be the sample names. The data are assumed to have already been transformed using log2(x + 1) for each NULISAseq normalized count value x.
sampleInfo: A data frame with sample metadata. Rows are samples, columns are sample metadata variables. Differential abundance analysis will only be done on the samples in sampleInfo, or a subset of those samples as specified using arguments exclude_samples or sample_subset. sampleInfo should have a column for each variable included in the linear regression models. String variables will be automatically treated as factors, and numeric variables will be treated as numeric.
sampleName_var: The name of the column of sampleInfo that matches the column names of data. This variable will be used to merge the target expression data with the sample metadata.
modelFormula: A string that represents the right hand side of the model formula (everything after the ~) used for the linear model. For example, modelFormula = "disease + age + sex + plate" tests for differences in target expression by disease group, adjusted for age, sex, and plate. modelFormula = "disease * age + sex + plate" includes both main and interaction effects for disease and age. See ?lm().
reduced_modelFormula: Optional reduced model formula that contains only a subset of the terms in modelFormula. The reduced model serves as the null model for an F-test using anova(). This can be useful for testing the overall significance of factor variables with more than 2 levels.
exclude_targets: A vector of target names for targets that will be excluded from the differential abundance analysis. Internal control targets, for example, should generally be excluded.
exclude_samples: A vector of sample names for samples that will be excluded from the differential abundance analysis. External control wells (IPCs, NCs, SCs) should usually be excluded.
target_subset: Overrides exclude_targets. A vector of target names for targets that will be included in the differential abundance analysis.
sample_subset: Overrides exclude_samples. A vector of sample names for samples that will be included in the differential abundance analysis.
return_model_fits: Logical TRUE or FALSE (default). Should a list of the model fits be returned? This can be useful for more detailed analyses and plotting, but it requires more memory.
analysis_context: Optional string to provide context in error messages. Useful when calling lmNULISAseq from different analyses (e.g., "plate effect test").
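
As an illustration of the input layout described above, the following is a minimal sketch that builds simulated inputs and passes them to lmNULISAseq. The data values, the column names (sampleName, disease, age), and the object names are all hypothetical, and the call is guarded so that it only runs when lmNULISAseq is available in the session.

```r
# Simulated inputs; all names and values here are hypothetical.
set.seed(1)
n_targets <- 5
n_samples <- 12

# Normalized counts: targets in rows, samples in columns.
norm_counts <- matrix(
  rpois(n_targets * n_samples, lambda = 200),
  nrow = n_targets,
  dimnames = list(
    paste0("target", seq_len(n_targets)),
    paste0("sample", seq_len(n_samples))
  )
)

# The function expects log2(x + 1)-transformed values.
data_log2 <- log2(norm_counts + 1)

# Sample metadata: one row per sample, one column per model variable.
# sampleName must match the column names of the data matrix.
sampleInfo <- data.frame(
  sampleName = colnames(data_log2),
  disease = rep(c("control", "case"), each = n_samples / 2),
  age = round(runif(n_samples, 30, 70)),
  stringsAsFactors = FALSE
)

# Call the function only if it is available in the current session.
if (exists("lmNULISAseq")) {
  results <- lmNULISAseq(
    data = data_log2,
    sampleInfo = sampleInfo,
    sampleName_var = "sampleName",
    modelFormula = "disease + age",
    reduced_modelFormula = "age"
  )
  print(head(results$modelStats))
}
```

Here disease is stored as a string and so would be treated as a factor, while age is numeric, matching the description of sampleInfo above.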

Value

A list including the following:

modelStats: A data frame with rows corresponding to targets and columns corresponding to estimated model coefficients, unadjusted p-values, Bonferroni adjusted p-values, and Benjamini-Hochberg false discovery rate adjusted p-values (see ?p.adjust()).
modelFits: A list of length equal to the number of targets containing the model fit output from lm(). Only returned when return_model_fits = TRUE.
Fstats: A data frame with rows corresponding to targets and columns holding the results of the F-test that compares modelFormula against reduced_modelFormula.
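
To make the returned quantities concrete, the sketch below reproduces the same kind of per-target computation with base R only: lm() for the full and reduced models, anova() for the F-test, and p.adjust() for the Bonferroni and Benjamini-Hochberg corrections. It is not the package's implementation; the simulated data, the helper fit_one, and the output column names are assumptions made for illustration.

```r
set.seed(42)
n_samples <- 30

# Hypothetical metadata with a three-level factor and a numeric covariate.
meta <- data.frame(
  disease = factor(rep(c("A", "B", "C"), each = 10)),
  age = runif(n_samples, 30, 70)
)

# Hypothetical log2-scale data: 4 targets in rows, samples in columns.
expr <- matrix(rnorm(4 * n_samples, mean = 10), nrow = 4,
               dimnames = list(paste0("target", 1:4), NULL))
# Add a true group effect to target1 so that one target differs.
expr[1, ] <- expr[1, ] + 2 * (meta$disease == "C")

modelFormula <- "disease + age"
reduced_modelFormula <- "age"

# Fit the full and reduced models for one target (hypothetical helper).
fit_one <- function(y) {
  d <- cbind(meta, y = y)
  full <- lm(as.formula(paste("y ~", modelFormula)), data = d)
  reduced <- lm(as.formula(paste("y ~", reduced_modelFormula)), data = d)
  # F-test of the full model against the reduced (null) model.
  ftest <- anova(reduced, full)
  c(
    coef_diseaseC = unname(coef(full)["diseaseC"]),
    p_diseaseC = coef(summary(full))["diseaseC", "Pr(>|t|)"],
    F_stat = ftest$F[2],
    F_p = ftest[["Pr(>F)"]][2]
  )
}

# One row per target, one column per statistic.
stats <- as.data.frame(t(apply(expr, 1, fit_one)))

# Adjust the coefficient p-values across targets.
stats$p_bonf <- p.adjust(stats$p_diseaseC, method = "bonferroni")
stats$p_BH <- p.adjust(stats$p_diseaseC, method = "BH")
print(stats)
```

Because disease has three levels, the single-coefficient t-test covers only one contrast (C versus the reference level A), whereas the F-test against the reduced model assesses the disease factor as a whole.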