R/lmNULISAseq_predict.R
lmNULISAseq_predict.Rd
Fits a linear regression model for each target in the NULISAseq data set, using that target's expression as a predictor of the response variable. Outputs coefficients, t-statistics, and unadjusted and adjusted p-values. This approach tests whether each target's expression is associated with the outcome variable while adjusting for any specified covariates.
lmNULISAseq_predict(
  data,
  sampleInfo,
  sampleName_var,
  response_var,
  modelFormula,
  reduced_modelFormula = NULL,
  exclude_targets = NULL,
  exclude_samples = NULL,
  target_subset = NULL,
  sample_subset = NULL,
  return_model_fits = FALSE
)
data
A matrix of normalized NULISAseq data
with targets used as predictors in rows, samples in columns.
Row names should be the target names, and column names are the sample names.
It is assumed that data has already been transformed
using log2(x + 1) for each NULISAseq normalized count value x.
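As a minimal sketch of the expected input layout, the following builds a small matrix with targets in rows and samples in columns and applies the log2(x + 1) transformation. The target names, sample names, and count values are hypothetical and chosen only for illustration.

```r
# Hypothetical normalized counts: 3 targets (rows) x 4 samples (columns).
norm_counts <- matrix(
  c(0, 1, 3, 7,
    15, 31, 63, 127,
    255, 511, 1023, 2047),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("target_A", "target_B", "target_C"),  # row names = target names
    c("S1", "S2", "S3", "S4")               # column names = sample names
  )
)

# Apply log2(x + 1) to every normalized count value x.
data <- log2(norm_counts + 1)
```

The resulting matrix keeps the row and column names, which is what allows samples to be matched to the metadata later.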
sampleInfo
A data frame with sample metadata including the response
variable and covariates. Rows are samples, columns are sample metadata variables.
Linear regression models will be fit only to the samples in sampleInfo, or to a subset of those samples as
specified using arguments exclude_samples or sample_subset.
sampleInfo should have a column for each
variable included in the linear regression models. String variables will
be automatically treated as factors, and numeric variables will be
treated as numeric.
sampleName_var
The name of the column of sampleInfo that matches
the column names of data. This variable will be used to merge the
target expression data with the sample metadata.
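The merge described above might look roughly like the sketch below, which aligns the columns of data to the rows of sampleInfo using the sample name column. This is an illustration in base R, not the function's internal code; the column name sampleID and all values are hypothetical.

```r
# Hypothetical expression matrix: targets in rows, samples in columns.
data <- matrix(
  c(5.1, 6.2, 4.8,
    9.0, 8.7, 9.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("target_A", "target_B"), c("S1", "S2", "S3"))
)

# Hypothetical metadata, deliberately in a different sample order.
sampleInfo <- data.frame(
  sampleID = c("S3", "S1", "S2"),
  outcome  = c(1.2, 0.4, 0.9)
)
sampleName_var <- "sampleID"

# Find, for each metadata row, the matching column of the expression matrix.
idx <- match(sampleInfo[[sampleName_var]], colnames(data))

# Transpose so samples become rows, then bind to the metadata.
merged <- cbind(sampleInfo, t(data[, idx, drop = FALSE]))
```

Matching by name rather than by position means the two inputs do not need to list samples in the same order.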
response_var
The name of the column of sampleInfo specifying the continuous numeric response variable.
modelFormula
A string that represents the right hand side of the model formula.
The main effect of target expression
will be automatically added as a predictor. Any interactions need to be specified
in the model formula as "covariate * target".
For example, modelFormula = "disease + age + sex + plate" tests for variation in the outcome explained by the
target predictor, adjusted for disease group, age, sex, and plate. modelFormula =
"disease * target + age + sex + plate" includes both main and interaction
effects for disease and target expression. See ?lm().
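To illustrate how a right-hand-side string could be combined with the response and the target main effect, here is a sketch for a single target on simulated data. How the function assembles the formula internally is an assumption; the data, the variable names other than target, and the use of paste() are all hypothetical.

```r
set.seed(1)
n <- 40

# Simulated data for one target; all values are made up.
df <- data.frame(
  disease = factor(rep(c("control", "case"), each = n / 2)),
  age     = round(runif(n, 30, 70)),
  target  = rnorm(n, mean = 10)
)
df$outcome <- 2 + 0.5 * df$target + 0.02 * df$age + rnorm(n, sd = 0.3)

response_var <- "outcome"
modelFormula <- "disease + age"

# Assumed construction: add the target main effect to the right-hand side.
full_formula <- as.formula(paste(response_var, "~ target +", modelFormula))
fit <- lm(full_formula, data = df)

# Estimate, standard error, t-statistic, and p-value for the target term.
target_stats <- summary(fit)$coefficients["target", ]
```

Repeating this fit once per target and collecting the target rows would give a per-target table of coefficients and p-values.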
reduced_modelFormula
Optional reduced model formula
that contains only a subset of the terms in modelFormula.
The reduced model serves as the null model for an F-test using anova().
This can be useful for testing the overall significance of factor
variables with more than 2 levels.
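The F-test described above can be sketched for one target as follows. The example compares a full model containing a three-level factor against a reduced model without it, using simulated data; the variable names and values are hypothetical.

```r
set.seed(2)
n <- 60

# Simulated data with a three-level factor; all values are made up.
df <- data.frame(
  group  = factor(rep(c("A", "B", "C"), each = n / 3)),
  age    = round(runif(n, 30, 70)),
  target = rnorm(n, mean = 10)
)
df$outcome <- 1 + 0.4 * df$target + 0.01 * df$age + rnorm(n, sd = 0.5)

# Full model includes the factor; the reduced (null) model drops it.
full_fit    <- lm(outcome ~ target + group + age, data = df)
reduced_fit <- lm(outcome ~ target + age, data = df)

# anova() on nested lm fits performs an F-test of the dropped terms.
f_test <- anova(reduced_fit, full_fit)
df_num <- f_test$Df[2]         # numerator degrees of freedom
f_stat <- f_test$F[2]          # F-statistic
p_val  <- f_test$`Pr(>F)`[2]   # p-value for the group factor as a whole
```

Because group has three levels, it contributes two coefficients, and the single F-test assesses both at once instead of reporting a separate t-test per level.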
exclude_targets
A vector of target names for targets that will be excluded from the linear regression models as predictors. Internal control targets, for example, should probably always be excluded.
exclude_samples
A vector of sample names for samples that will be excluded from the linear regression models. External control wells (IPCs, NCs, SCs) should usually be excluded.
target_subset
Overrides exclude_targets. A vector of target names
for targets that will be included in the linear regression models as predictors.
sample_subset
Overrides exclude_samples. A vector of sample names
for samples that will be included in the linear regression models.
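The precedence between the exclusion and subset arguments can be illustrated with a small helper. The function select_names below is hypothetical and not part of the package; it only mirrors the documented rule that a subset argument, when given, overrides the corresponding exclude argument.

```r
# Hypothetical helper illustrating the documented precedence rule.
select_names <- function(all_names, exclude = NULL, subset = NULL) {
  if (!is.null(subset)) {
    # A subset takes precedence; any exclusion list is ignored.
    return(all_names[all_names %in% subset])
  }
  # Otherwise drop the excluded names (no-op when exclude is NULL).
  all_names[!(all_names %in% exclude)]
}

# Hypothetical target names, with "IC" standing in for an internal control.
targets <- c("target_A", "target_B", "IC")

kept_default  <- select_names(targets)
kept_excluded <- select_names(targets, exclude = "IC")
kept_subset   <- select_names(targets, exclude = "IC", subset = c("IC", "target_A"))
```

In the last call the subset wins, so "IC" is retained even though it also appears in the exclusion list.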
return_model_fits
Logical TRUE or FALSE (default).
Should a list of the model fits be returned? This may be useful for more
detailed analyses and plotting, but it requires more memory.
A list including the following:
A data frame with rows corresponding to targets and columns
corresponding to estimated model coefficients, unadjusted p-values,
Bonferroni adjusted p-values, and Benjamini-Hochberg false discovery rate
adjusted p-values (see ?p.adjust()).
A list of length equal to the number of targets containing
the model fit output from lm(). Only returned when
return_model_fits = TRUE.
A data frame with rows corresponding to targets and columns.
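The Bonferroni and Benjamini-Hochberg adjustments mentioned in the returned statistics can be reproduced with p.adjust(). The sketch below applies both to a hypothetical vector of unadjusted p-values, one per target.

```r
# Hypothetical unadjusted p-values, one per target.
p_unadj <- c(target_A = 0.001, target_B = 0.02, target_C = 0.04, target_D = 0.5)

# Bonferroni: multiply by the number of tests, capped at 1.
p_bonf <- p.adjust(p_unadj, method = "bonferroni")

# Benjamini-Hochberg false discovery rate adjustment.
p_fdr <- p.adjust(p_unadj, method = "BH")

# Side-by-side table with targets in rows.
p_table <- data.frame(p_unadj, p_bonf, p_fdr)
```

With four tests, the Bonferroni values are 0.004, 0.08, 0.16, and 1, while the BH values are 0.004, 0.04, about 0.053, and 0.5, showing that the FDR adjustment is the less conservative of the two.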