Write NULISAseq data in long format Excel file — writeNULISAseq • NULISAseqR

Reads in one or more NULISAseq XML files. Output is a long-format Excel file where each row corresponds to a particular sample-target combination.

writeNULISAseq(
  xml_files,
  dataDir,
  target_info_file = NULL,
  output_filename,
  Panel = "200-plex Inflammation v1",
  PanelLotNumber = "",
  plateIDs = NULL,
  sample_info_file = NULL,
  sample_info_file_variables = NULL,
  sample_group_covar = "SAMPLE_MATRIX",
  ICs = NULL,
  IPC_string = NULL,
  SC_string = NULL,
  Bridge_string = NULL,
  Calibrator_string = NULL,
  NC_string = NULL,
  include_IPC = FALSE,
  include_SC = TRUE,
  include_Bridge = TRUE,
  include_Calibrator = FALSE,
  include_NC = FALSE,
  include_unnorm_counts = FALSE,
  include_IC_counts = FALSE,
  excludeSamples = NULL,
  excludeTargets = NULL,
  interPlateNorm_method = "IPC",
  IN_samples = NULL,
  replaceNA = TRUE,
  TAP = TRUE,
  output_TAP_AQ = FALSE,
  replace_cal_blank_zeros = FALSE,
  replace_zeros_with_NA = TRUE,
  metadata = NULL,
  verbose = TRUE
)
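
A minimal call might look like the sketch below. The file names, directory, output file name, and plate IDs are hypothetical placeholders, not values from the package. The call is guarded so that it only runs when NULISAseqR is installed and the input files exist.

```r
# Hypothetical inputs: replace with your own run files and directory.
xml_files <- c("plate01.xml", "plate02.xml")
data_dir  <- "path/to/data"

# Collect the arguments in a list so they can be inspected before the call.
args <- list(
  xml_files       = xml_files,
  dataDir         = data_dir,
  output_filename = "NULISAseq_long_format.xlsx",
  plateIDs        = c("Plate_01", "Plate_02"),
  include_IPC     = FALSE,  # default value, shown for clarity
  include_SC      = TRUE    # default value, shown for clarity
)

# Only attempt the call when the package and all input files are available.
can_run <- requireNamespace("NULISAseqR", quietly = TRUE) &&
  all(file.exists(file.path(data_dir, xml_files)))

if (can_run) {
  do.call(NULISAseqR::writeNULISAseq, args)
}
```

Keeping `plateIDs` the same length and order as `xml_files` makes it explicit which ID goes with which file.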

Arguments

xml_files: Vector of file names (character strings). Files should be in order of the desired plateID variable, unless that is otherwise defined.
dataDir: Data directory where xml_files, target_info_file, and sample_info_file reside (character string).
target_info_file: Optional. Path and filename for the target info CSV file (character string). Must include columns for TargetName (matches the targets in XML files), AlamarTargetID, UniProtID, and ProteinName. Only targets in the target_info_file will be included in the output file. If target_info_file is not provided, the function will use information from Barcode A in the XML files.
output_filename: Filename for output xlsx file.
Panel: Name of multi-plex panel. Default is '200-plex Inflammation v1'.
PanelLotNumber: The panel lot number. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
plateIDs: A vector of plate IDs. If `NULL`, default is to number plates from 01 to total number of plates based on the order of xml_files. Passed to `readNULISAseq()` function and output as a column in the data file.
sample_info_file: Optional. Path and filename for the sample annotation CSV file. Must include columns "plateID" and "sampleName" which correspond to the plateID (matching the plateIDs input to this function) and sampleName in the `readNULISAseq()` `samples` data.frame.
sample_info_file_variables: Subset of column names in sample_info_file that will be included in data file output. Other columns will be excluded. Otherwise, if `NULL` (default), all columns will be included.
sample_group_covar: Optional column name in the Barcode B file and samples data matrix output by readNULISAseq that represents subgroups for which detectability will be calculated separately. Default is 'SAMPLE_MATRIX'. The function first checks that the variable is present in the column names of the samples matrix. Can be set to NULL to not use this feature.
ICs: Vector of string(s) or a named list of character string vectors with internal control target names. Default is NULL which will use the target type metadata embedded in the xml file to determine the IC. If non-null, first IC in vector will be used in intra-plate IC-normalization (usually mCherry). ICs will be omitted from output file by default unless include_unnorm_counts and include_IC_counts are both set to TRUE. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
IPC_string: Optional vector of string(s) or a named list of character string vectors that identifies the sample names of IPC wells (e.g. 'IPC'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
SC_string: Optional vector of string(s) or a named list of character string vectors that identifies the sample names of SC wells (e.g. 'SC'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
Bridge_string: Optional vector of string(s) or a named list of character string vectors that identifies the sample names of Bridge wells (e.g. 'Bridge'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. (Bridge normalization not currently implemented.) Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
Calibrator_string: Optional vector of character string(s) or a named list of character string vectors that identifies the sample names of Calibrator wells (e.g. 'Calibrator'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. (Calibrator not currently implemented. AQ uses IPC as calibrator.) Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
NC_string: Optional vector of character string(s) or a named list of character string vectors that identifies the sample names of NC wells (e.g. 'NC'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.
include_IPC: Logical. Should IPC samples be included in output? Default is `FALSE`.
include_SC: Logical. Should SC samples be included in output? Default is `TRUE`.
include_Bridge: Logical. Should Bridge samples be included in output? Default is `TRUE`.
include_Calibrator: Logical. Should Calibrator samples be included in output? Default is `FALSE`.
include_NC: Logical. Should NC samples be included in output? Default is `FALSE`.
include_unnorm_counts: Logical. Should unnormalized counts be included as an additional column in output? Default is `FALSE`.
include_IC_counts: Logical. Should IC counts be included in the output? Default is `FALSE`. This only works when include_unnorm_counts=TRUE.
excludeSamples: Optional vector of string(s) or a named list of character string vectors that give sample names to be excluded from the output file. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs. If no sample is to be excluded from a plate, use `NULL` in the list for that plate.
excludeTargets: Optional vector of string(s) or a named list of character string vectors that give target names to be excluded from the output file. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs. If no target is to be excluded from a plate, use `NULL` in the list for that plate.
interPlateNorm_method: Default is "IPC" for inter-plate control normalization. Use "IN" for IPC normalization followed by intensity normalization.
IN_samples: Optional argument passed to `interPlateNorm` function. A list of column names or indices specifying which subset of samples to use for intensity normalization step for each plate in data_list. By default, when this is set to `NULL`, all samples are used for IN.
replaceNA: Logical. Passed to `readNULISAseq()` function. If `TRUE` (default), will replace any missing counts with zero for generating NPQ. (For AQ data, see replace_zeros_with_NA, below.)
TAP: If `TRUE` (default), uses TAP detectability criteria in sample QC which includes more matrix types than non-TAP criteria. However, this function currently only flags samples based on IC median deviation.
output_TAP_AQ: Logical. Default is `FALSE`. If xml files include AQ parameters and this is set to TRUE, the function will output the Excel file format used for TAP projects. File will have 5 tabs including RQ data, AQ data, detectability, quantifiability, and sample information.
replace_cal_blank_zeros: Logical. Default is `FALSE`. If `TRUE`, AQ targets with a zero mean NC value will have the a_yint parameter replaced with the AQMC value, rather than be set to zero.
replace_zeros_with_NA: Logical. Default is `TRUE`. When `TRUE`, will replace any zero AQ data values with NA.
metadata: An optional named list of metadata, such as software package version numbers, that will be added as the last sheet in the Excel file. The list names will be the first column, and the values will form the second column. If `NULL` (default) then no metadata sheet is added.
verbose: Logical. Should the function output step completion info? Default is `TRUE`.
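
Several arguments accept either a single value or vector applied to all plates, or a named list with plate-specific values. The sketch below shows one way to build such lists, along with a `metadata` list. The plate IDs, lot numbers, sample names, and the column names of the preview data frame are all hypothetical, and the preview only illustrates the two-column layout described above rather than the function's internal code.

```r
# Hypothetical plate IDs; the names of each list must match these.
plateIDs <- c("Plate_01", "Plate_02")

# One lot number per plate.
PanelLotNumber <- list(Plate_01 = "LOT-A", Plate_02 = "LOT-B")

# Exclude two samples from the first plate and none from the second.
# list(...) keeps the NULL entry for Plate_02.
excludeSamples <- list(Plate_01 = c("Sample_07", "Sample_12"),
                       Plate_02 = NULL)

# Check that every plate has an entry in each list.
stopifnot(setequal(names(PanelLotNumber), plateIDs),
          setequal(names(excludeSamples), plateIDs))

# Named list of metadata: names form the first column, values the second.
metadata <- list(R_version = R.version.string,
                 analysis_date = format(Sys.Date()))

# Preview of the two-column layout (column names are illustrative only).
metadata_sheet <- data.frame(Name  = names(metadata),
                             Value = unlist(metadata, use.names = FALSE))
```

Building the list with `list(...)` matters here: assigning `NULL` to an existing element with `[[<-` would remove that element instead of keeping an empty entry for the plate.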

Value

Outputs an Excel file.

Details

For RQ data, Excel file tabs include RQ data, target detectability and sample information. For AQ data, tabs include RQ data, AQ data, target detectability, target quantifiability, and sample information.

The function takes as input an optional target info file containing target metadata, including Alamar target IDs and protein names, and an optional sample info file containing sample metadata that is desired to appear in the output file.
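
As a sketch of what these two optional input files could contain, the code below writes small CSV files with the required columns named in the argument descriptions. The rows, the Alamar target IDs, the extra `treatment` column, and the file names are hypothetical placeholders.

```r
# Hypothetical target annotation with the four required columns.
target_info <- data.frame(
  TargetName     = c("IL6", "TNF"),
  AlamarTargetID = c("t0001", "t0002"),  # placeholder IDs
  UniProtID      = c("P05231", "P01375"),
  ProteinName    = c("Interleukin-6", "Tumor necrosis factor")
)

# Hypothetical sample annotation: plateID and sampleName are required,
# any further columns are user-defined.
sample_info <- data.frame(
  plateID    = c("Plate_01", "Plate_01", "Plate_02"),
  sampleName = c("S1", "S2", "S3"),
  treatment  = c("control", "treated", "control")
)

# Write both files to a temporary directory for illustration.
out_dir <- tempdir()
target_path <- file.path(out_dir, "target_info.csv")
sample_path <- file.path(out_dir, "sample_info.csv")
write.csv(target_info, target_path, row.names = FALSE)
write.csv(sample_info, sample_path, row.names = FALSE)

# Confirm the required columns survive a round trip through CSV.
required_target_cols <- c("TargetName", "AlamarTargetID", "UniProtID", "ProteinName")
required_sample_cols <- c("plateID", "sampleName")
stopifnot(all(required_target_cols %in% names(read.csv(target_path))),
          all(required_sample_cols %in% names(read.csv(sample_path))))
```

To keep only some of the annotation columns in the output, the column names would be passed through `sample_info_file_variables`, for example `c("treatment")` in this hypothetical case.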

Sample QC is currently based on the internal control (IC) within ±40% of median criterion. If a sample's IC count falls outside this threshold, it will be assigned a "WARN" status. Otherwise the sample will be assigned a "PASS" status.
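
The sample QC rule can be illustrated with a small sketch. It assumes the criterion means that a sample's IC count must lie within 40% of the median IC count on its plate. That reading, the sample names, and the counts are assumptions made for illustration, and this is not the package's own code.

```r
# Hypothetical IC counts for five samples on one plate.
ic_counts <- c(S1 = 1000, S2 = 1100, S3 = 950, S4 = 500, S5 = 1500)

# Relative deviation of each sample from the plate median.
plate_median <- median(ic_counts)
rel_dev <- (ic_counts - plate_median) / plate_median

# Assumed threshold: more than 40% away from the median gives "WARN".
qc_status <- ifelse(abs(rel_dev) > 0.4, "WARN", "PASS")
print(qc_status)
```

In this made-up plate, S4 and S5 deviate by 50% from the median and are flagged, while the other three samples pass.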

Target QC (if AQ data is available) is currently based on the target concentration accuracy within 30% criterion. If a target falls outside this threshold, it will be assigned a "WARN" status. Otherwise the target will be assigned a "PASS" status.

The function outputs long format data where each row corresponds to a particular sample-target combination. Total number of rows is number of samples times number of targets. By default the sample control (SC/AQSC) wells are included in the output, and IPC and NC wells are omitted.
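
The long-format layout can be sketched with base R. The sample names, target names, and column names below are hypothetical and need not match the columns in the actual output file.

```r
# Hypothetical samples and targets.
samples <- c("S1", "S2", "S3")
targets <- c("IL6", "TNF")

# One row per sample-target combination.
long <- expand.grid(sampleName = samples,
                    Target     = targets,
                    stringsAsFactors = FALSE)

# Row count equals number of samples times number of targets.
stopifnot(nrow(long) == length(samples) * length(targets))
print(long)
```

With three samples and two targets the table has six rows, so a full run with many samples and a 200-plex panel grows quickly in row count.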