Reads in one or more NULISAseq XML files. Output is a long-format Excel file where each row corresponds to a particular sample-target combination.

writeNULISAseq(
  xml_files,
  dataDir,
  target_info_file = NULL,
  output_filename,
  Panel = "200-plex Inflammation v1",
  PanelLotNumber = "",
  plateIDs = NULL,
  sample_info_file = NULL,
  sample_info_file_variables = NULL,
  sample_group_covar = "SAMPLE_MATRIX",
  ICs = "mCherry",
  IPC_string = NULL,
  SC_string = NULL,
  Bridge_string = NULL,
  Calibrator_string = NULL,
  NC_string = NULL,
  include_IPC = FALSE,
  include_SC = TRUE,
  include_Bridge = TRUE,
  include_Calibrator = FALSE,
  include_NC = FALSE,
  include_unnorm_counts = FALSE,
  include_IC_counts = FALSE,
  excludeSamples = NULL,
  excludeTargets = NULL,
  interPlateNorm_method = "IPC",
  IN_samples = NULL,
  replaceNA = TRUE,
  TAP = TRUE,
  output_TAP_AQ = FALSE,
  replace_cal_blank_zeros = FALSE,
  replace_zeros_with_NA = TRUE,
  metadata = NULL,
  verbose = TRUE
)

Arguments

xml_files

Vector of file names (character strings). Files should be listed in the desired plate order, unless plate IDs are defined explicitly with plateIDs.

dataDir

Data directory where xml_files, target_info_file, and sample_info_file reside (character string).

target_info_file

Optional. Path and filename for the target info CSV file (character string). Must include columns for TargetName (matching the target names in the XML files), AlamarTargetID, UniProtID, and ProteinName. Only targets listed in target_info_file will be written to the output file. If target_info_file is not provided, the function will use the Barcode A information in the XML files.
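As a rough sketch of the expected layout, the following base R snippet writes a small target info CSV with the four required columns and checks that they are present. The AlamarTargetID values and the file location are made-up placeholders, not real panel entries.

```r
# Hypothetical target annotations; the AlamarTargetID values are placeholders.
target_info <- data.frame(
  TargetName     = c("IL6", "TNF"),
  AlamarTargetID = c("t00001", "t00002"),
  UniProtID      = c("P05231", "P01375"),
  ProteinName    = c("Interleukin-6", "Tumor necrosis factor"),
  stringsAsFactors = FALSE
)

# Write to a temporary location for this illustration.
target_info_path <- file.path(tempdir(), "target_info.csv")
write.csv(target_info, target_info_path, row.names = FALSE)

# Read the file back and confirm the required columns are all there.
required_cols <- c("TargetName", "AlamarTargetID", "UniProtID", "ProteinName")
target_info_check <- read.csv(target_info_path, stringsAsFactors = FALSE)
missing_cols <- setdiff(required_cols, colnames(target_info_check))
```

An empty `missing_cols` means the file has the columns this argument asks for.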

output_filename

Filename for output xlsx file.

Panel

Name of multi-plex panel. Default is '200-plex Inflammation v1'.

PanelLotNumber

The panel lot number. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

plateIDs

A vector of plate IDs. If `NULL`, default is to number plates from 01 to total number of plates based on the order of xml_files. Passed to `readNULISAseq()` function and output as a column in the data file.
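To illustrate the default numbering, here is a minimal sketch of zero-padded plate IDs that follow the order of xml_files. The file names are hypothetical, and the exact label format the function produces internally is an assumption.

```r
# Hypothetical file names, listed in the desired plate order.
xml_files <- c("run_A.xml", "run_B.xml", "run_C.xml")

# Zero-padded sequence numbers that follow the file order.
# The exact label format used internally is an assumption.
default_plateIDs <- sprintf("%02d", seq_along(xml_files))

# Explicit IDs can be supplied instead: one per file, in the same order.
custom_plateIDs <- c("Batch1_P1", "Batch1_P2", "Batch2_P1")
stopifnot(length(custom_plateIDs) == length(xml_files))
```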

sample_info_file

Optional. Path and filename for the sample annotation CSV file. Must include columns "plateID" and "sampleName" which correspond to the plateID (matching the plateIDs input to this function) and sampleName in the `readNULISAseq()` `samples` data.frame.
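A minimal sketch of a sample annotation file with the two required columns plus two extra covariates. All sample names and covariates are invented for illustration. How the function itself parses plateID is not stated here, so the read-back step is only a precaution in this sketch.

```r
# Hypothetical sample annotations; plateID and sampleName are required.
sample_info <- data.frame(
  plateID    = c("01", "01", "02"),
  sampleName = c("S001", "S002", "S003"),
  AGE        = c(34, 51, 47),
  SEX        = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

sample_info_path <- file.path(tempdir(), "sample_info.csv")
write.csv(sample_info, sample_info_path, row.names = FALSE)

# Read plateID as character so that leading zeros such as "01" survive.
sample_info_check <- read.csv(
  sample_info_path,
  colClasses = c(plateID = "character"),
  stringsAsFactors = FALSE
)

# A hypothetical subset that could be passed as sample_info_file_variables.
keep_vars <- c("AGE")
```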

sample_info_file_variables

Subset of column names in sample_info_file that will be included in data file output. Other columns will be excluded. Otherwise, if `NULL` (default), all columns will be included.

sample_group_covar

Optional column name in the Barcode B file and samples data matrix output by `readNULISAseq()` that represents subgroups for which detectability will be calculated separately. Default is 'SAMPLE_MATRIX'. The function first checks that the variable is present in the column names of the samples matrix. Can be set to `NULL` to not use this feature.

ICs

Vector of string(s) or a named list of character string vectors giving the internal control names. Default is "mCherry". The first IC in the vector will be used in intra-plate IC-normalization (usually mCherry). ICs will be omitted from the output file by default. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

IPC_string

Optional vector of string(s) or a named list of character string vectors that identifies the sample names of IPC wells (e.g. 'IPC'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

SC_string

Optional vector of string(s) or a named list of character string vectors that identifies the sample names of SC wells (e.g. 'SC'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

Bridge_string

Optional vector of string(s) or a named list of character string vectors that identifies the sample names of Bridge wells (e.g. 'Bridge'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. (Bridge normalization not currently implemented.) Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

Calibrator_string

Optional vector of character string(s) or a named list of character string vectors that identifies the sample names of Calibrator wells (e.g. 'Calibrator'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. (Calibrator not currently implemented. AQ uses IPC as calibrator.) Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

NC_string

Optional vector of character string(s) or a named list of character string vectors that identifies the sample names of NC wells (e.g. 'NC'). This will override the default behavior which is to use the sample 'type' column from Barcode B information. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs.

include_IPC

Logical. Should IPC samples be included in output? Default is `FALSE`.

include_SC

Logical. Should SC samples be included in output? Default is `TRUE`.

include_Bridge

Logical. Should Bridge samples be included in output? Default is `TRUE`.

include_Calibrator

Logical. Should Calibrator samples be included in output? Default is `FALSE`.

include_NC

Logical. Should NC samples be included in output? Default is `FALSE`.

include_unnorm_counts

Logical. Should unnormalized counts be included as an additional column in output? Default is `FALSE`.

include_IC_counts

Logical. Should IC counts be included in the output? Default is `FALSE`. This is only useful when include_unnorm_counts=TRUE.

excludeSamples

Optional vector of string(s) or a named list of character string vectors that give sample names to be excluded from the output file. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs. If no sample is to be excluded from a plate, use `NULL` in the list for that plate.

excludeTargets

Optional vector of string(s) or a named list of character string vectors that give target names to be excluded from the output file. Can be a single value or vector applied to all plates, or a named list with plate-specific values where names correspond to plate IDs. If no target is to be excluded from a plate, use `NULL` in the list for that plate.
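The plate-specific form used by excludeSamples and excludeTargets can be sketched as follows. Plate IDs, sample names, and target names are hypothetical.

```r
# Hypothetical plate IDs; list names must correspond to these.
plateIDs <- c("Plate_01", "Plate_02", "Plate_03")

# One value applied to every plate.
exclude_everywhere <- c("Sample_X")

# Plate-specific values. list() keeps a NULL element when it is given
# in the constructor, so Plate_02 is present but excludes nothing.
excludeSamples <- list(
  Plate_01 = c("S010", "S011"),
  Plate_02 = NULL,
  Plate_03 = "S042"
)

# Same pattern for targets.
excludeTargets <- list(
  Plate_01 = "TargetA",
  Plate_02 = NULL,
  Plate_03 = NULL
)
```

Note that assigning `NULL` afterwards, as in `excludeSamples$Plate_02 <- NULL`, removes the element from the list rather than storing a `NULL`, so the constructor form above is the safer way to build these lists.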

interPlateNorm_method

Default is "IPC" for inter-plate control normalization. Use "IN" for IPC normalization followed by intensity normalization.

IN_samples

Optional argument passed to the `interPlateNorm` function. A list of column names or indices specifying which subset of samples to use for the intensity normalization step for each plate in data_list. By default, when this is set to `NULL`, all samples are used for IN.

replaceNA

Logical. Passed to `readNULISAseq()` function. If `TRUE` (default), will replace any missing counts with zero for generating NPQ. (For AQ data, see replace_zeros_with_NA, below.)

TAP

Logical. If `TRUE` (default), uses TAP detectability criteria in sample QC, which include more matrix types than the non-TAP criteria. However, this function currently only flags samples based on IC median deviation.

output_TAP_AQ

Logical. Default is `FALSE`. If the XML files include AQ parameters and this is set to `TRUE`, the function will output the Excel file format used for TAP projects. The file will have 5 tabs: RQ data, AQ data, detectability, quantifiability, and sample information.

replace_cal_blank_zeros

Logical. Default is `FALSE`. If `TRUE`, AQ targets with a zero mean NC value will have the a_yint parameter replaced with the AQMC value, rather than be set to zero.

replace_zeros_with_NA

Logical. Default is `TRUE`. When `TRUE`, will replace any zero AQ data values with NA.

metadata

An optional named list of metadata, such as software package version numbers, that will be added as the last sheet in the Excel file. The list names will be the first column, and the values will form the second column. If `NULL` (default) then no metadata sheet is added.
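A small sketch of a metadata list and a preview of the two-column layout described above. The entries are examples only, and the column headers `name` and `value` are assumptions made for this preview; the headers in the actual Excel sheet may differ.

```r
# Hypothetical metadata entries; the names become the first column.
metadata <- list(
  "R version"     = R.version.string,
  "Analysis date" = format(Sys.Date()),
  "Analyst"       = "J. Doe"  # placeholder value
)

# Preview of the two-column layout: names first, values second.
metadata_preview <- data.frame(
  name  = names(metadata),
  value = vapply(metadata, as.character, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
```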

verbose

Logical. Should the function output step completion info? Default is `TRUE`.

Value

Outputs an Excel file.

Details

For RQ data, Excel file tabs include RQ data, target detectability and sample information. For AQ data, tabs include RQ data, AQ data, target detectability, target quantifiability, and sample information.

The function takes as input an optional target info file containing target metadata, including Alamar target IDs and protein names, and an optional sample info file containing sample metadata to include in the output file.
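A hedged sketch of a basic call, using only arguments from the signature above. The directory, file names, and plate IDs are hypothetical, and the package name `NULISAseqR` is an assumption (it is not stated on this page). The call is guarded so that nothing runs unless the package and the input files are actually available.

```r
# Hypothetical inputs; replace with real paths and file names.
dataDir   <- "path/to/data"
xml_files <- c("plate1.xml", "plate2.xml")

# Arguments collected in a list so they can be inspected before the call.
args <- list(
  xml_files       = xml_files,
  dataDir         = dataDir,
  output_filename = "NULISAseq_results.xlsx",
  plateIDs        = c("Plate_01", "Plate_02"),
  include_IPC     = FALSE,
  verbose         = TRUE
)

# Run only when the package (name assumed) and the input files exist.
inputs_exist <- all(file.exists(file.path(dataDir, xml_files)))
if (requireNamespace("NULISAseqR", quietly = TRUE) && inputs_exist) {
  do.call(NULISAseqR::writeNULISAseq, args)
}
```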

Sample QC is currently based on the internal control (IC) within 40% of median criterion. If a sample's IC count falls outside this threshold, it will be assigned a "WARN" status. Otherwise the sample will be assigned a "PASS" status.
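The sample QC check can be sketched in base R. This is an illustration only, not the function's internal code: it assumes the threshold means a deviation of more than 40% from the plate median IC count in either direction, and the counts are invented.

```r
# Hypothetical IC (e.g. mCherry) counts for six samples on one plate.
ic_counts <- c(S1 = 1000, S2 = 1100, S3 = 950, S4 = 400, S5 = 1050, S6 = 1600)

# Relative deviation of each sample's IC count from the plate median.
ic_median <- median(ic_counts)
rel_dev   <- (ic_counts - ic_median) / ic_median

# Assumed threshold: 40% of the median, in either direction.
threshold <- 0.40
sample_qc <- ifelse(abs(rel_dev) > threshold, "WARN", "PASS")
```

With these numbers the plate median is 1025, so S4 (about 61% below) and S6 (about 56% above) are flagged "WARN" and the other four samples are "PASS".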

Target QC (if AQ data is available) is currently based on the target concentration accuracy within 30% criterion. If a target falls outside this threshold, it will be assigned a "WARN" status. Otherwise the target will be assigned a "PASS" status.

The function outputs long format data where each row corresponds to a particular sample-target combination. Total number of rows is number of samples times number of targets. By default the sample control (SC/AQSC) wells are included in the output, and IPC and NC wells are omitted.
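The long-format shape can be illustrated with a toy skeleton in base R. The sample names, target names, and column names are hypothetical and are not the column names of the actual output file.

```r
# Hypothetical samples and targets.
samples <- c("S1", "S2", "S3")
targets <- c("TargetA", "TargetB")

# One row per sample-target combination.
long_skeleton <- expand.grid(
  SampleName = samples,
  Target     = targets,
  stringsAsFactors = FALSE
)

# Row count equals number of samples times number of targets.
expected_rows <- length(samples) * length(targets)
```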