Performs Principal Component Analysis (PCA) on target expression data and generates a biplot for visualizing relationships among samples. Uses PCAtools for the analysis and ggplot2 for visualization. Unless sample_colors is supplied, group colors are generated automatically from RColorBrewer palettes.

generate_pca(
  data,
  sampleInfo,
  sampleName_var,
  annotate_sample_by = NULL,
  label_points = FALSE,
  sample_subset = NULL,
  target_subset = NULL,
  shape_by = NULL,
  encircle = TRUE,
  encircleFill = TRUE,
  sample_colors = NULL,
  output_dir = NULL,
  plot_name = NULL,
  plot_title = NULL,
  plot_width = 10,
  plot_height = 8,
  ...
)

Arguments

data

A matrix with targets in rows and samples in columns. Row names should be the target names and column names the sample names. The data are assumed to be already log2-transformed, i.e. NPQ values computed as log2(x + 1) for each NULISAseq normalized count x.

sampleInfo

A data frame with sample metadata. Rows are samples, columns are sample metadata variables.

sampleName_var

Character string specifying the name of the column in sampleInfo that matches the column names of data.
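
As an illustration of how data, sampleInfo, and sampleName_var relate, the sketch below builds toy inputs in the layout described above and checks that they line up. All object and column names (toy_data, toy_info, SampleName, Group) are hypothetical. Whether generate_pca reorders metadata internally is not stated here, so aligning it beforehand is a cautious assumption rather than a requirement.

```r
# Toy inputs in the documented layout (all names here are hypothetical).
set.seed(1)
counts <- matrix(
  rpois(5 * 6, lambda = 50), nrow = 5,
  dimnames = list(paste0("Target", 1:5), paste0("S", 1:6))
)
toy_data <- log2(counts + 1)  # targets in rows, samples in columns

toy_info <- data.frame(
  SampleName = paste0("S", 6:1),  # deliberately in a different order
  Group = rep(c("Control", "Treatment"), each = 3),
  stringsAsFactors = FALSE
)

# Every column of the matrix should have a matching metadata row.
missing_samples <- setdiff(colnames(toy_data), toy_info$SampleName)
stopifnot(length(missing_samples) == 0)

# Reorder the metadata rows to follow the matrix columns.
toy_info <- toy_info[match(colnames(toy_data), toy_info$SampleName), ]
```

With inputs like these, sampleName_var would be "SampleName" and annotate_sample_by could be "Group".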

annotate_sample_by

Character string specifying the column name from sampleInfo to use for coloring points. Only one variable is allowed; defaults to NULL.

label_points

Logical indicating whether to add sample labels to the plot; defaults to FALSE.

sample_subset

Vector of sample names to include in the PCA. Names should match existing column names of data; defaults to NULL (all samples).

target_subset

Vector of target names to include in the PCA. Names should match existing row names of data; defaults to NULL (all targets).

shape_by

Character string specifying the column name from sampleInfo to use for point shapes; defaults to NULL.

encircle

Logical indicating whether to draw ellipses around groups; defaults to TRUE.

encircleFill

Logical indicating whether to fill the ellipses; defaults to TRUE.

sample_colors

Named vector of custom colors for sample groups. Names should match the levels of the sampleInfo column given in annotate_sample_by; defaults to NULL.

output_dir

Character string specifying the directory in which to save the plot. If NULL (the default), the plot is not saved. If provided without plot_name, a default filename with a timestamp is generated.

plot_name

Character string specifying the filename for the saved plot, including the file extension (.pdf, .png, .jpg, or .svg). If NULL (the default) and output_dir is provided, a default filename with a timestamp is used.
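
The interaction between output_dir and plot_name can be sketched as follows. This is only an illustration: the fallback filename pattern shown here is an assumption, since the exact default used by generate_pca is not documented above, and creating the directory beforehand is a precaution rather than a documented requirement.

```r
# Sketch of resolving an output path before calling generate_pca.
output_dir <- file.path(tempdir(), "figures")
plot_name <- NULL

if (is.null(plot_name)) {
  # Hypothetical default; the real pattern used by generate_pca may differ.
  plot_name <- paste0("pca_plot_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".pdf")
}

# Check the extension against the formats listed for plot_name.
ext <- tolower(tools::file_ext(plot_name))
stopifnot(ext %in% c("pdf", "png", "jpg", "svg"))

# Make sure the target directory exists, then build the full path.
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
output_path <- file.path(output_dir, plot_name)
```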

plot_title

Character string for the title of the PCA plot; defaults to NULL.

plot_width

Numeric value for the width of the saved plot in inches; defaults to 10.

plot_height

Numeric value for the height of the saved plot in inches; defaults to 8.

...

Additional arguments passed to PCAtools::biplot().

Value

A list containing:

targets_used

Character vector of target names used in the PCA after filtering.

pca_results

The PCAtools PCA object containing all PCA results.

rotated

Data frame containing the PC scores (PC1, PC2, etc.) for each sample.

plot

The ggplot2 object of the PCA biplot.

output_path

Character string of the full path to the saved file, or NULL if not saved.

Details

The function performs the following steps:

  1. Filters data to specified samples and targets

  2. Removes targets with all zero values

  3. Scales data by row (Z-score transformation)

  4. Removes rows with NA, NaN, or Inf values after scaling

  5. Performs PCA using PCAtools

  6. Generates biplot with specified aesthetics

  7. Optionally saves to file
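
Steps 1 to 5 can be approximated in base R as shown below. This is a minimal sketch on simulated data, not the function's actual implementation: stats::prcomp() stands in for PCAtools so the example has no extra dependencies, and all object names (mat, keep_samples, keep_targets) are hypothetical.

```r
# Base-R sketch of steps 1-5; prcomp() stands in for PCAtools here.
set.seed(42)
mat <- matrix(
  rnorm(6 * 8, mean = 10), nrow = 6,
  dimnames = list(paste0("T", 1:6), paste0("S", 1:8))
)
mat["T2", ] <- 0  # all-zero target, dropped in step 2
mat["T5", ] <- 7  # constant target, becomes NaN after scaling

keep_samples <- paste0("S", 1:6)  # hypothetical sample_subset
keep_targets <- rownames(mat)     # stands for target_subset = NULL (all targets)

# 1. Filter to the requested samples and targets
sub <- mat[keep_targets, keep_samples, drop = FALSE]

# 2. Remove targets whose values are all zero
sub <- sub[rowSums(sub != 0) > 0, , drop = FALSE]

# 3. Z-score each row (scale() works on columns, hence the transposes)
z <- t(scale(t(sub)))

# 4. Drop rows containing NA, NaN, or Inf after scaling
z <- z[apply(z, 1, function(r) all(is.finite(r))), , drop = FALSE]

# 5. PCA with samples as observations
pca <- prcomp(t(z), center = TRUE, scale. = FALSE)
scores <- as.data.frame(pca$x)  # one row per sample, columns PC1, PC2, ...
targets_used <- rownames(z)
```

The constant target T5 shows why step 4 is needed: a row with zero variance cannot be Z-scored, so scaling turns it into NaN and it is removed before the PCA.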

Custom Colors

To specify custom colors for sample groups, use the sample_colors parameter:


my_colors <- c("Control" = "#FF0000", "Treatment" = "#0000FF")
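
Because the names of the color vector need to match the group levels, it can help to compare the two before plotting. The sketch below uses a toy metadata frame with a hypothetical Group column standing in for the annotate_sample_by variable. How generate_pca itself handles a mismatch is not documented here.

```r
# Toy metadata; "Group" plays the role of the annotate_sample_by column.
sample_metadata <- data.frame(
  SampleName = paste0("S", 1:4),
  Group = c("Control", "Control", "Treatment", "Treatment")
)
my_colors <- c("Control" = "#FF0000", "Treatment" = "#0000FF")

groups <- unique(as.character(sample_metadata$Group))
uncolored <- setdiff(groups, names(my_colors))  # groups with no color assigned
unused <- setdiff(names(my_colors), groups)     # colors that name no group

# Stop early if any group would be left without a color.
stopifnot(length(uncolored) == 0)
```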

Examples

if (FALSE) { # \dontrun{
# Basic PCA plot
result <- generate_pca(
  data = Data_NPQ,
  sampleInfo = sample_metadata,
  sampleName_var = "SampleName",
  annotate_sample_by = "Group"
)

# PCA with sample labels and custom shapes
result <- generate_pca(
  data = Data_NPQ,
  sampleInfo = sample_metadata,
  sampleName_var = "SampleName",
  annotate_sample_by = "Group",
  shape_by = "Batch",
  label_points = TRUE
)

# PCA with custom colors
custom_colors <- c("Control" = "blue", "Treatment" = "red")
result <- generate_pca(
  data = Data_NPQ,
  sampleInfo = sample_metadata,
  sampleName_var = "SampleName",
  annotate_sample_by = "Group",
  sample_colors = custom_colors
)

# Save PCA plot to file
result <- generate_pca(
  data = Data_NPQ,
  sampleInfo = sample_metadata,
  sampleName_var = "SampleName",
  annotate_sample_by = "Group",
  output_dir = "output/figures",
  plot_name = "pca_analysis.pdf",
  plot_title = "PCA Analysis of Expression Data",
  plot_width = 12,
  plot_height = 10
)
} # }