load_repertoires {immundata} | R Documentation |
Load and Aggregate Immune Receptor Repertoire Data
Description
This function ingests a repertoire dataset (Parquet, CSV, or TSV), aggregates receptors
based on a user-defined schema, and splits the result into receptor-level and annotation-level
tables. The resulting data is saved to a designated output folder as two Parquet files
(receptors and annotations) and then reloaded to create an ImmunData object.
Usage
load_repertoires(
  path,
  schema,
  metadata = NULL,
  barcode_col = NULL,
  count_col = NULL,
  repertoire_schema = NULL,
  output_folder = NULL,
  enforce_schema = TRUE,
  verbose = TRUE
)
Arguments
path |
Path to an input file. This file may be Parquet, CSV, or TSV. The file extension is automatically detected and handled. |
schema |
Character vector defining which columns in the input data should be used to
identify unique receptor signatures. For example, |
metadata |
An optional data frame containing additional metadata to merge into the annotation table.
Default is NULL. |
barcode_col |
An optional character string specifying the column in the input data that represents
cell barcodes or other unique identifiers. Default is NULL. |
count_col |
An optional character string specifying the column in the input data that stores
bulk receptor counts. Default is NULL. |
repertoire_schema |
An optional character vector defining how annotations should be grouped into repertoires
(for example, |
output_folder |
Character string specifying the directory to save the resulting Parquet files. If |
enforce_schema |
Logical. If |
verbose |
Logical. Currently not used. |
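As a rough illustration of how these arguments fit together, the sketch below writes a tiny TSV file and passes it to load_repertoires() using only the parameters shown under Usage. The file name, the output directory, and the column names (cdr3_aa, v_call, n_reads) are assumptions made up for this example; they are not requirements of the package. The call is skipped when immundata is not installed.

```r
# Hypothetical toy input written to a temporary directory.
# The column names cdr3_aa, v_call and n_reads are illustrative assumptions.
input_path <- file.path(tempdir(), "toy_repertoire.tsv")
toy <- data.frame(
  cdr3_aa = c("CASSLGQ", "CASSLGQ", "CASSPTG"),
  v_call  = c("TRBV5-1", "TRBV5-1", "TRBV7-9"),
  n_reads = c(10L, 3L, 7L)
)
write.table(toy, input_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical folder that will receive the two Parquet files.
output_dir <- file.path(tempdir(), "toy_immundata")

# Only attempt the call when the package is available.
if (requireNamespace("immundata", quietly = TRUE)) {
  idata <- immundata::load_repertoires(
    path          = input_path,
    schema        = c("cdr3_aa", "v_call"),  # columns that define a unique receptor
    count_col     = "n_reads",               # bulk counts; barcode_col is left NULL
    output_folder = output_dir
  )
}
```

For single-cell data, one would presumably supply barcode_col instead of count_col; the rest of the call would stay the same.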
Details
- Reading – The function automatically detects whether path points to a Parquet, CSV, or TSV file, using read_parquet_duckdb or read_csv_duckdb.
- Aggregation – Receptor uniqueness is determined by the columns named in schema, while barcodes or counts are handled depending on which parameters (barcode_col, count_col) are provided.
- Saving – The final receptor-level and annotation-level tables are written to Parquet files in output_folder.
- Reloading – The function calls load_immundata() on the newly created folder to return a fully instantiated ImmunData object.
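The aggregation step can be pictured with a small base R sketch. This is a conceptual illustration only, not the package's implementation (which works through DuckDB readers and Parquet files); the toy data, the column names, and the receptor_id column are hypothetical.

```r
# Toy input: one row per observed receptor record (all names are assumptions).
records <- data.frame(
  cdr3_aa = c("CASSLGQ", "CASSLGQ", "CASSPTG", "CASSLGQ"),
  v_call  = c("TRBV5-1", "TRBV5-1", "TRBV7-9", "TRBV7-9"),
  sample  = c("S1", "S2", "S1", "S2"),
  n_reads = c(10L, 3L, 7L, 1L),
  stringsAsFactors = FALSE
)

# Columns that together define a unique receptor signature.
schema <- c("cdr3_aa", "v_call")

# Receptor-level table: one row per unique combination of the schema columns.
receptors <- unique(records[, schema, drop = FALSE])
rownames(receptors) <- NULL
receptors$receptor_id <- seq_len(nrow(receptors))  # hypothetical identifier column

# Annotation-level table: every input row, tagged with the id of its receptor.
annotations <- merge(records, receptors, by = schema, sort = FALSE)

print(receptors)
print(annotations)
```

Here the four input rows collapse to three unique receptors, while the annotation table keeps all four rows so that per-sample information and counts remain attached to each receptor.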