parquet_schema {nanoparquet}    R Documentation

Create a Parquet schema

Description

Creates a schema that specifies how write_parquet() writes a data frame to a Parquet file.
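As a minimal sketch of that workflow, the snippet below builds a schema and passes it to write_parquet(). The data frame, the column names and the temporary file are made up for illustration, and the sketch assumes that write_parquet() takes the schema through an argument named schema.

```r
library(nanoparquet)

# Hypothetical example data: one integer and one character column.
df <- data.frame(
  id = 1:3,
  label = c("a", "b", "c")
)

# One type specification per column of `df`, matched by name.
sch <- parquet_schema(
  id = "INT32",
  label = list("STRING", repetition_type = "OPTIONAL")
)

# Assumption: write_parquet() accepts the schema via `schema =`.
out_file <- tempfile(fileext = ".parquet")
write_parquet(df, out_file, schema = sch)

# Inspect the schema that ended up in the file.
read_parquet_schema(out_file)
```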

Usage

parquet_schema(...)

Arguments

...

Parquet type specifications, see Details. For backwards compatibility, you can supply a file name here instead, in which case parquet_schema() behaves like read_parquet_schema().

Details

A schema is a list of type specifications, each optionally named, and is stored in a data frame. Each argument of parquet_schema() may be a character scalar or a list. Primitive Parquet types may be given in either form. Parameterized types must be given as a list, with the type name first and its parameters as named elements, e.g. list("INT", bit_width = 64, is_signed = TRUE).
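The two ways of writing a primitive type can be compared directly. This is only a sketch: the column name x is arbitrary, and the comparison is limited to the type column because this page does not state that the two results are identical in every column.

```r
library(nanoparquet)

# The same primitive type, once as a string and once as a list.
s_chr <- parquet_schema(x = "INT32")
s_lst <- parquet_schema(x = list("INT32"))

# Both forms are expected to describe the same Parquet type.
s_chr$type
s_lst$type
```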

Value

Data frame with the same columns as read_parquet_schema(): file_name, name, r_type, type, type_length, repetition_type, converted_type, logical_type, num_children, scale, precision, field_id.
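Because the return value is an ordinary data frame, it can be inspected with the usual tools. The sketch below reuses two of the type specifications from the Examples section and selects a few of the columns listed above.

```r
library(nanoparquet)

sch <- parquet_schema(
  c1 = "INT32",
  c3 = list("STRING", repetition_type = "OPTIONAL")
)

# The schema is a plain data frame.
is.data.frame(sch)
names(sch)

# Look at a subset of the documented columns.
sch[, c("name", "type", "repetition_type", "logical_type")]
```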

Possible types:

Special type:

Primitive Parquet types:

Parquet logical types:

Logical types MAP, LIST and UNKNOWN are not supported currently.

Converted types are deprecated in the Parquet specification in favor of logical types, but parquet_schema() accepts some converted types as a syntactic shortcut for the corresponding logical types:

Missing values

Each type may also take a repetition_type parameter, with possible values "REQUIRED", "OPTIONAL" or "REPEATED". "REQUIRED" columns do not allow missing values, whereas "OPTIONAL" columns do. "REPEATED" columns are currently not supported in write_parquet().
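The difference between "OPTIONAL" and "REQUIRED" can be tried out on a column that contains NA. This is a sketch with made-up data. It assumes that write_parquet() takes the schema through its schema argument and that it signals an error for the "REQUIRED" case. This page does not document the exact behavior or message, so the code only captures the outcome.

```r
library(nanoparquet)

# Hypothetical data with one missing value.
df <- data.frame(x = c(1L, NA, 3L))

# OPTIONAL: missing values are allowed, so this write should succeed.
opt <- parquet_schema(x = list("INT32", repetition_type = "OPTIONAL"))
ok_file <- tempfile(fileext = ".parquet")
write_parquet(df, ok_file, schema = opt)
read_parquet(ok_file)$x

# REQUIRED: missing values are not allowed. Assumption: write_parquet()
# signals an error here; tryCatch() records whatever actually happens.
req <- parquet_schema(x = list("INT32", repetition_type = "REQUIRED"))
outcome <- tryCatch(
  {
    write_parquet(df, tempfile(fileext = ".parquet"), schema = req)
    "written without error"
  },
  error = function(e) conditionMessage(e)
)
outcome
```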

Examples

parquet_schema(
  c1 = "INT32",
  c2 = list("INT", bit_width = 64, is_signed = TRUE),
  c3 = list("STRING", repetition_type = "OPTIONAL")
)

[Package nanoparquet version 0.4.2 Index]