read_parquet_pages {nanoparquet} | R Documentation |
Metadata of all pages of a Parquet file
Description
Metadata of all pages of a Parquet file
Usage
read_parquet_pages(file)
Arguments
file: Path to a Parquet file.
Details
Reading all the page headers might be slow for large files, especially if the file has many small pages.
Value
Data frame with columns:
- file_name: file name.
- row_group: id of the row group the page belongs to, an integer between 0 and the number of row groups minus one.
- column: id of the column, an integer between 0 and the number of leaf columns minus one. Only leaf columns are considered, as non-leaf columns do not have any pages.
- page_type: DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE or DATA_PAGE_V2.
- page_header_offset: offset of the data page (its header) in the file.
- uncompressed_page_size: uncompressed size of the page. Does not include the page header, as per the Parquet spec.
- compressed_page_size: compressed size of the page, without the page header.
- crc: integer checksum, if present in the file. Can be NA.
- num_values: number of data values in this page, including NULL (NA in R) values.
- encoding: encoding of the page. Current possible encodings: "PLAIN", "GROUP_VAR_INT", "PLAIN_DICTIONARY", "RLE", "BIT_PACKED", "DELTA_BINARY_PACKED", "DELTA_LENGTH_BYTE_ARRAY", "DELTA_BYTE_ARRAY", "RLE_DICTIONARY", "BYTE_STREAM_SPLIT".
- definition_level_encoding: encoding of the definition levels, see encoding for possible values. This can be missing in V2 data pages, where they are always RLE encoded.
- repetition_level_encoding: encoding of the repetition levels, see encoding for possible values. This can be missing in V2 data pages, where they are always RLE encoded.
- data_offset: offset of the actual data in the file.
- page_header_length: size of the page header, in bytes.
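The size columns above can be combined into a per-column-chunk summary. The following is a minimal sketch, not part of nanoparquet: summarise_pages() is a hypothetical helper, and the input rows are made up to mimic the documented columns so that the code runs without reading a file.

```r
# Hypothetical helper (not part of nanoparquet): total page sizes for each
# (row_group, column) pair, i.e. for each column chunk.
summarise_pages <- function(pages) {
  out <- aggregate(
    cbind(compressed_page_size, uncompressed_page_size) ~ row_group + column,
    data = pages,
    FUN = sum
  )
  # A ratio above 1 means the pages are smaller on disk than uncompressed.
  out$ratio <- out$uncompressed_page_size / out$compressed_page_size
  out
}

# Made-up rows with the documented column names; real input would be the
# data frame returned by read_parquet_pages().
pages <- data.frame(
  row_group = c(0L, 0L, 0L),
  column = c(0L, 0L, 1L),
  page_type = c("DICTIONARY_PAGE", "DATA_PAGE", "DATA_PAGE"),
  compressed_page_size = c(40L, 60L, 50L),
  uncompressed_page_size = c(80L, 120L, 50L)
)

page_summary <- summarise_pages(pages)
print(page_summary)
```

Both size columns exclude the page header, so these totals describe page payloads only.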
See Also
read_parquet_page() to read a page.
Examples
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
nanoparquet:::read_parquet_pages(file_name)
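Once the page metadata is in a data frame, base R is enough to inspect it. The sketch below uses a small hand-built data frame with the documented column names (the values are invented for illustration), so it runs without the package. In practice, pages would be the result of the call above.

```r
# Made-up rows with the documented columns; in practice use
# pages <- nanoparquet:::read_parquet_pages(file_name)
pages <- data.frame(
  row_group = c(0L, 0L, 0L, 1L),
  column = c(0L, 0L, 1L, 0L),
  page_type = c("DICTIONARY_PAGE", "DATA_PAGE", "DATA_PAGE", "DATA_PAGE"),
  num_values = c(3L, 10L, 10L, 5L)
)

# Cross-tabulate pages: rows are column ids, columns are page types.
type_counts <- table(column = pages$column, page_type = pages$page_type)
print(type_counts)

# Keep only data pages and total their values for each column id.
data_pages <- pages[pages$page_type %in% c("DATA_PAGE", "DATA_PAGE_V2"), ]
values_per_column <- tapply(data_pages$num_values, data_pages$column, sum)
print(values_per_column)
```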