df_from_file {duckplyr}    R Documentation

Read Parquet, CSV, and other files using DuckDB

Description

This function ingests data from files. Internally, it calls a DuckDB table-valued function and transparently converts the results to a data frame. The data is read only when it is actually accessed. See https://duckdb.org/docs/data/overview for documentation of the available functions and their options.

duckplyr_df_from_file() is a thin wrapper around df_from_file() that calls as_duckplyr_df() on the output.

Usage

df_from_file(path, table_function, options = list(), class = NULL)

duckplyr_df_from_file(path, table_function, options = list(), class = NULL)

Arguments

path

Path to file or directory

table_function

The name of a table-valued DuckDB function such as "read_parquet", "read_csv", "read_csv_auto" or "read_json".

options

Arguments to the DuckDB function indicated by table_function.

class

An optional class to add to the data frame. The returned object will always be a data frame. Pass class(tibble()) to create a tibble.

Value

A data frame for df_from_file(), or a duckplyr_df for duckplyr_df_from_file(), extended by the provided class.

Examples

# Create simple CSV file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)

# The call returns immediately; no data is read yet
df <- df_from_file(path, "read_csv_auto")

# Materialization only upon access
names(df)
df$a

# Return as tibble:
df_from_file(
  path,
  "read_csv",
  options = list(delim = ",", auto_detect = TRUE),
  class = class(tibble::tibble())
)
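
# Options are forwarded to the DuckDB table function as a named list.
# A minimal sketch, assuming a semicolon-delimited file: `delim` and `header`
# are options of DuckDB's read_csv, and path_semi / df_semi are names used
# only for this illustration.
path_semi <- tempfile(fileext = ".csv")
write.table(
  data.frame(a = 1:3, b = letters[4:6]),
  path_semi,
  sep = ";", row.names = FALSE, quote = FALSE
)
df_semi <- df_from_file(
  path_semi,
  "read_csv",
  options = list(delim = ";", header = TRUE)
)
# Accessing a column triggers the actual read
df_semi$b
unlink(path_semi)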

unlink(path)
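
# duckplyr_df_from_file() takes the same arguments and returns a duckplyr_df.
# A minimal sketch, assuming the dplyr package is installed; path_dp and ddf
# are names used only for this illustration.
path_dp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path_dp, row.names = FALSE)
ddf <- duckplyr_df_from_file(path_dp, "read_csv_auto")
class(ddf)
# dplyr verbs can then be applied to the result
dplyr::filter(ddf, a > 1)
unlink(path_dp)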

[Package duckplyr version 0.2.3 Index]