join_it {RcensusPkg}    R Documentation

join_it

Description

Outer join two dataframes that have a common column variable.

The function uses fast data.table techniques to join two data.tables on their common key values. A typical use is to take the "GEOID" variable as the key and join data from RcensusPkg::get_vintage_data() with a simple feature holding the geometries for counties, states, or countries, such as one returned by RcensusPkg::tiger_*_sf(). The resulting dataframe can then be used to display the geometries (with RplotterPkg::create_sf_plot()), mapping an aesthetic (e.g. fill/color/size) to a joined data column. Joins can also be made between two simple features (created by RcensusPkg::tiger_*_sf()) or between two dataframes (created by RcensusPkg::get_vintage_data()).

The important thing to remember is that, with the default settings ('negate' and 'match' both FALSE), all the rows in 'df_2' will be present in the resultant data.table.
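The row-retention behaviour described above can be illustrated with a plain data.table join. This is a minimal sketch, not the internals of join_it(), which this page does not describe. The tables 'income_dt' and 'tracts_dt' are hypothetical stand-ins for the output of RcensusPkg::get_vintage_data() and a RcensusPkg::tiger_*_sf() feature.

```r
library(data.table)

# Hypothetical toy tables; the values are made up for illustration only
income_dt <- data.table(
  GEOID  = c("001", "002", "004"),
  income = c(50000, 62000, 48000)
)
tracts_dt <- data.table(
  GEOID = c("001", "002", "003"),
  name  = c("Tract A", "Tract B", "Tract C")
)

# X[Y, on = ...] keeps every row of Y (here 'tracts_dt', playing the role of 'df_2').
# Rows of Y with no match in X receive NA in the columns that come from X.
joined_dt <- income_dt[tracts_dt, on = "GEOID"]
print(joined_dt)
```

In this sketch GEOID "003" is kept with an NA income, while GEOID "004", which appears only in the first table, is dropped.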

Usage

join_it(
  df_1 = NULL,
  df_2 = NULL,
  key_1 = NULL,
  key_2 = NULL,
  negate = FALSE,
  match = FALSE,
  return_sf = FALSE,
  na_rm = FALSE
)

Arguments

df_1

The first dataframe to be joined.

df_2

The second dataframe to be joined with 'df_1'. All rows in this dataframe will be present in the resultant dataframe.

key_1

A string that names the column from 'df_1' that is common to 'df_2'.

key_2

A string that names the column from 'df_2' that is common to 'df_1'.

negate

An optional logical which, if TRUE, returns a dataframe containing the rows that are in 'df_1' but not in 'df_2'. The default is FALSE.

match

An optional logical which, if TRUE, returns a dataframe containing only the rows for which both 'df_1' and 'df_2' have matching key values. The default is FALSE.

return_sf

An optional logical which, if TRUE, converts the resultant data.table to a simple feature, provided it has a geometries column. The default is FALSE.

na_rm

An optional logical which, if TRUE, removes rows with NA values. The default is FALSE.
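The effect of the 'negate', 'match', and 'na_rm' options can be approximated with standard data.table idioms. The sketch below is an assumed analogue of each option on two hypothetical toy tables. It does not show how join_it() is implemented, and details such as which columns are checked for NA may differ in the package.

```r
library(data.table)

# Hypothetical toy tables sharing the key column "id"
df_1 <- data.table(id = c("a", "b", "c"), x = c(1, 2, 3))
df_2 <- data.table(id = c("b", "c", "d"), y = c(20, 30, 40))

# Assumed analogue of negate = TRUE: rows of df_1 with no key match in df_2 (anti-join)
only_in_1 <- df_1[!df_2, on = "id"]

# Assumed analogue of match = TRUE: only rows whose key appears in both tables (inner join)
in_both <- df_1[df_2, on = "id", nomatch = NULL]

# Default behaviour: every row of df_2 is kept, with NA where df_1 has no match
all_of_2 <- df_1[df_2, on = "id"]

# Assumed analogue of na_rm = TRUE: drop any row that contains an NA
complete <- na.omit(all_of_2)
```

Here 'only_in_1' holds the single row with id "a", 'in_both' holds ids "b" and "c", and 'complete' drops the id "d" row because its 'x' value is NA.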

Value

A data.table, or a simple feature object if 'return_sf' is TRUE.

Examples

## Not run: 
  # Requires Census Bureau API key
  # Get the median household income by tract for Washington DC and join
  # this data with DC tract boundaries.

  library(data.table)
  library(httr2)
  library(jsonlite)
  library(sf)
  library(usmap)
  library(withr)
  library(ggplot2)
  library(RcensusPkg)

  # Get the 2020 median household income data by tract for DC
  dc_fips <- usmap::fips(state = "dc")
  dc_B19013_dt <- RcensusPkg::get_vintage_data(
    dataset = "acs/acs5",
    vintage = 2020,
    vars = "B19013_001E",
    region = "tract",
    regionin = paste0("state:", dc_fips)
  )
  # Get the DC tract geometries as a simple feature, to be joined with "dc_B19013_dt"
  output_dir <- withr::local_tempdir()
  if(!dir.exists(output_dir)){
    dir.create(output_dir)
  }
  dc_tracts_sf <- RcensusPkg::tiger_tracts_sf(
    state = dc_fips,
    output_dir = output_dir,
    general = TRUE,
    delete_files = FALSE
  )
  # Join the data with simple feature object
  dc_joined_sf <- RcensusPkg::join_it(
    df_1 = dc_B19013_dt,
    df_2 = dc_tracts_sf,
    key_1 = "GEOID",
    key_2 = "GEOID",
    return_sf = TRUE
  )

## End(Not run)


[Package RcensusPkg version 0.1.5 Index]