check_response_curve_data {PhotoGEA}    R Documentation

Check response curve data for common issues

Description

Checks to make sure an exdf object representing multiple response curves meets basic expectations.

Usage

  check_response_curve_data(
    exdf_obj,
    identifier_columns,
    expected_npts = 0,
    driving_column = NULL,
    driving_column_tolerance = 1.0,
    col_to_ignore_for_inf = 'gmc',
    constant_col = list(),
    error_on_failure = TRUE,
    print_information = TRUE
  )

Arguments

exdf_obj

An exdf object representing multiple response curves.

identifier_columns

A vector or list of strings representing the names of columns in exdf_obj that, taken together, uniquely identify each curve. This often includes names like plot, event, replicate, etc.

expected_npts

A numeric vector of length 1 or 2 specifying conditions for the number of points in each curve. If expected_npts is set to a negative number, then this check will be skipped. See below for more details.

driving_column

The name of a column that is systematically varied to produce each curve; for example, in a light response curve, this would typically be Qin. If driving_column is NULL, then this check will be skipped.

driving_column_tolerance

An absolute tolerance for the deviation of each value of driving_column away from its mean across all the curves; the driving_column_tolerance can be set to Inf to disable this check.

col_to_ignore_for_inf

Any columns to ignore while checking for infinite values. Mesophyll conductance (gmc) is often intentionally set to infinity, so it should be ignored when performing this check. To completely disable this check, set col_to_ignore_for_inf to NULL.

constant_col

A list of named numeric elements, where the name indicates a column of exdf_obj that should be constant, and the value indicates whether the column's values must be identical or whether they must lie within a specified numeric range. If constant_col is an empty list, then this check will be skipped. See below for more details.

error_on_failure

A logical value indicating whether to send an error message when an issue is detected. See details below.

print_information

A logical value indicating whether to print additional information to the R terminal when an issue is detected. See details below.

Details

Basic Behavior:

This function makes a few basic checks to ensure that the response curve data includes the expected information and does not include any mistakes. If no problems are detected, this function will be silent with no return value. If a problem is detected, then the user will be notified in one or more ways, as controlled by the error_on_failure and print_information arguments.

This function will (optionally) perform several checks:

By default, most of these are not performed (except the simplest ones like checking for infinite values or checking that key columns are present). This enables an "opt-in" use style, where users can specify just the checks they wish to make.
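As an illustration of one of the optional checks, the following base-R sketch mimics the idea behind driving_column and driving_column_tolerance. It is not PhotoGEA's implementation: the matrix qin, its values, and the assumption that the mean is taken across curves at each point in the sequence are all hypothetical and chosen only for illustration.

```r
# Hypothetical Qin setpoints: one row per curve, one column per point in the
# measurement sequence (these values are made up for illustration)
qin <- rbind(
  curve_1 = c(2000.2, 1000.1, 500.0),
  curve_2 = c(1999.8,  999.9, 502.4),
  curve_3 = c(2000.0, 1000.0, 500.0)
)

# Absolute tolerance, playing the role of driving_column_tolerance
tolerance <- 1.0

# Mean of the driving column at each point in the sequence, taken across curves
# (assumed interpretation of "its mean across all the curves")
point_means <- colMeans(qin)

# Absolute deviation of every value from the mean at its sequence position
deviation <- abs(sweep(qin, 2, point_means))

# Row (curve) and column (point) indices of values outside the tolerance
problem_points <- which(deviation > tolerance, arr.ind = TRUE)
print(problem_points)
```

In this sketch only the third point of curve_2 deviates from the across-curve mean by more than the tolerance, which is the kind of discrepancy the driving column check is meant to surface.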

More Details:

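There are several options for checking the number of points in each curve, each of which is illustrated in the Examples section: a negative value skips the check; a value of 0 requires all curves to have the same number of points; a single positive value such as 7 requires each curve to have exactly that many points; a pair such as c(8, 10) requires each curve to have between 8 and 10 points; and a pair beginning with 0, such as c(0, 3), requires the number of points in each curve to lie within 3 of the mean value.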

There are two options for checking columns that should be constant:

For example, setting constant_col = list(species = NA, Qin = 10) means that each curve must have only a single value of the species column, and that the value of the Qin column cannot vary by more than 10 across each curve.
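The two rules in this example can be sketched in base R. This is only an illustration of the logic described above, not the package's own code: the data frame curves and its values are hypothetical, and "cannot vary by more than 10" is interpreted here as the difference between the largest and smallest value in each curve, which is an assumption.

```r
# Hypothetical data: two curves identified by the `curve` column
curves <- data.frame(
  curve   = rep(c('A', 'B'), each = 3),
  species = c('soybean', 'soybean', 'soybean', 'tobacco', 'soybean', 'tobacco'),
  Qin     = c(1000, 1004, 998, 1000, 1015, 1001)
)

# Rule for `species = NA`: the column must take a single value within each curve
single_valued <- tapply(
  curves$species, curves$curve,
  function(x) length(unique(x)) == 1
)

# Rule for `Qin = 10`: the spread (max - min) within each curve must not
# exceed 10 (assumed meaning of "vary by more than 10")
within_range <- tapply(
  curves$Qin, curves$curve,
  function(x) diff(range(x)) <= 10
)

print(single_valued)
print(within_range)
```

Here curve A satisfies both rules, while curve B violates both: it contains two species names and its Qin values span 15.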

Use Cases:

Using check_response_curve_data is not strictly necessary, but it can be helpful both to you and to anyone else reading your analysis code. Here are a few typical use cases:

Sometimes the response curves in a large data set were not all measured with the same sequence of setpoints. If only a few different sequences were used, it is possible to split them into groups and separately run check_response_curve_data on each group. This scenario is discussed in the Frequently Asked Questions vignette.

Even if none of the above situations are relevant to you, it may still be helpful to run check_response_curve_data with expected_npts set to 0 and error_on_failure set to FALSE. With these settings, if there are curves with different numbers of points, the function will print the number of points in each curve to the R terminal, but won't stop the rest of the script from running. This can be useful for detecting problems with the curve_identifier column. For example, if the longest curves in the set are known to have 17 points, but check_response_curve_data identifies a curve with 34 points, it is clear that the same identifier was accidentally used for two different curves.
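The 17-versus-34 scenario can be reproduced with a small base-R sketch. It does not call PhotoGEA; the data frame identifiers and its plot values are hypothetical, and the script simply counts the rows that share each identifier, similar in spirit to the per-curve point counts described above.

```r
# Hypothetical identifiers: 'p2' was accidentally reused for two 17-point curves
identifiers <- data.frame(
  plot = c(rep('p1', 17), rep('p2', 34), rep('p3', 17))
)

# Number of points (rows) that share each identifier value
npts <- vapply(split(identifiers$plot, identifiers$plot), length, integer(1))

# Identifiers whose point count differs from the typical (median) count
suspicious <- names(npts)[npts != median(npts)]

print(npts)
print(suspicious)
```

An identifier that shows up with twice the expected number of points, as p2 does here, is a strong hint that two separate curves share the same identifier values.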

Value

The check_response_curve_data function does not return anything.

Examples

# Read an example Licor file included in the PhotoGEA package and check it.
# This file includes several 7-point light-response curves that can be uniquely
# identified by the values of its 'species' and 'plot' columns. Since these are
# light-response curves, each one follows a pre-set sequence of `Qin` values.
licor_file <- read_gasex_file(
  PhotoGEA_example_file_path('ball_berry_1.xlsx')
)

# Make sure there are no infinite values and that all curves have the same
# number of points
check_response_curve_data(licor_file, c('species', 'plot'))

# Make sure there are no infinite values and that all curves have 7 points
check_response_curve_data(licor_file, c('species', 'plot'), expected_npts = 7)

# Make sure there are no infinite values, that all curves have 7 points, and
# that the values of the `Qin` column follow the same sequence in all curves
# (to within 1.0 micromol / m^2 / s)
check_response_curve_data(
  licor_file,
  c('species', 'plot'),
  expected_npts = 7,
  driving_column = 'Qin',
  driving_column_tolerance = 1.0
)

# Make sure that there are no infinite values and that all curves have between
# 8 and 10 points; this will intentionally fail
check_response_curve_data(
  licor_file,
  c('species', 'plot'),
  expected_npts = c(8, 10),
  error_on_failure = FALSE
)

# Split the data set by `species` and make sure all curves have similar numbers
# of points (within 3 of the mean value); this will intentionally fail
check_response_curve_data(
  licor_file,
  'species',
  expected_npts = c(0, 3),
  error_on_failure = FALSE
)

# Split the data set by `species` and make sure all curves have a constant value
# of `plot` and a limited range of `TleafCnd`; this will intentionally fail
check_response_curve_data(
  licor_file,
  'species',
  constant_col = list(plot = NA, TleafCnd = 0.001),
  error_on_failure = FALSE
)

[Package PhotoGEA version 1.3.3 Index]