extract_entities {LBDiscover}    R Documentation
Extract and classify entities from text with multi-domain types
Description
This function extracts entities from text and optionally assigns them to specific semantic categories based on dictionaries.
Usage
extract_entities(
text_data,
text_column = "abstract",
dictionary = NULL,
case_sensitive = FALSE,
overlap_strategy = c("priority", "all", "longest"),
sanitize_dict = TRUE
)
Arguments
text_data: A data frame containing article text data.
text_column: Name of the column containing text to process.
dictionary: Combined dictionary or list of dictionaries for entity extraction.
case_sensitive: Logical. If TRUE, matching is case-sensitive.
overlap_strategy: How to handle terms that match multiple dictionaries: "priority", "all", or "longest".
sanitize_dict: Logical. If TRUE, sanitizes the dictionary before extraction.
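To make the overlap question concrete, the following base R sketch shows one plausible reading of the "priority" and "all" strategies when a term is listed under more than one type. It illustrates the idea only and does not reproduce the package's internal logic. The column names `term` and `type`, and the rule that the first listed type wins, are assumptions made for this example. The "longest" strategy is left out because its exact semantics are not described here.

```r
# Hypothetical combined dictionary: "insulin" is listed under two types.
# The column names `term` and `type` are assumptions for this sketch.
dict <- data.frame(
  term = c("insulin", "insulin", "migraine"),
  type = c("protein", "drug", "disease"),
  stringsAsFactors = FALSE
)

# "all" (as read here): keep every (term, type) pair.
keep_all <- dict

# "priority" (as read here): keep only the first type listed for each term.
keep_priority <- dict[!duplicated(dict$term), ]

print(keep_all)
print(keep_priority)
```

Under this reading, "all" can return several rows for one matched term, while "priority" returns at most one.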
Value
A data frame with extracted entities, their types, and positions.
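As a rough picture of what a result holding entities, types, and positions could look like, here is a minimal base R sketch of dictionary matching. It does not call extract_entities and may differ from what the package actually returns. The helper `find_terms`, the dictionary columns `term` and `type`, and the output columns `entity`, `entity_type`, `start`, and `end` are all hypothetical names chosen for illustration.

```r
# Toy input in the shape the arguments describe: one text column per article.
articles <- data.frame(
  abstract = c("Magnesium may reduce migraine frequency.",
               "No relevant terms here."),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary with assumed columns `term` and `type`.
dict <- data.frame(
  term = c("magnesium", "migraine"),
  type = c("chemical", "disease"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: find every literal occurrence of each dictionary term
# in one string and report its 1-based start and end character positions.
# Returns NULL when nothing matches.
find_terms <- function(text, dict, case_sensitive = FALSE) {
  # For case-insensitive matching, lower-case both sides before comparing.
  haystack <- if (case_sensitive) text else tolower(text)
  rows <- lapply(seq_len(nrow(dict)), function(i) {
    needle <- if (case_sensitive) dict$term[i] else tolower(dict$term[i])
    hits <- gregexpr(needle, haystack, fixed = TRUE)[[1]]
    if (hits[1] == -1) return(NULL)  # gregexpr signals "no match" with -1
    starts <- as.integer(hits)
    data.frame(
      entity = dict$term[i],
      entity_type = dict$type[i],
      start = starts,
      end = starts + attr(hits, "match.length") - 1L,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

result <- find_terms(articles$abstract[1], dict)
print(result)
```

Calling `find_terms(articles$abstract[1], dict, case_sensitive = TRUE)` in this sketch drops the "magnesium" row, because the text spells it with a capital M.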
[Package LBDiscover version 0.1.0 Index]