seek {seekr} | R Documentation |
Extract Matching Lines from Files
Description
These functions search through one or more text files, extract lines matching a regular expression pattern, and return a tibble containing the results.
- seek(): Discovers files inside one or more directories (recursively or not), applies optional file name and text file filtering, and searches lines.
- seek_in(): Searches inside a user-provided character vector of files.
Usage
seek(
  pattern,
  path = ".",
  ...,
  filter = NULL,
  negate = FALSE,
  recurse = FALSE,
  all = FALSE,
  relative_path = TRUE,
  matches = FALSE
)
seek_in(files, pattern, ..., matches = FALSE)
Arguments
pattern |
A regular expression pattern used to match lines. |
path |
A character vector of one or more directories where files should be discovered (only for seek()). |
... |
Additional arguments passed to readr::read_lines(), such as n_max. |
filter |
Optional. A regular expression pattern used to filter file paths
before reading. If |
negate |
Logical. If |
recurse |
If |
all |
If |
relative_path |
Logical. If TRUE, file paths are made relative to the path argument. If multiple root paths are provided, relative_path is automatically ignored and absolute paths are kept to avoid ambiguity. |
matches |
Logical. If TRUE, all matched substrings are also returned, in a matches column. |
files |
A character vector of files to search (only for seek_in()). |
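The relative_path rule described above can be illustrated with a small base R sketch. The helper display_paths() is hypothetical and is not part of seekr; it only mimics the documented behaviour (paths relative to a single root, absolute paths when several roots are given) and makes no claim about how the package implements it.

```r
# Hypothetical helper, not part of seekr: illustrates the documented rule only.
display_paths <- function(files, roots, relative_path = TRUE) {
  files <- normalizePath(files, winslash = "/", mustWork = FALSE)
  # With several roots a relative path would be ambiguous, so keep absolute paths.
  if (!relative_path || length(roots) != 1L) {
    return(files)
  }
  prefix <- paste0(normalizePath(roots, winslash = "/", mustWork = FALSE), "/")
  inside <- startsWith(files, prefix)
  files[inside] <- substring(files[inside], nchar(prefix) + 1L)
  files
}

# Demo on throwaway directories
root <- tempfile("root_")
other <- tempfile("other_")
dir.create(root)
dir.create(other)
f <- file.path(root, "script.R")
writeLines("x <- 1", f)

display_paths(f, root)            # relative to the single root
display_paths(f, c(root, other))  # absolute, because two roots were given
```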
Details
The overall process involves the following steps:
- File Selection
  - seek(): Files are discovered using fs::dir_ls(), starting from one or more directories.
  - seek_in(): Files are directly supplied by the user (no discovery phase).
- File Filtering
  - Files located inside .git/ folders are automatically excluded.
  - Files with known non-text extensions (e.g., .png, .exe, .rds) are excluded.
  - If a file's extension is unknown, a check is performed to detect embedded null bytes (binary indicator).
  - Optionally, an additional regex-based path filter (filter) can be applied.
- Line Reading
  - Files are read line-by-line using readr::read_lines().
  - Only lines matching the provided regular expression pattern are retained.
  - If a file cannot be read, it is skipped gracefully without failing the process.
- Data Frame Construction
  - A tibble is constructed with one row per matched line.
These functions are particularly useful for analyzing source code, configuration files, logs, and other structured text data.
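The filtering and line-reading steps above can be sketched in base R. This is a simplified illustration under stated assumptions, not seekr's implementation: the helper names (has_null_byte, is_candidate, match_lines) and the 1024-byte probe size are invented for the example, the extension list contains only the three extensions named above, the null-byte check runs on every file that passes the extension test, and matching uses base R's PCRE engine (perl = TRUE) rather than stringr.

```r
# Assumed, illustrative list: the text names .png, .exe and .rds only as examples.
non_text_ext <- c("png", "exe", "rds")

# TRUE if the first `n` bytes of `file` contain a null byte (binary indicator).
has_null_byte <- function(file, n = 1024L) {
  bytes <- readBin(file, what = "raw", n = n)
  any(bytes == as.raw(0L))
}

# Apply the exclusions in order: .git/ folders, known extensions, null bytes.
# Assumes forward slashes in `file`.
is_candidate <- function(file) {
  in_git <- grepl("(^|/)\\.git/", file)
  ext <- tolower(tools::file_ext(file))
  !in_git && !(ext %in% non_text_ext) && !has_null_byte(file)
}

# Read one file and keep only matching lines, one row per matched line.
match_lines <- function(file, pattern) {
  lines <- tryCatch(
    suppressWarnings(readLines(file, warn = FALSE)),
    error = function(e) NULL
  )
  if (is.null(lines)) {
    return(NULL)  # unreadable file: skip it instead of failing
  }
  hit <- grepl(pattern, lines, perl = TRUE)
  # regmatches() with regexpr() returns the first match of each matching line
  first <- regmatches(lines, regexpr(pattern, lines, perl = TRUE))
  data.frame(
    path = rep(file, sum(hit)),
    line_number = which(hit),
    match = first,
    line = lines[hit],
    stringsAsFactors = FALSE
  )
}

# Demo on temporary files
text_file <- tempfile(fileext = ".txt")
writeLines(c("alpha <- 1", "# TODO: fix", "beta <- 2"), text_file)
bin_file <- tempfile()
writeBin(as.raw(c(1, 0, 2)), bin_file)

files <- c(text_file, bin_file)
keep <- files[vapply(files, is_candidate, logical(1))]
res <- do.call(rbind, lapply(keep, match_lines, pattern = "(?i)todo"))
res
```

Returning NULL for an unreadable file lets do.call(rbind, ...) drop it silently, which mirrors the "skipped gracefully" behaviour described in the list.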
Value
A tibble with one row per matched line, containing:
- path: File path (relative or absolute).
- line_number: Line number in the file.
- match: The first matched substring.
- matches: All matched substrings (if matches = TRUE).
- line: Full content of the matching line.
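The difference between the match and matches columns can be shown on a single line of text. The sketch below uses base R (regexpr() and gregexpr() with perl = TRUE) as a stand-in; the sample line and pattern are made up for illustration, and the exact container type seekr uses for matches is not specified here.

```r
line <- "x <- length(a) + length(b)"
pattern <- "length\\(\\w+\\)"

# First matched substring only (what the `match` column is described as holding)
first <- regmatches(line, regexpr(pattern, line, perl = TRUE))

# Every matched substring on the line (what `matches` is described as holding)
all_hits <- regmatches(line, gregexpr(pattern, line, perl = TRUE))[[1]]

first
all_hits
```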
See Also
fs::dir_ls(), readr::read_lines(), stringr::str_detect()
Examples
path <- system.file("extdata", package = "seekr")
# Search for all function definitions in R files
seek("[^\\s]+(?= (=|<-) function\\()", path, filter = "\\.R$")
# Search for "TODO" comments in source code, case-insensitively
seek("(?i)TODO", path, filter = "\\.R$")
# Search for error/warning in log files
seek("(?i)error", path, filter = "\\.log$")
# Search for config keys in YAML
seek("database:", path, filter = "\\.ya?ml$")
# Looking for "length" in all types of text files
seek("(?i)length", path)
# Search for specific CSV headers using seek_in() and reading only the first line
csv_files <- list.files(path, "\\.csv$", full.names = TRUE)
seek_in(csv_files, "(?i)specie", n_max = 1)