ipynbcount {rmdwc} | R Documentation |
Count text elements in Jupyter Notebook files
Description
This function extracts text from selected cell types (e.g., markdown) in one or more .ipynb files and counts the number of characters, words, and lines. It can optionally exclude text that matches a pattern (e.g., code fences).
Counting is delegated to the helper function rmdcount(), which is run on the extracted text.
Usage
ipynbcount(
files,
celltype = c("markdown"),
space = "[[:space:]]",
word = "[[:space:]]+",
line = "\n",
exclude = "```\\{.*?```"
)
Arguments
files |
character: vector of paths to .ipynb files |
celltype |
character: vector indicating which cell types to include (default is c("markdown")) |
space |
character: pattern to split a text at spaces (default: "[[:space:]]") |
word |
character: pattern to split a text at word boundaries (default: "[[:space:]]+") |
line |
character: pattern to split lines (default: "\n") |
exclude |
character: pattern to exclude text parts, e.g. code chunks (default: "```\\{.*?```") |
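To illustrate how the pattern arguments might interact, the following base R sketch applies the default values from the Usage section to a small, made-up string. It assumes that exclude is applied by deleting every match before splitting; the actual counting is done by rmdcount() and may differ in detail. The variable names text, cleaned, words and lines are hypothetical.

```r
# Hypothetical input: prose plus one fenced code chunk.
text <- "Intro text.\n```{r}\nx <- 1\n```\nClosing words."

# Default argument values copied from the Usage section.
exclude <- "```\\{.*?```"
word    <- "[[:space:]]+"
line    <- "\n"

# Assumption: excluded parts are removed before anything is counted.
# With R's default regex engine, "." also matches newlines, so the
# lazy ".*?" spans the whole multi-line chunk.
cleaned <- gsub(exclude, "", text)

# Split at word boundaries and drop empty pieces.
words <- strsplit(cleaned, word)[[1]]
words <- words[nzchar(words)]

# Split into lines.
lines <- strsplit(cleaned, line)[[1]]

cat("words:", length(words), " lines:", length(lines), "\n")
```

Note that the default exclude pattern only matches fences that open with a brace (as in ```{r}), so plain fences without a brace would be left in the text under this assumption.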
Details
This function assumes that the notebook files are valid JSON and contain a list of cells under the cells field.
It temporarily writes the extracted content to a file so that the rmdcount() logic can be reused.
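The extraction step described above can be sketched roughly as follows. This is not the package's implementation: it assumes the jsonlite package for JSON parsing, and it builds a tiny hypothetical notebook (nb) instead of reading a real one. The names nb, path, parsed, keep and tmp are illustrative only.

```r
# Sketch only: jsonlite is an assumption, not necessarily what rmdwc uses.
library(jsonlite)

# Build a minimal, hypothetical notebook and write it to a temporary file.
nb <- list(
  cells = list(
    list(cell_type = "markdown", source = list("# Title\n", "Some text here.")),
    list(cell_type = "code",     source = list("x <- 1\n"))
  ),
  nbformat = 4
)
path <- tempfile(fileext = ".ipynb")
write_json(nb, path, auto_unbox = TRUE)

# Read the notebook back without simplification so cells stay a plain list.
parsed <- fromJSON(path, simplifyVector = FALSE)

# Keep only the requested cell types (mirrors the celltype argument).
celltype <- c("markdown")
keep <- Filter(function(cell) cell$cell_type %in% celltype, parsed$cells)

# Each cell's source is a list of strings; join them into one text.
text <- paste(
  vapply(keep, function(cell) paste(unlist(cell$source), collapse = ""),
         character(1)),
  collapse = "\n"
)

# Write the extracted text to a temporary file; a counting helper such as
# rmdcount() could then be pointed at this file.
tmp <- tempfile(fileext = ".Rmd")
writeLines(text, tmp)
```

Passing celltype <- c("markdown", "code") in this sketch would keep both cells, which corresponds to the "with code" call in the Examples section.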
Value
A data frame with counts of characters, words, and lines for each file. Additional columns include file (base name) and path (directory).
Examples
file <- system.file('ipynb/example_data_analysis.ipynb', package="rmdwc")
ipynbcount(file) # without code
ipynbcount(file, celltype=c("markdown", "code")) # with code