get_nexis_html {readtext} | R Documentation |
Extract texts and metadata from Nexis HTML files
Description
Extracts headings, body texts and metadata (date, byline, length, section, edition) from items in HTML files downloaded by the scraper.
Usage
get_nexis_html(path, paragraph_separator = "\n\n", verbosity, ...)
Arguments
path |
either a path to an HTML file or a path to a directory that contains HTML files |
paragraph_separator |
a character string used to separate paragraphs in body texts |
verbosity |
|
... |
only to trap extra arguments |
Examples
## Not run:
irt <- readtext:::get_nexis_html('tests/data/nexis/irish-times_1995-06-12_0001.html')
afp <- readtext:::get_nexis_html('tests/data/nexis/afp_2013-03-12_0501.html')
gur <- readtext:::get_nexis_html('tests/data/nexis/guardian_1986-01-01_0001.html')
sun <- readtext:::get_nexis_html('tests/data/nexis/sun_2000-11-01_0001.html')
spg <- readtext:::get_nexis_html('tests/data/nexis/spiegel_2012-02-01_0001.html',
                                 language_date = 'german')
all <- readtext('tests/data/nexis', source = 'nexis')
## End(Not run)
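The effect of paragraph_separator can be illustrated without any Nexis files. The following is a minimal base R sketch, not the package's implementation: it assumes that the paragraphs of an item's body are joined into a single string using the separator, and the paragraphs vector is hypothetical data made up for the illustration.

```r
# Hypothetical paragraphs standing in for one article body (not real Nexis data)
paragraphs <- c("First paragraph.", "Second paragraph.", "Third paragraph.")

# Assumption: body text is formed by joining paragraphs with paragraph_separator
paragraph_separator <- "\n\n"
body <- paste(paragraphs, collapse = paragraph_separator)

# Splitting on the same separator recovers the individual paragraphs
recovered <- strsplit(body, paragraph_separator, fixed = TRUE)[[1]]
```

Under that assumption, a separator that cannot occur inside a paragraph (such as the default "\n\n") allows the body text to be split back into paragraphs later.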
[Package readtext version 0.91 Index]