extractText {orderanalyzer}    R Documentation

Extracts the text from a PDF file

Description

This function extracts the text from a PDF document and returns it as a single string, together with tables of the individual lines and words. Text-based PDF files are read with 'pdftools'; image-based PDF files are processed with the OCR engine 'tesseract'.
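
The split between text-based and image-based files can be sketched with two 'pdftools' functions, pdf_text() and pdf_ocr_text() (the latter requires the 'tesseract' package). This is a minimal illustration, not the implementation used by extractText(): the helper name extract_pages() is hypothetical, the rule "a PDF whose text layer is empty is image-based" is an assumption, and OCR is run in English only, whereas extractText() also reports the document language.

```r
# Hypothetical helper, not part of orderanalyzer: returns one string per page.
# Requires 'pdftools'; the OCR branch additionally requires 'tesseract'.
extract_pages <- function(file) {
  # Read the embedded text layer, one element per page
  pages <- pdftools::pdf_text(file)

  # Assumption: if every page contains only whitespace, the PDF is image-based
  if (all(!nzchar(trimws(pages)))) {
    # Fall back to OCR via tesseract (English language model assumed)
    pages <- pdftools::pdf_ocr_text(file, language = "eng")
  }
  pages
}
```

For a text-based file such as the package's example document, the OCR branch is never reached and the result is simply the output of pdf_text().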

Usage

extractText(file)

Arguments

file

Path to the PDF file

Value

A list containing the extracted text, a data table of the lines, a data table of the words, and the type and language of the document.
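
The structure of the returned list can be explored with base R tools. This minimal sketch assumes nothing about the component names except words, which is the only one used in the Examples section; names() shows whatever names the installed package version actually returns.

```r
library(orderanalyzer)

# Example document shipped with the package
file <- system.file("extdata", "OrderDocument_en.pdf", package = "orderanalyzer")
result <- extractText(file)

# Names of the list components (text, lines, words, type, language)
names(result)

# One-line summary of each component, including the two data tables
str(result, max.level = 1)

# First rows of the word-level table
head(result$words)
```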

Examples

file <- system.file("extdata", "OrderDocument_en.pdf", package = "orderanalyzer")
text <- extractText(file)
text$words


[Package orderanalyzer version 1.0.0 Index]