class PandocBeautifier

This class provides the major functionalities.

Note that it is called PandocBeautifier for historical reasons.

It provides methods to process a pandoc file.

Attributes

config[RW]
log[RW]

Public Class Methods

new(logger = nil)

The constructor.

@param [Logger] logger the logger object to be used. If none is specified, the global $logger is used; if that is not set either, a default logger writing to STDOUT at level INFO is created.
# File lib/wortsammler/class.proolib.rb, line 453
def initialize(logger = nil)

  @markdown_output_switches = %w{
   +backtick_code_blocks
   -fenced_code_blocks
   +compact_definition_lists
   +space_in_atx_header
   +yaml_metadata_block
  }.join()

  @markdown_input_switches = %w{
   +smart
   +backtick_code_blocks
   +fenced_code_blocks
   +compact_definition_lists
   -space_in_atx_header
  }.join()


  @view_pattern = /~~ED((\s*(\w+))*)~~/
  # @view_pattern = /<\?ED((\s*(\w+))*)\?>/
  @tempdir      = Dir.mktmpdir

  @config = ProoConfig.new()

  @log=logger || $logger || nil

  if @log == nil
    @log                 = Logger.new(STDOUT)
    @log.level           = Logger::INFO
    @log.datetime_format = "%Y-%m-%d %H:%M:%S"
    @log.formatter       = proc do |severity, datetime, progname, msg|
      "#{datetime}: #{msg}\n"
    end

  end
end
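The two switch lists are joined without a separator, so each list becomes one extension string that is appended directly to pandoc's markdown reader or writer name (pandoc's FORMAT+EXTENSION-EXTENSION syntax). A minimal sketch that rebuilds the input-side format spec the way initialize does; the helper name format_spec is hypothetical:

```ruby
# Hypothetical helper: builds a pandoc format spec such as
# "markdown+smart-space_in_atx_header" from a base format and a switch list.
def format_spec(base, switches)
  base + switches.join
end

input_switches = %w{
  +smart
  +backtick_code_blocks
  +fenced_code_blocks
  +compact_definition_lists
  -space_in_atx_header
}

spec = format_spec("markdown", input_switches)
puts spec
# prints markdown+smart+backtick_code_blocks+fenced_code_blocks+compact_definition_lists-space_in_atx_header
```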

Public Instance Methods

beautify(file)

Beautify a file:

  • process the file with pandoc

  • revoke some quotes introduced by pandoc

@param [String] file the name of the file to be beautified

# File lib/wortsammler/class.proolib.rb, line 521
def beautify(file)

  @log.debug(" Cleaning: \"#{file}\"")

  docfile = File.new(file)
  olddoc  = docfile.readlines.join
  docfile.close

  # process the file in pandoc
  cmd                     = "#{PANDOC_EXE} --standalone #{file.esc} -f markdown#{@markdown_input_switches} -t markdown#{@markdown_output_switches} --atx-headers  --id-prefix=#{File.basename(file).esc}_ "

  newdoc                  = `#{cmd}`
  @log.debug "beautify #{file.esc}: #{$?}"
  @log.debug(" finished: \"#{file}\"")

  # tweak the quoting
  if $?.success? then
    # (RS_Mdc)
    # TODO: fix Table width toggles sometimes
    if (not olddoc == newdoc) then ##only touch the file if it is really changed
      File.open(file, "w") { |f| f.puts(newdoc) }
      File.open(file+".bak", "w") { |f| f.puts(olddoc) } # (RS_Mdc_) # remove this if needed
      @log.debug("  cleaned: \"#{file}\"")
    else
      @log.debug("was clean: \"#{file}\"")
    end
    #TODO: error handling here
  else
    @log.error("error calling pandoc - please watch the screen output")
  end
end
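beautify only rewrites the file when pandoc's output differs from the original text, and keeps the previous content in a .bak file next to it. A minimal sketch of that write-if-changed step; the pandoc call is replaced by passing the new text in directly, and the method name write_if_changed is hypothetical:

```ruby
require "tmpdir"

# Hypothetical helper mirroring the write step of beautify:
# writes new_text to file only if it differs from the current content
# and keeps the old content in "<file>.bak". Returns true if it wrote.
def write_if_changed(file, new_text)
  old_text = File.read(file)
  return false if old_text == new_text

  File.write(file + ".bak", old_text)
  File.write(file, new_text)
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "doc.md")
  File.write(path, "# Title\n")
  p write_if_changed(path, "# Title\n")          # false: content unchanged
  p write_if_changed(path, "# Title\n\nText\n")  # true
  p File.read(path + ".bak")                     # "# Title\n"
end
```

Unlike beautify, which writes with puts and therefore appends a trailing newline when it is missing, this sketch writes the text exactly as given.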
check_pandoc_version()

@return [Boolean] true if pandoc 2.0.5 or later is available

# File lib/wortsammler/class.proolib.rb, line 500
def check_pandoc_version
  required_version_string="2.0.5"
  begin
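    pandoc_version=`#{PANDOC_EXE} -v`.split("\n").first.split(" ")[1]
    # compare numerically: as strings, "2.0.10" would sort before "2.0.5"
    if Gem::Version.new(pandoc_version) < Gem::Version.new(required_version_string) then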
      @log.error "found pandoc #{pandoc_version} need #{required_version_string}"
      result = false
    else
      result = true
    end
  rescue Exception => e
    @log.error("could not run pandoc: #{e.message}")
    result=false
  end
  result
end
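Comparing version numbers as plain strings is fragile: lexically, "2.0.10" < "2.0.5" is true, because the comparison stops at the first differing character. A sketch of the same check using Gem::Version, which compares each numeric segment. The method name pandoc_version_ok? and passing the pandoc -v output in as a string are assumptions made so the example runs without pandoc:

```ruby
require "rubygems" # provides Gem::Version; normally loaded by default

# Hypothetical helper: takes the output of `pandoc -v`, reads the version
# from the second word of the first line (as check_pandoc_version does)
# and checks it against the required minimum.
def pandoc_version_ok?(version_output, required = "2.0.5")
  first_line = version_output.lines.first.to_s
  found = first_line.split(" ")[1]
  return false if found.nil?

  Gem::Version.new(found) >= Gem::Version.new(required)
end

puts "2.0.10" < "2.0.5"                       # true: string comparison is misleading
puts pandoc_version_ok?("pandoc 2.0.10\n")    # true
puts pandoc_version_ok?("pandoc 1.19.2\n")    # false
```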
collect_document(input, output)

This combines the input documents into one single file and then beautifies that output file.

@param [Array of String] input the input files to be processed in the given sequence
@param [String] output the name of the output file

# File lib/wortsammler/class.proolib.rb, line 679
def collect_document(input, output)
  inputs   =input.map { |xx| xx.esc.to_osPath }.join(" ") # quote and combine the inputs
  inputname=File.basename(input.first)

  #now combine the input files
  @log.debug("combining the input files #{inputname} et al")
  cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} --standalone -t markdown#{@markdown_output_switches} -o #{output.esc} --ascii #{inputs}" # note that inputs is already quoted
  system(cmd)
  if $?.success? then
    PandocBeautifier.new().beautify(output)
  end
end
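collect_document hands all input files to a single pandoc call; pandoc concatenates multiple input files in the given order before converting them into one markdown output. A sketch that builds the same command as an argument array instead of a shell string, so no quoting helper such as the library's esc is needed. The name collect_argv is hypothetical and the switch strings are passed in as parameters:

```ruby
# Hypothetical helper: builds the argv for combining several markdown files
# into one with pandoc. Passing the array to system(*argv) bypasses the shell,
# so file names containing spaces need no extra quoting.
def collect_argv(pandoc, inputs, output, input_switches, output_switches)
  [
    pandoc,
    "-f", "markdown#{input_switches}",
    "--standalone",
    "-t", "markdown#{output_switches}",
    "-o", output,
    "--ascii",
    *inputs
  ]
end

argv = collect_argv("pandoc", ["intro.md", "chapter 1.md"], "all.md", "+smart", "-fenced_code_blocks")
p argv
# system(*argv) would then run pandoc without going through a shell
```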
filter_document_variant(inputfile, outputfile, view)

This filters the document according to the target audience

@param [String] inputfile name of the input file
@param [String] outputfile name of the output file
@param [String] view name of the intended view

# File lib/wortsammler/class.proolib.rb, line 622
def filter_document_variant(inputfile, outputfile, view)

  input_data = File.open(inputfile) { |f| f.readlines }

  output_data = Array.new
  is_active   = true
  input_data.each { |l|
    switch=self.get_filter_command(l, view)
    l.gsub!(@view_pattern, "")
    is_active = switch unless switch.nil?
    @log.debug "select edition #{view}: #{is_active}: #{l.strip}"

    output_data << l if is_active
  }

  File.open(outputfile, "w") { |f| f.puts output_data.join }
end
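A line that contains a view marker such as ~~ED intern all~~ switches the filter: it becomes active if the marker names the selected view or "all", and inactive otherwise; lines without a marker inherit the current state. The marker itself is removed, so an active marker line survives as an empty line. A self-contained sketch of that state machine using the same regular expression as @view_pattern; the method name filter_lines and the view names intern/extern are illustrative:

```ruby
# Same pattern as @view_pattern in the constructor.
VIEW_PATTERN = /~~ED((\s*(\w+))*)~~/

# Hypothetical helper: returns only the lines visible in the given view.
def filter_lines(lines, view)
  active = true
  lines.each_with_object([]) do |line, out|
    match = line.match(VIEW_PATTERN)
    # a marker switches the state; other lines keep the previous state
    active = (match[1].split & [view, "all"]).any? unless match.nil?
    out << line.gsub(VIEW_PATTERN, "") if active
  end
end

doc = [
  "Common text\n",
  "~~ED intern~~\n",
  "Internal note\n",
  "~~ED all~~\n",
  "Back to common text\n"
]
p filter_lines(doc, "extern")
# => ["Common text\n", "\n", "Back to common text\n"]
```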
generateDocument(input, outdir, outname, format, vars, editions=nil, snippetfiles=nil, frontmatter=nil, config=nil)

This generates the final document.

It actually does this in two steps:

  1. process the front matter to LaTeX

  2. process the documents

@param [Array of String] input the input files to be processed in the given sequence
@param [String] outdir the output directory
@param [String] outname the base name of the output file. It is a base name in case the output format requires multiple files.
@param [Array of String] format list of formats which shall be generated.

Supported formats: "pdf", "latex", "html", "docx", "rtf", "txt"

@param [Hash] vars the variables passed to pandoc
@param [Hash] editions the editions to process; default nil (no edition processing)
@param [Array of String] snippetfiles the list of files containing snippets
@param [Array of String] frontmatter the files to be processed as front matter (they are combined like the input files)
@param [ProoConfig] config the configuration to be used

# File lib/wortsammler/class.proolib.rb, line 744
def generateDocument(input, outdir, outname, format, vars, editions=nil, snippetfiles=nil, frontmatter=nil, config=nil)

  # combine the input files

  temp_filename    = "#{@tempdir}/x.md".to_osPath
  temp_frontmatter = "#{@tempdir}/xfrontmatter.md".to_osPath unless frontmatter.nil?
  collect_document(input, temp_filename)
  collect_document(frontmatter, temp_frontmatter) unless frontmatter.nil?

  # process the snippets

  if not snippetfiles.nil?
    snippets={}
    snippetfiles.each { |f|
      if File.exist?(f)
        type=File.extname(f)
        case type
          when ".yaml"
            x=YAML.load_file(f)
          when ".xlsx"
            x=load_snippets_from_xlsx(f)
          else
            @log.error("Unsupported file format for snippets: #{type}")
            x={}
        end
        snippets.merge!(x)
      else
        @log.error("Snippet file not found: #{f}")
      end
    }

    replace_snippets_in_file(temp_filename, snippets)
  end

  vars_frontmatter          =vars.clone
  vars_frontmatter[:usetoc] = "nousetoc"


  if editions.nil?
    # there are no editions
    unless frontmatter.nil? then
      render_document(temp_frontmatter, @tempdir, "xfrontmatter", ["frontmatter"], vars_frontmatter)
      vars[:frontmatter] = "#{@tempdir}/xfrontmatter.latex"
    end
    render_document(temp_filename, outdir, outname, format, vars, config)
  else
    # process the editions
    editions.each { |edition_name, properties|
      edition_out_filename     = "#{outname}_#{properties[:filepart]}"
      edition_temp_frontmatter = "#{@tempdir}/#{edition_out_filename}_frontmatter.md" unless frontmatter.nil?
      edition_temp_filename    = "#{@tempdir}/#{edition_out_filename}.md"
      vars[:title]             = properties[:title]

      editionformats = properties[:format] || format

      if properties[:debug]
        process_debug_info(temp_frontmatter, edition_temp_frontmatter, edition_name.to_s) unless frontmatter.nil?
        process_debug_info(temp_filename, edition_temp_filename, edition_name.to_s)
        lvars               =vars.clone
        lvars[:linenumbers] = "true"
        unless frontmatter.nil? # frontmatter
          lvars[:usetoc] = "nousetoc"
          render_document(edition_temp_frontmatter, @tempdir, "xfrontmatter", ["frontmatter"], lvars)
          lvars[:usetoc]      = vars[:usetoc] || "usetoc"
          lvars[:frontmatter] = "#{@tempdir}/xfrontmatter.latex"
        end
        render_document(edition_temp_filename, outdir, edition_out_filename, ["pdf", "latex"], lvars, config)
      else
        unless frontmatter.nil? # frontmatter
          filter_document_variant(temp_frontmatter, edition_temp_frontmatter, edition_name.to_s)
          render_document(edition_temp_frontmatter, @tempdir, "xfrontmatter", ["frontmatter"], vars_frontmatter)
          vars[:frontmatter]="#{@tempdir}/xfrontmatter.latex"
        end

        filter_document_variant(temp_filename, edition_temp_filename, edition_name.to_s)
        render_document(edition_temp_filename, outdir, edition_out_filename, editionformats, vars, config)
      end
    }
  end
end
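When editions is given, each entry produces its own output named <outname>_<filepart>, with its own title and optionally its own list of formats; a debug edition is always rendered as pdf and latex. A sketch of the hash shape the method reads (:filepart, :title, :format, :debug) and of the output names and formats it derives. The edition names and values are made up, and edition_plan is a hypothetical helper:

```ruby
# Example editions hash using the keys generateDocument reads.
editions = {
  intern: { filepart: "internal", title: "Internal Edition", debug: false },
  extern: { filepart: "public",   title: "Public Edition",   format: ["pdf"] }
}

# Hypothetical helper mirroring the naming and format fallback per edition.
def edition_plan(outname, editions, default_formats)
  editions.map do |name, props|
    {
      view:    name.to_s,
      outname: "#{outname}_#{props[:filepart]}",
      title:   props[:title],
      # debug editions are rendered as pdf and latex regardless of :format
      formats: props[:debug] ? ["pdf", "latex"] : (props[:format] || default_formats)
    }
  end
end

edition_plan("manual", editions, ["pdf", "html"]).each { |e| p e }
```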
get_filter_command(line, view)

This determines the view filter.

@param [String] line the current input line
@param [String] view the currently selected view

@return [Boolean, nil] true or false if a view command is found, otherwise nil

# File lib/wortsammler/class.proolib.rb, line 602
def get_filter_command(line, view)
  r = line.match(@view_pattern)

  if not r.nil?
    found  = r[1].split(" ")
    result = (found & [view, "all"].flatten).any?
  else
    result = nil
  end

  result
end
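The tri-state result lets the caller keep its previous state on lines without a marker. Because the view goes through flatten, it may also be an array of view names. A minimal standalone sketch; filter_command is a hypothetical name and the pattern is the one set in the constructor:

```ruby
VIEW_PATTERN = /~~ED((\s*(\w+))*)~~/

# Hypothetical standalone version: true/false if the line contains a
# view marker, nil otherwise.
def filter_command(line, view)
  match = line.match(VIEW_PATTERN)
  return nil if match.nil?

  (match[1].split(" ") & [view, "all"].flatten).any?
end

p filter_command("plain text", "intern")                 # nil
p filter_command("~~ED intern extern~~", "extern")       # true
p filter_command("~~ED intern~~", ["extern", "review"])  # false
```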
load_snippets_from_xlsx(file)

This loads snippets from an xlsx file.

@param [String] file name of the xlsx file
@return [Hash] a hash with the snippets

# File lib/wortsammler/class.proolib.rb, line 697
def load_snippets_from_xlsx(file)
  temp_filename = "#{@tempdir}/snippett.xlsx"
  FileUtils::copy(file, temp_filename)
  wb    =RubyXL::Parser.parse(temp_filename)
  result={}
  wb.first.each { |row|
    key, the_value = row
    unless key.nil?
      unless the_value.nil?
        result[key.value.to_sym] = resolve_xml_entities(the_value.value) rescue ""
      end
    end
  }
  result
end
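Each row of the first worksheet is read as a key/value pair: the first cell becomes a symbol key and the second cell, with XML entities resolved, becomes the snippet text; rows with an empty key or value are skipped. A sketch of that mapping on plain Ruby arrays standing in for the RubyXL rows, so it runs without the rubyXL gem; rows_to_snippets is a hypothetical name:

```ruby
# Hypothetical helper: converts [key, value] rows into a snippet hash
# with symbol keys. nil stands for an empty cell.
def rows_to_snippets(rows)
  rows.each_with_object({}) do |(key, value), result|
    next if key.nil? || value.nil?

    # resolve &lt; &gt; &amp; (in this order, &amp; last)
    text = value.gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")
    result[key.to_sym] = text
  end
end

rows = [
  ["product", "Wortsammler"],
  ["arrow", "a &lt;b&gt;"],
  [nil, "ignored"],
  ["empty", nil]
]
p rows_to_snippets(rows)
```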
process_debug_info(inputfile, outputfile, view)

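This prepares the debug version of a document: instead of filtering, it replaces each view marker with a rule and a margin note naming the view (black for "all", red otherwise) and adds a margin note to each TODO.

@param [String] inputfile name of the input file
@param [String] outputfile name of the output file
@param [String] view name of the intended view (currently not used)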

# File lib/wortsammler/class.proolib.rb, line 647
def process_debug_info(inputfile, outputfile, view)

  input_data = File.open(inputfile) { |f| f.readlines }

  output_data = Array.new

  input_data.each { |l|
    l.gsub!(@view_pattern) { |p|
      if $1.strip == "all" then
        color="black"
      else
        color="red"
      end

      "\\color{#{color}}\\rule{2cm}{0.5mm}\\newline\\marginpar{#{$1.strip}}"

    }

    l.gsub!(/todo:|TODO:/) { |p| "#{p}\\marginpar{TODO}" }

    output_data << l
  }

  File.open(outputfile, "w") { |f| f.puts output_data.join }
end
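The per-line transformation of the debug edition can be sketched in isolation. This is a minimal standalone version; debug_markup is a hypothetical name and the pattern is the one set in the constructor:

```ruby
VIEW_PATTERN = /~~ED((\s*(\w+))*)~~/

# Hypothetical helper: replaces view markers and TODO tags with
# LaTeX annotations, as process_debug_info does for each line.
def debug_markup(line)
  out = line.gsub(VIEW_PATTERN) do
    views = Regexp.last_match(1).strip
    color = views == "all" ? "black" : "red"
    "\\color{#{color}}\\rule{2cm}{0.5mm}\\newline\\marginpar{#{views}}"
  end
  out.gsub(/todo:|TODO:/) { |tag| "#{tag}\\marginpar{TODO}" }
end

puts debug_markup("~~ED intern~~")
# prints \color{red}\rule{2cm}{0.5mm}\newline\marginpar{intern}
puts debug_markup("TODO: check figures")
# prints TODO:\marginpar{TODO} check figures
```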
render_document(input, outdir, outname, format, vars, config=nil)

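This renders the input file into each of the requested formats, writing the results to outdir with outname as the base name.

@param config [ProoConfig] the entire config object (for future extensions)
@return nil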

# File lib/wortsammler/class.proolib.rb, line 850
def render_document(input, outdir, outname, format, vars, config=nil)

  #TODO: Clarify the following
  # On Windows, the temp dir contains a drive letter. But the drive letter
  # does not seem to work in pandoc -> pdf if the path separator is a forward
  # slash. There are two options to overcome this:
  #
  # 1. set tempdir such that it does not contain a drive letter
  # 2. use Dir.mktmpdir but ensure that all provided file names
  #    use the platform specific SEPARATOR
  #
  # For whatever reason, I decided for 2.

  tempfile      = input
  tempfilePdf   = "#{@tempdir}/x.TeX.md".to_osPath
  tempfileHtml  = "#{@tempdir}/x.html.md".to_osPath
  outfile       = "#{outdir}/#{outname}".to_osPath
  outfilePdf    = "#{outfile}.pdf"
  outfileDocx   = "#{outfile}.docx"
  outfileHtml   = "#{outfile}.html"
  outfileRtf    = "#{outfile}.rtf"
  outfileLatex  = "#{outfile}.latex"
  outfileText   = "#{outfile}.txt"
  outfileSlide  = "#{outfile}.slide.html"


  ## format handle

  # todo: use this information ...

  format_config = {
      'pdf'      => {
          tempfile: :pdf,
          outfile:  "#{outfile}.pdf"
      },
      'html'     => {
          tempfile: :html,
          outfile:  "#{outfile}.html"
      },
      'docx'     => {
          tempfile: :html,
          outfile:  "#{outfile}.docx"
      },
      'rtf'      => {
          tempfile: :html,
          outfile:  "#{outfile}.rtf"
      },
      'latex'    => {
          tempfile: :pdf,
          outfile:  "#{outfile}.latex"
      },
      'text'     => {
          tempfile: :html,
          outfile:  "#{outfile}.text"
      },
      'dzslides' => {
          tempfile: :html,
          outfile:  "#{outfile}.slide.html"
      },

      :beamer    => {
          tempfile: :pdf,
          outfile:  "#{outfile}.beamer.pdf"
      },

      'markdown' => {
          tempfile: :html,
          outfile:  "#{outfile}.slide.html"
      }
  }

  tempfile_config = {
      pdf:  "#{@tempdir}/x.TeX.md".to_osPath,
      html: "#{@tempdir}/x.html.md".to_osPath
  }


  if vars.has_key? :frontmatter
    latexTitleInclude = "--include-before-body=#{vars[:frontmatter].esc}"
  else
    latexTitleInclude = ""
  end

  #todo: make config required, so it can be reduced to the else part
  if config.nil? then
    latexStyleFile = File.dirname(File.expand_path(__FILE__))+"/../../resources/default.wortsammler.latex"
    latexStyleFile = File.expand_path(latexStyleFile).to_osPath
    css_style_file = File.dirname(File.expand_path(__FILE__))+"/../../resources/default.wortsammler.css"
    css_style_file = File.expand_path(css_style_file).to_osPath
  else
    latexStyleFile = config.stylefiles[:latex]
    css_style_file = config.stylefiles[:css]
  end


  toc = "--toc"
  toc = "" if vars[:usetoc]=="nousetoc"

  if vars[:documentclass]=="book"
    option_chapters = "--chapters"
  else
    option_chapters = ""
  end

  begin
    vars_string=vars.map { |key, value| "-V #{key}=#{value.esc}" }.join(" ")
  rescue StandardError => e
    @log.error("could not process the variables: #{e.message}")
    vars_string = ""
  end

  @log.info("rendering  #{outname} as [#{format.join(', ')}]")

  supported_formats=["pdf", "latex", "frontmatter", "docx", "html", "txt", "rtf", "slidy", "md", "beamer"]
  wrong_format     =format - supported_formats
  wrong_format.each { |f| @log.error("format not supported: #{f}") }

  begin

    if format.include?("frontmatter") then

      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfilePdf)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfilePdf.esc}  --pdf-engine xelatex  #{vars_string} --ascii -t latex+smart -o  #{outfileLatex.esc}"
      `#{cmd}`
    end


    if format.include?("pdf") || format.include?("latex") then
      @log.debug("creating  #{outfileLatex}")
      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfilePdf)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfilePdf.esc} #{toc} --standalone #{option_chapters} --pdf-engine xelatex --number-sections #{vars_string}" +
          " --template #{latexStyleFile.esc} --ascii -t latex+smart -o  #{outfileLatex.esc} #{latexTitleInclude}"
      `#{cmd}`

    end



    if format.include?("pdf") then
      @log.debug("creating  #{outfilePdf}")
      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfilePdf)
      #cmd="#{PANDOC_EXE} -S #{tempfilePdf.esc} #{toc} --standalone #{option_chapters} --latex-engine xelatex --number-sections #{vars_string}" +
      #  " --template #{latexStyleFile.esc} --ascii -o  #{outfilePdf.esc} #{latexTitleInclude}"
      cmd  ="#{LATEX_EXE} -halt-on-error -interaction nonstopmode -output-directory=#{outdir.esc} #{outfileLatex.esc}"
      #cmdmkindex = "makeindex \"#{outfile.esc}.idx\""

      latex=LatexHelper.new.set_latex_command(cmd).setlogger(@log)
      latex.run(outfileLatex)

      messages=latex.log_analyze("#{outdir}/#{outname}.log")

      removeables = ["toc", "aux", "bak", "idx", "ilg", "ind"]
      removeables << "log" unless messages > 0


      removeables << "latex" unless format.include?("latex")
      removeables = removeables.map { |e| "#{outdir}/#{outname}.#{e}" }.select { |f| File.exist?(f) }
      removeables.each { |e|
        @log.debug "removing file: #{e}"
        FileUtils.rm e
      }
    end

    if format.include?("html") then
      #todo: handle css
      @log.debug("creating  #{outfileHtml}")

      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections  #{vars_string}" +
          " -t html+smart -o #{outfileHtml.esc}"

      `#{cmd}`
    end

    if format.include?("docx") then
      #todo: handle style file
      @log.debug("creating  #{outfileDocx}")

      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections  #{vars_string}" +
          " -t docx+smart -o  #{outfileDocx.esc}"
      `#{cmd}`
    end

    if format.include?("rtf") then
      @log.debug("creating  #{outfileRtf}")
      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections  #{vars_string}" +
          " -t rtf+smart -o  #{outfileRtf.esc}"
      `#{cmd}`
    end

    if format.include?("txt") then
      @log.debug("creating  #{outfileText}")

      ReferenceTweaker.new("pdf").prepareFile(tempfile, tempfileHtml)

      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained --ascii --number-sections  #{vars_string}" +
          " -t plain+smart -o  #{outfileText.esc}"
      `#{cmd}`
    end

    if format.include?("slidy") then
      @log.debug("creating  #{outfileSlide}")

      ReferenceTweaker.new("html").prepareFile(tempfile, tempfileHtml)
      #todo: handle stylefile
      cmd="#{PANDOC_EXE} -f markdown#{@markdown_input_switches} #{tempfileHtml.esc} --toc --standalone --self-contained #{vars_string}" +
          "  --ascii -t s5+smart --slide-level 1 -o  #{outfileSlide.esc}"
      `#{cmd}`
    end

    if format.include?("beamer") then
      outfile      = format_config[:beamer][:outfile]
      tempformat   = format_config[:beamer][:tempfile]
      tempfile_out = tempfile_config[tempformat]
      @log.debug("creating  #{outfile}")
      ReferenceTweaker.new(tempformat).prepareFile(tempfile, tempfile_out)

      cmd = %Q{#{PANDOC_EXE} -t beamer #{tempfile_out.esc} -V theme:Warsaw -o #{outfile.esc}}
      `#{cmd}`

      #messages=latex.log_analyze("#{outdir}/#{outname}.log")
      messages = 0

      removeables = ["toc", "aux", "bak", "idx", "ilg", "ind"]
      removeables << "log" unless messages > 0


      removeables << "latex" unless format.include?("latex")
      removeables = removeables.map { |e| "#{outdir}/#{outname}.#{e}" }.select { |f| File.exist?(f) }
      removeables.each { |e|
        @log.debug "removing file: #{e}"
        FileUtils.rm e
      }
    end


  rescue Exception => e
    @log.error "failed to perform #{cmd}, \n#{e.message}"
    @log.error e.backtrace.join("\n")
    #TODO make a try catch block here
  end
  nil
end
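Every output file is derived from outdir/outname plus a format-specific suffix, and each requested format outside the supported list is logged as an error. A compact sketch of that lookup; the suffix table copies the names used in the method (slidy maps to .slide.html, beamer to .beamer.pdf, frontmatter to .latex), while "md" is left out because the method accepts it but no branch renders it. OUTPUT_SUFFIX and plan_outputs are hypothetical names:

```ruby
# Output suffix per format, as used in render_document.
OUTPUT_SUFFIX = {
  "pdf"         => ".pdf",
  "latex"       => ".latex",
  "frontmatter" => ".latex",
  "html"        => ".html",
  "docx"        => ".docx",
  "rtf"         => ".rtf",
  "txt"         => ".txt",
  "slidy"       => ".slide.html",
  "beamer"      => ".beamer.pdf"
}.freeze

# Hypothetical helper: returns [planned output files, unsupported formats].
def plan_outputs(outdir, outname, formats)
  base = File.join(outdir, outname)
  supported, unsupported = formats.partition { |f| OUTPUT_SUFFIX.key?(f) }
  [supported.map { |f| base + OUTPUT_SUFFIX[f] }, unsupported]
end

files, wrong = plan_outputs("out", "manual", ["pdf", "html", "odt"])
p files  # ["out/manual.pdf", "out/manual.html"]
p wrong  # ["odt"]
```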
render_single_document(input, outdir, format)

This renders a single file.

@param input [String] path to the input file
@param outdir [String] path to the output directory
@param format [Array of String] formats
@return [nil] no useful return value

# File lib/wortsammler/class.proolib.rb, line 832
def render_single_document(input, outdir, format)
  outname=File.basename(input, ".*")
  render_document(input, outdir, outname, format, { :geometry => "a4paper" })
end
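The output base name is the input file name without its directory and extension. File.basename with the ".*" suffix removes only the last extension, which matters for names containing several dots:

```ruby
# Derive the output base name the same way render_single_document does.
outname = File.basename("docs/chapters/intro.md", ".*")
puts outname                              # intro
puts File.basename("notes.tar.gz", ".*")  # notes.tar
```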
replace_snippets_in_file(infile, snippets)

This replaces the text snippets in a file.

# File lib/wortsammler/class.proolib.rb, line 555
def replace_snippets_in_file(infile, snippets)
  input_data = File.open(infile) { |f| f.readlines.join }
  output_data=input_data.clone

  @log.debug("replacing snippets in #{infile}")

  replace_snippets_in_text(output_data, snippets)

  if (not input_data == output_data)
    File.open(infile, "w") { |f| f.puts output_data }
  end
end
replace_snippets_in_text(text, snippets)

This replaces the snippets in a text (in place) and then processes nested snippets recursively.

# File lib/wortsammler/class.proolib.rb, line 569
def replace_snippets_in_text(text, snippets)
  changed=false
  text.gsub!(SNIPPET_PATTERN) { |m|
    # capture the groups first: the String#gsub below resets $~ and thus $1/$2
    leading_text    = $1
    snippet_name    = $2
    replacetext_raw = snippets[snippet_name.to_sym]

    if replacetext_raw
      changed=true
      if leading_text.nil? then
        replacetext = replacetext_raw
      else
        leading_whitespace = leading_text.split("\n", 100)
        leading_lines      = leading_whitespace.join("\n")
        leading_spaces     = leading_whitespace.last || ""
        replacetext        = leading_lines+replacetext_raw.gsub("\n", "\n#{leading_spaces}")
      end
      @log.debug("replaced snippet #{snippet_name} with #{replacetext}")
    else
      replacetext=m
      @log.warn("Snippet not found: #{snippet_name}")
    end
    replacetext
  }
  #recursively process nested snippets
  #todo: this approach might report undefined snippets twice if there are defined and undefined ones
  replace_snippets_in_text(text, snippets) if changed==true
end
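The replacement keeps the indentation of the line on which a snippet reference appears: the leading whitespace is kept in front of the snippet text and every further line of the snippet gets the same leading spaces. The method then recurses until nothing changes, so snippets may reference other snippets. The real SNIPPET_PATTERN is defined elsewhere in the library; the sketch below assumes a hypothetical pattern /(\s*)~~SN\s+(\w+)~~/ whose group 1 is the leading whitespace and group 2 the snippet name. The name expand_snippets! and the depth guard are additions of this sketch:

```ruby
# Hypothetical pattern: group 1 = leading whitespace, group 2 = snippet name.
SNIPPET_PATTERN = /(\s*)~~SN\s+(\w+)~~/

# Hypothetical helper: replaces snippet references in text (in place),
# recursing for nested snippets. The depth guard stops self-referencing snippets.
def expand_snippets!(text, snippets, depth = 0)
  raise "snippet nesting too deep" if depth > 20

  changed = false
  text.gsub!(SNIPPET_PATTERN) do |match|
    leading = Regexp.last_match(1)
    name    = Regexp.last_match(2)
    body    = snippets[name.to_sym]
    next match if body.nil? # unknown snippet: keep the reference

    changed = true
    indent  = leading.split("\n", -1).last || ""
    leading + body.gsub("\n", "\n#{indent}")
  end
  expand_snippets!(text, snippets, depth + 1) if changed
  text
end

snippets = { greeting: "Hello ~~SN name~~", name: "World" }
puts expand_snippets!(String.new("Intro:\n  ~~SN greeting~~\n"), snippets)
# prints "Intro:" and "  Hello World"
```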
resolve_xml_entities(text)

This resolves XML entities (lt, gt, amp) in a text.

@param [String] text text with entities
@return [String] text with replaced entities

# File lib/wortsammler/class.proolib.rb, line 717
def resolve_xml_entities(text)
  result=text.dup # work on a copy; gsub! would otherwise modify the caller's string
  result.gsub!("&lt;", "<")
  result.gsub!("&gt;", ">")
  result.gsub!("&amp;", "&")
  result
end
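The order of the replacements matters: &amp; is decoded last, so an escaped entity such as &amp;lt; becomes the literal text &lt; instead of being decoded twice into <. A sketch that makes this visible and returns a new string instead of modifying its argument; decode_entities is a hypothetical name:

```ruby
# Hypothetical helper: decodes &lt;, &gt; and &amp; in that order.
# &amp; must come last, otherwise "&amp;lt;" would end up as "<".
def decode_entities(text)
  text.gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")
end

p decode_entities("a &lt; b &amp;&amp; c")  # "a < b && c"
p decode_entities("&amp;lt;")               # "&lt;"
```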