class Paru::Filter
Filter is used to write your own pandoc filter in Ruby. A Filter is almost always created and immediately executed via the run method. The simplest filter you can write in paru is the so-called “identity” filter:
{include:file:examples/filters/identity.rb}
It runs the filter, but it neither selects any nodes nor performs any action. This is pretty useless, of course (although it is a great way to test the filter functionality), but it shows the general setup of a filter well.
Writing a simple filter: numbering figures
Inside a {Filter#run} block, you specify selectors with actions. For example, to number all figures in a document and prefix their captions with “Figure”, the following filter would work:
{include:file:examples/filters/number_figures.rb}
This filter selects all {PandocFilter::Image} nodes. For each {PandocFilter::Image} node it increments the figure counter figure_counter
and then sets the figure’s caption to “Figure” followed by the figure count and the original caption. In other words, the following input document
 
will be transformed into
 
The method {PandocFilter::InnerMarkdown#inner_markdown} and its counterpart {PandocFilter::Node#markdown} are a great way to manipulate the contents of a selected {PandocFilter::Node}. There is no need to mess about creating and filling {PandocFilter::Node}s: you can just use pandoc’s own markdown format!
Writing more involved filters
Using the “follows” selector: Numbering figures and chapters
The previous example can be extended to also number chapters and to start numbering figures anew per chapter. As you would expect, we need two counters, one for the figures and one for the chapters:
{include:file:examples/filters/number_figures_per_chapter.rb}
What is new in this filter, however, is the selector “Header + Image”, which selects all {PandocFilter::Image} nodes that follow a {PandocFilter::Header} node. Documents in pandoc have a flat structure where chapters do not exist as separate concepts. Instead, a chapter is implied by a header of a certain level and everything that follows until the next header of that level.
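To make the flat structure concrete, here is a plain-Ruby sketch that does not use paru at all. It models a document as a flat array of hypothetical `Node` structs (an assumption for illustration, not paru's node classes) and restarts the figure counter at every level-1 header, which is how a "chapter" is implied in such a structure.

```ruby
# Hypothetical stand-in for pandoc nodes; not part of paru.
Node = Struct.new(:type, :level, :text)

# Number chapters and figures in a flat list of nodes. A chapter starts at
# each level-1 header and lasts until the next level-1 header.
def number_figures_per_chapter(nodes)
  chapter = 0
  figure = 0
  nodes.map do |node|
    case node.type
    when :header
      if node.level == 1
        chapter += 1
        figure = 0 # restart figure numbering in each chapter
        Node.new(:header, 1, "Chapter #{chapter}. #{node.text}")
      else
        node
      end
    when :image
      figure += 1
      Node.new(:image, nil, "Figure #{chapter}.#{figure}. #{node.text}")
    else
      node
    end
  end
end

doc = [
  Node.new(:header, 1, "Introduction"),
  Node.new(:image, nil, "A cat"),
  Node.new(:image, nil, "A dog"),
  Node.new(:header, 1, "Methods"),
  Node.new(:image, nil, "A bird")
]

numbered = number_figures_per_chapter(doc)
numbered.each { |node| puts node.text }
```

In a real filter the two counters would live outside the filter block and be updated from within the selector actions; the sketch only shows the counting logic.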
Using the “child of” selector: Annotate custom blocks
Hierarchical structures do exist in a pandoc document, however. For example, the contents of a paragraph ({PandocFilter::Para}), which itself is a {PandocFilter::Block} level node, are {PandocFilter::Inline} level nodes. Another example is a custom block, or {PandocFilter::Div} node. You select a child node by using the +>+ selector as in the example below:
{include:file:examples/filters/example.rb}
Here all {PandocFilter::Header} nodes that are inside a {PandocFilter::Div} node are selected. Furthermore, if these headers are of level 3, they are prefixed by the string “Example” followed by a count.
In this example, “important” {PandocFilter::Div} nodes are annotated by putting the string important before the contents of the node.
Using a distance in a selector: Capitalize the first N characters of a paragraph
Given the flat structure of a pandoc document, the “follows” selector has quite a reach. For example, “Header + Para” selects all paragraphs that follow a header. In most well-structured documents, this would select basically all paragraphs.
But what if you need to be more specific? For example, if you would like to capitalize the first sentence of each first paragraph of a chapter, you need a way to specify a sequence number of sorts. To that end, paru filter selectors take an optional distance parameter. A filter for this example could look like:
{include:file:examples/filters/capitalize_first_sentence.rb}
The distance is denoted after a selector by an integer. In this case “Header +1 Para” selects all {PandocFilter::Para} nodes that directly follow a {PandocFilter::Header} node. You can use a distance with any selector.
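As a rough model of what a distance adds to the “follows” selector, the plain-Ruby sketch below works on a flat list of node types. It is independent of paru: `select_following` and the symbol list are hypothetical, and this is not how paru implements its selectors, only an illustration of the idea of limiting how far back the selector looks.

```ruby
# Returns the indexes of nodes of type `target` that have a node of type
# `previous` among the `distance` nodes immediately before them.
# With distance = nil, any earlier node of type `previous` counts.
def select_following(types, previous, target, distance = nil)
  types.each_index.select do |i|
    next false unless types[i] == target
    start = distance.nil? ? 0 : [i - distance, 0].max
    types[start...i].include?(previous)
  end
end

types = [:header, :para, :para, :header, :para]

# Comparable in spirit to "Header + Para": every paragraph after a header.
all_following = select_following(types, :header, :para)

# Comparable in spirit to "Header +1 Para": only paragraphs directly after a header.
first_only = select_following(types, :header, :para, 1)
```

Without a distance, all three paragraphs are selected; with a distance of 1, the second paragraph of the first chapter drops out.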
Manipulating nodes: Removing horizontal lines
Although {PandocFilter::InnerMarkdown#inner_markdown} and {PandocFilter::Node#markdown} work in most situations, sometimes direct manipulation of the pandoc document AST is useful. These {PandocFilter::ASTManipulation} methods are mixed into {PandocFilter::Node} and can be used on any node in your filter. For example, to delete all {PandocFilter::HorizontalRule} nodes, you can use a filter like:
{include:file:examples/filters/delete_horizontal_rules.rb}
Note that you could have arrived at the same effect by using:
rule.markdown = ""
Manipulating metadata
One of the interesting features of the pandoc markdown format is the ability to add metadata to a document via a YAML block or command line options. For example, if you use a template that uses the metadata property +$date$+ to write a date on a title page, it is quite useful to automatically add today’s date to the metadata. You can do so with a filter like:
{include:file:examples/filters/add_today.rb}
In a filter, the metadata
property is a Ruby Hash of Strings, Numbers, Booleans, Arrays, and Hashes. You can manipulate it like any other Ruby Hash.
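Because the metadata behaves like an ordinary Hash, the usual Hash idioms apply. The sketch below uses a local Hash as a stand-in for the filter's metadata property, so it runs without paru; the keys and values are made up for illustration. It uses `||=` so that a date already given in the document is kept, which is a design choice of this sketch rather than something the text above prescribes.

```ruby
require "date"

# Stand-in for the filter's metadata; in a real filter you would use the
# metadata property instead of this local Hash.
metadata = { "title" => "My report", "keywords" => ["pandoc", "filter"] }

# Set today's date only if the document does not specify one already.
metadata["date"] ||= Date.today.to_s

# Nested values are ordinary Ruby objects as well.
metadata["keywords"] << "ruby"
```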
@!attribute metadata
@return [Hash] The metadata of the document being filtered as a Ruby Hash
@!attribute document
@return [Document] The document being filtered
@!attribute current_node
@return [Node] The node in the AST of the document being filtered that is currently being inspected by the filter.
Attributes
Public Class Methods
Source
# File lib/paru/filter.rb, line 234
def initialize(input = $stdin, output = $stdout)
  @input = input
  @output = output
end
Create a new Filter instance. For convenience, {run} creates a new {Filter} and runs it immediately. Use this constructor if you want to run a filter on input and output streams other than STDIN and STDOUT respectively.
@param input [IO = $stdin] the input stream to read, defaults to STDIN
@param output [IO = $stdout] the output stream to write, defaults to STDOUT
Source
# File lib/paru/filter.rb, line 251
def self.run(&block)
  Filter.new($stdin, $stdout).filter(&block)
end
Run the filter specified by block. This is a convenience method that creates a new {Filter} using input stream STDIN and output stream STDOUT and immediately runs {filter} with the block supplied.
@param block [Proc] the filter specification
@example Add ‘Figure’ to each image’s caption
Paru::Filter.run do
  with "Image" do |image|
    image.inner_markdown = "Figure. #{image.inner_markdown}"
  end
end
Public Instance Methods
Source
# File lib/paru/filter.rb, line 338
def after()
  yield @document if @ran_after
end
After running the filter on all nodes, the document is passed to the block given to this after method. This method is run exactly once.
@yield [Document] the document
Source
# File lib/paru/filter.rb, line 330
def before()
  yield @document unless @ran_before
end
Before running the filter on all nodes, the document is passed to the block given to this before method. This method is run exactly once.
@yield [Document] the document
Source
# File lib/paru/filter.rb, line 275
def filter(&block)
  @selectors = Hash.new
  @filtered_nodes = []
  @document = read_document
  @metadata = PandocFilter::Metadata.new @document.meta
  nodes_to_filter = Enumerator.new do |node_list|
    @document.each_depth_first do |node|
      node_list << node
    end
  end
  @current_node = @document
  @ran_before = false
  @ran_after = false
  instance_eval(&block) # run filter with before block
  @ran_before = true
  nodes_to_filter.each do |node|
    if @current_node.has_been_replaced?
      @current_node = @current_node.get_replacement
      @filtered_nodes.pop
    else
      @current_node = node
    end
    @filtered_nodes.push @current_node
    instance_eval(&block) # run the actual filter code
  end
  @ran_after = true
  instance_eval(&block) # run filter with after block
  write_document
end
Create a filter using block. In the block you specify selectors and actions to be performed on selected nodes. In the example below, the selector is “Image”, which selects all image nodes. The action is to prefix the image’s caption with the string “Figure. ”.
@param block [Proc] the filter specification
@return [JSON] a JSON string with the filtered pandoc AST
@example Add ‘Figure’ to each image’s caption
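require "stringio"
# The filter parses its input as pandoc's JSON AST (see read_document),
# so the input stream should hold pandoc's JSON representation of the report.
input = StringIO.new(File.read("my_report.md"))
output = StringIO.new
Paru::Filter.new(input, output).filter do
  with "Image" do |image|
    image.inner_markdown = "Figure. #{image.inner_markdown}"
  end
end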
Source
# File lib/paru/filter.rb, line 349
def stop!()
  write_document
  exit true
end
Stop processing the document any further and output it as it is now. This is a great timesaver for filters that only act on a small number of nodes in a large document, or when you only want to set the metadata.
Note that stop! breaks off the filter immediately after outputting the document in its current state: it exits the Ruby process, so no code after the call is run.
Source
# File lib/paru/filter.rb, line 319
def with(selector)
  if @ran_before and !@ran_after
    @selectors[selector] = Selector.new selector unless @selectors.has_key? selector
    yield @current_node if @selectors[selector].matches? @current_node, @filtered_nodes
  end
end
Specify which nodes to filter with a selector. If the current_node matches that selector, it is passed to the block given to this with method.
@param selector [String] a selector string
@yield [Node] the current node if it matches the selector
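The source listed for filter evaluates the same block once before the nodes are visited, once per node, and once afterwards, while before, with, and after each decide whether to yield. The plain-Ruby sketch below models only that control flow. `MiniFilter`, its `@phase` variable, and the hash-based nodes are hypothetical simplifications, not paru's classes, and type matching stands in for real selectors.

```ruby
# Minimal model of the before / with / after phases; not paru's Filter.
class MiniFilter
  def initialize(nodes)
    @nodes = nodes
  end

  def run(&block)
    @phase = :before
    instance_eval(&block)   # only `before` blocks fire
    @phase = :nodes
    @nodes.each do |node|
      @current = node
      instance_eval(&block) # only matching `with` blocks fire
    end
    @phase = :after
    instance_eval(&block)   # only `after` blocks fire
  end

  def before
    yield @nodes if @phase == :before
  end

  def with(type)
    yield @current if @phase == :nodes && @current[:type] == type
  end

  def after
    yield @nodes if @phase == :after
  end
end

log = []
nodes = [
  { type: "Image", text: "cat" },
  { type: "Para", text: "hi" },
  { type: "Image", text: "dog" }
]

MiniFilter.new(nodes).run do
  before { log << "before" }
  with("Image") { |image| log << "image #{image[:text]}" }
  after { log << "after" }
end
```

One consequence of this design is that any code placed directly in the block, outside before, with, or after, runs on every evaluation of the block, so state such as counters is better kept outside the block.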
Private Instance Methods
Source
# File lib/paru/filter.rb, line 361
def read_document()
  PandocFilter::Document.from_JSON @input.read
end
Read the document being filtered from the input stream (STDIN by default), which holds a pandoc document structure in JSON format.
@return [Document] a new Document node created from the JSON-formatted pandoc AST read from the input stream
Source
# File lib/paru/filter.rb, line 366
def write_document()
  @document.meta = @metadata.to_meta
  @output.write @document.to_JSON
end
Write the document being filtered, including its updated metadata, to the output stream (STDOUT by default)