3. Usage

Preparing IDML Files

A formatting guide explains how to prepare InDesign files. Following its recommendations is not required to use the converter, but the quality of the result depends heavily on how your document is structured.

Style Mapping

Once the modified version of Pandoc is installed, it can interpret the role attributes of the elements in the DocBook files produced by the conversion. The roles-to-classes.lua filter then turns these roles into Pandoc classes.

The role attributes correspond to the paragraph and character styles defined in InDesign. As shown in the file maps/sample.json, each style can be mapped to one or more of the following operations:

  • role: replaces the source role with one or more classes;

  • type: changes the DocBook element type (default is para for all elements) to a new type;

  • level: if type is a DocBook title (title tag), level sets its heading level;

  • delete: removes elements with this role attribute;

  • wrap: wraps the element in another container, needed for lists defined only by a paragraph style (itemizedlist, orderedlist) and for blockquotes (blockquote);

  • unwrap: unwraps the content into the parent element;

  • br: adds a line break before the element;

  • cut: creates a new file before each element with this role;

  • empty: retains empty elements with this role (all other empty elements are removed, unless the -e/--empty option is enabled);

  • If the entry is empty ({}), the role attribute is removed.
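As a rough illustration of the keys listed above, the following Python sketch builds a mapping and serializes it as JSON. The style names (Heading 1, Bullet item, Quote, Page number, Body text) are hypothetical, and the value types (an integer for level, booleans for delete and cut, a string for wrap) are assumptions made for illustration; maps/sample.json remains the reference for the actual schema.

```python
import json

# Hypothetical InDesign style names mapped to the operations listed above.
# The value types (string, integer, boolean) are assumptions; check maps/sample.json.
mapping = {
    "Heading 1": {"type": "title", "level": 1, "cut": True},
    "Bullet item": {"wrap": "itemizedlist"},
    "Quote": {"wrap": "blockquote", "role": "quote"},
    "Page number": {"delete": True},
    "Body text": {},  # empty entry: the role attribute is removed
}


def dump_mapping(mapping):
    """Serialize the mapping as indented JSON, keeping non-ASCII style names readable."""
    return json.dumps(mapping, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(dump_mapping(mapping))
```

The printed JSON could be saved under maps/ and passed to the converter with -m/--map, provided the assumed value types match what the package expects.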

The script idml2docbook/map.py helps create these JSON files. It takes a file generated by idml2xml-frontend and a mapping JSON file, and shows the style mappings that will be applied, for example using the following command:

python idml2docbook/map.py file.xml maps/db.json
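Before writing a mapping, it can help to see which roles a file actually contains. The sketch below is not a reimplementation of map.py; it only counts role attribute values in a DocBook-like XML string using the standard library. The sample document and its role names are invented for illustration, and the sketch assumes roles are stored in a plain (unnamespaced) role attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_roles(docbook_xml: str) -> Counter:
    """Count how often each role attribute value occurs in an XML string."""
    root = ET.fromstring(docbook_xml)
    # root.iter() visits the root and every descendant, whatever its namespace.
    return Counter(el.get("role") for el in root.iter() if el.get("role"))


# Invented sample; real files come from the conversion step.
sample = """<article xmlns="http://docbook.org/ns/docbook">
  <para role="Heading-1">Title</para>
  <para role="Body">First <phrase role="Italic">paragraph</phrase>.</para>
  <para role="Body">Second paragraph.</para>
</article>"""

if __name__ == "__main__":
    for role, n in count_roles(sample).most_common():
        print(f"{n:3d}  {role}")
```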

Basic Commands

This converter can be used in several ways:

  1. Using the idml2docbook package, then running Pandoc on the output;

  2. Using the Lua reader idml.lua;

  3. Using the batch.sh script.

idml2docbook

This Python package is the main contribution of this repository. It provides an API to convert an IDML file into one or more DocBook files.

Convert an IDML file and save it to a DocBook file:

python -m idml2docbook hello_world.idml -o hello_world.xml

It is also possible to send the conversion result directly to Pandoc’s standard input:

pandoc -f docbook -t markdown <(python -m idml2docbook hello_world.idml)
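The same hand-off can be scripted from Python. This is a minimal sketch, assuming that idml2docbook writes DocBook to standard output when no -o option is given (as the command above suggests) and that both tools are installed; the function names are hypothetical and not part of the package.

```python
import subprocess
import sys


def build_pipeline(idml_path, to_format="markdown"):
    """Return the two argument lists: idml2docbook first, then Pandoc reading from stdin."""
    convert = [sys.executable, "-m", "idml2docbook", idml_path]
    pandoc = ["pandoc", "-f", "docbook", "-t", to_format]
    return convert, pandoc


def idml_to_markdown(idml_path):
    """Run both commands, feeding idml2docbook's stdout to Pandoc's stdin."""
    convert, pandoc = build_pipeline(idml_path)
    docbook = subprocess.run(
        convert, check=True, capture_output=True, text=True
    ).stdout
    result = subprocess.run(
        pandoc, input=docbook, check=True, capture_output=True, text=True
    )
    return result.stdout


# Usage (requires idml2docbook and Pandoc to be installed):
# print(idml_to_markdown("hello_world.idml"))
```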

Display usage help with the -h/--help option:

python -m idml2docbook -h

idml.lua

This file bridges Pandoc and idml2docbook: it lets Pandoc read the output of idml2docbook directly. Because Pandoc does not accept arbitrary arguments on the command line, the only way to change the behavior of idml2docbook when it runs through idml.lua is to edit the .env file.

Convert an IDML file to Markdown, using for example the roles-to-classes.lua filter:

pandoc -f idml.lua -t markdown --lua-filter=lua-filters/roles-to-classes.lua hello_world.idml

batch.sh

This small shell script makes it easier to link Pandoc with idml2docbook, particularly when using the -c/--cut option to split the output into multiple files (or chapters).

Before running it, specify the Pandoc version to use in the .env file and make the script executable:

chmod +x batch.sh

The script takes two arguments: an input folder and an output folder. You can then chain it with the conversion, for example:

python -m idml2docbook file.idml --output docbook_folder --cut ; ./batch.sh docbook_folder md_folder
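The chained command above can also be driven from Python. The sketch below is only an illustration with hypothetical function names; unlike the ; separator, which runs batch.sh even if the conversion fails, check=True stops at the first command that returns a non-zero exit code.

```python
import subprocess
import sys


def build_steps(idml_path, docbook_dir, markdown_dir):
    """Return the two commands from the example above as argument lists."""
    convert = [sys.executable, "-m", "idml2docbook", idml_path,
               "--output", docbook_dir, "--cut"]
    batch = ["./batch.sh", docbook_dir, markdown_dir]
    return [convert, batch]


def run_steps(steps):
    """Run each command in order; CalledProcessError is raised on the first failure."""
    for argv in steps:
        subprocess.run(argv, check=True)


# Usage (requires idml2docbook, Pandoc and an executable batch.sh in the current folder):
# run_steps(build_steps("file.idml", "docbook_folder", "md_folder"))
```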

Details of the conversion command for the Déborder Bolloré reader

The command used to compile the contributions of the Déborder Bolloré reader, whose result can be found in the deborderbollore/articles repository, is as follows:

python -m idml2docbook db.idml \
  --output docbook \
  --map maps/db.json \
  --cut \
  --typography \
  --thin-spaces \
  --names \
  --raster jpg \
  --vector svg \
  --media-folder images ;
./batch.sh docbook articles

In detail:

  • -o/--output: specifies the folder where the output DocBook files will be saved;

  • -m/--map: defines the operations to apply to certain character and paragraph styles, see Style Mapping;

  • -c/--cut: splits the result into multiple standalone DocBook files;

  • -t/--typography: rebuilds typographic spacing from scratch (around punctuation, etc.);

  • -l/--thin-spaces: uses only thin spaces when applying typographic spacing;

  • -n/--names: generates filenames based on level-1 heading content;

  • -r/--raster: replaces the extension of raster images with the given one (here jpg) in the URLs of the output files;

  • -v/--vector: does the same for vector images (here svg);

  • -f/--media-folder: replaces the absolute media URLs in the output files with the given folder (here images/).

Other options are available; see the package help (-h/--help).

Finally, the batch.sh script converts the resulting DocBook files into Markdown using the markdown_phpextra flavor and the necessary Lua filters. It also turns Markdown hard line breaks (two trailing spaces at the end of a line) into HTML <br/> tags.
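To make the line-break step concrete, here is a minimal Python sketch of that kind of substitution. It is not the code used by batch.sh, whose implementation is not shown here; the function name is hypothetical, and the sketch makes no attempt to skip code blocks, where trailing spaces are not line breaks.

```python
import re

# Two or more spaces right after a non-space character and before a newline
# form a Markdown hard line break. Whitespace-only lines are left untouched.
HARD_BREAK = re.compile(r"(?<=\S) {2,}\n")


def hard_breaks_to_br(markdown: str) -> str:
    """Replace Markdown hard line breaks with an explicit <br/> tag."""
    return HARD_BREAK.sub("<br/>\n", markdown)


if __name__ == "__main__":
    print(hard_breaks_to_br("first line  \nsecond line\n"))
```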