3. Usage

Preparing IDML Files

A formatting guide explains how to prepare InDesign files. While it is not mandatory to follow these recommendations to use the converter, the quality of the result may vary greatly depending on how your document is structured.

Style Mapping

Once the modified version of Pandoc is installed, it can interpret the role attributes of the elements in the DocBook files generated by the conversion. The roles-to-classes.lua filter then turns these roles into Pandoc classes.

The role attributes correspond to the paragraph and character styles defined in InDesign. As shown in the file maps/sample.json, each style can be mapped to one or more operations:

  1. To modify their attributes:

  • classes: replaces the source role with one or more classes;

  • type: changes the element type (by default, all elements are Para or Span) to a new type (a header, emphasis, quote, etc.);

  • level: if type is Header, sets the heading level;

  • attrs: adds or updates attributes on all matching elements;

  • simplify: removes all attributes and classes from elements with this paragraph or character style;

  2. Or to modify the document structure:

  • delete: removes elements with this paragraph or character style;

  • wrap: wraps the element inside another element;

  • unwrap: unwraps the content into the parent element;

  • br: adds a line break before the element;

  • empty: retains empty elements with this role (all other empty elements are removed, unless the -e/--empty option is enabled);

  • cut: creates a new file before each element with this role;

Example of a JSON entry

Let’s take an entry from the style mapping file maps/sample.json as an example:

{
    "selector": ".title",
    "operation": {
        "classes": "cool-title",
        "type": "Header",
        "level": 1,
        "attrs": {
            "hey": "oh"
        }
    }
}

In this case, all paragraphs in the IDML file that use the title paragraph style become level-1 headings carrying the cool-title class and a hey attribute with the value oh. All the operations listed in the previous subsection can be specified in the same way.
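To make the effect of this entry concrete, here is a minimal Python sketch that applies its operation to a toy element. It is not how map.lua is implemented: the dictionary layout of the element, the apply_operation helper, and the splitting of classes on whitespace are all assumptions made for illustration.

```python
from copy import deepcopy

def apply_operation(element, operation):
    """Return a copy of `element` with a mapping operation applied.

    `element` is a hypothetical dict with "type", "classes" and "attrs"
    keys; it only imitates a Pandoc element for illustration.
    """
    result = deepcopy(element)
    if "classes" in operation:
        # The source role is replaced by the new class or classes
        # (assumption: several classes are separated by whitespace).
        result["classes"] = operation["classes"].split()
    if "type" in operation:
        result["type"] = operation["type"]
    if result["type"] == "Header" and "level" in operation:
        # level only makes sense for headings.
        result["level"] = operation["level"]
    # attrs adds new attributes or updates existing ones.
    result["attrs"].update(operation.get("attrs", {}))
    return result

# The entry shown above, written as a Python dict.
entry = {
    "selector": ".title",
    "operation": {
        "classes": "cool-title",
        "type": "Header",
        "level": 1,
        "attrs": {"hey": "oh"},
    },
}

# A paragraph that used the "title" paragraph style in InDesign.
paragraph = {"type": "Para", "classes": ["title"], "attrs": {}}
heading = apply_operation(paragraph, entry["operation"])
print(heading)
```

The original paragraph is left untouched, so the element can be compared before and after the operation.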

idml2docbook/map.py

The script idml2docbook/map.py helps create these JSON files. It takes a file generated by idml2xml-frontend and a mapping JSON file, and shows the style mappings that will be applied, for example using the following command:

python idml2docbook/map.py file.xml maps/sample.json
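The idea of such a preview can be sketched in a few lines of Python. This is not the code of map.py: it assumes that the mapping file is a JSON array of entries like the one above, that styles appear as role attributes in the XML, and that a selector is simply a dot followed by the role name. The preview_mappings function and the sample data are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def preview_mappings(xml_text, mapping_text):
    """Map each role found in the XML to the operation that would apply.

    Roles without a matching selector are mapped to None.
    """
    # Collect every distinct role used in the document.
    roles = sorted({
        role
        for node in ET.fromstring(xml_text).iter()
        for role in node.get("role", "").split()
    })
    # Index the mapping entries by selector (assumed form: ".role").
    operations = {
        entry["selector"]: entry["operation"]
        for entry in json.loads(mapping_text)
    }
    return {role: operations.get("." + role) for role in roles}

# Hypothetical sample data standing in for file.xml and maps/sample.json.
sample_xml = (
    '<article><para role="title">Hello</para>'
    '<para role="body">World</para></article>'
)
sample_map = '[{"selector": ".title", "operation": {"type": "Header", "level": 1}}]'

for role, operation in preview_mappings(sample_xml, sample_map).items():
    print(role, "->", operation or "(unchanged)")
```

Listing unmapped roles next to mapped ones makes it easier to spot styles that still need an entry in the mapping file.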

Basic Commands

This converter can be used in several ways:

  1. Using the idml2docbook package, then running Pandoc on the output;

  2. Using the Lua reader idml.lua;

  3. Using the batch.sh script.

idml2docbook

This Python package is the main contribution of this repository. It provides an API to convert an IDML file into one or more DocBook files.

Convert an IDML file and save it to a DocBook file:

python -m idml2docbook hello_world.idml -o hello_world.dbk

It is also possible to pass the conversion result directly to Pandoc without an intermediate file, here through shell process substitution, by specifying DocBook as the input format:

pandoc -f docbook -t markdown <(python -m idml2docbook hello_world.idml)
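Where process substitution is not available, the same chaining can be done from Python. The sketch below is an assumption-laden illustration: the idml_to_markdown function and its run parameter are hypothetical, and it relies on idml2docbook writing to standard output when -o is omitted and on Pandoc reading standard input when no input file is named.

```python
import subprocess
import sys

def idml_to_markdown(idml_path, run=subprocess.run):
    """Convert an IDML file to Markdown by piping DocBook into Pandoc.

    `run` defaults to subprocess.run; it is a parameter only so that the
    function can be exercised without the real tools installed.
    """
    # Step 1: idml2docbook prints DocBook on stdout when -o is not given.
    docbook = run(
        [sys.executable, "-m", "idml2docbook", idml_path],
        check=True,
        capture_output=True,
    ).stdout
    # Step 2: Pandoc reads the DocBook bytes from its standard input.
    markdown = run(
        ["pandoc", "-f", "docbook", "-t", "markdown"],
        input=docbook,
        check=True,
        capture_output=True,
    ).stdout
    return markdown.decode("utf-8")
```

Calling idml_to_markdown("hello_world.idml") would then return the Markdown text as a string. With check=True, a failure in either step raises subprocess.CalledProcessError instead of silently producing empty output.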

Display usage help with the -h/--help option:

python -m idml2docbook -h

idml.lua

This file bridges Pandoc and idml2docbook: it lets Pandoc read IDML files by calling idml2docbook behind the scenes. Because Pandoc does not accept arbitrary arguments on the command line, the behavior of idml2docbook can only be changed through the .env file when it runs through idml.lua.

Convert an IDML file to Markdown, using for example the roles-to-classes.lua filter:

pandoc -f idml.lua -t markdown --lua-filter=lua-filters/roles-to-classes.lua hello_world.idml

Apply a JSON mapping file with the map.lua filter, passing its path through the map metadata variable:

pandoc -f idml.lua -t markdown --lua-filter=lua-filters/map.lua -M map=maps/sample.json hello_world.idml

Processing time of idml2xml-frontend

The idml2xml-frontend converter is by far the most time-consuming step of the process. Since it doesn’t need to be customized, it usually only needs to be run once per IDML file. The intermediate result obtained by running idml2xml-frontend on a hello_world.idml file is saved in the idml2hubxml folder.

To save significant processing time, you can resume the conversion from this intermediate result. With the -x/--idml2hubxml-file option, the input file is treated as an intermediate file such as idml2hubxml/hello_world.xml:

python -m idml2docbook -x idml2hubxml/hello_world.xml -o hello_world.dbk
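A small helper can choose between the two invocations automatically. This is a sketch under assumptions: the conversion_args function is hypothetical, and it supposes that the intermediate file is named after the IDML file with an .xml extension, as in the hello_world example above.

```python
from pathlib import Path

def conversion_args(idml_path, hub_dir="idml2hubxml"):
    """Return idml2docbook arguments, reusing the Hub XML file when present."""
    # Assumed naming: hello_world.idml -> <hub_dir>/hello_world.xml
    cached = Path(hub_dir) / (Path(idml_path).stem + ".xml")
    if cached.is_file():
        # Skip idml2xml-frontend and resume from the intermediate result.
        return ["-x", str(cached)]
    # No intermediate result yet: run the full conversion.
    return [str(idml_path)]

print(conversion_args("hello_world.idml"))
```

The helper does not check whether the IDML file has changed since the intermediate file was produced, so a stale intermediate file has to be deleted by hand.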

List of options for idml2docbook

All options detailed here are also documented in the command-line tool (see the help with -h/--help).

  • -x, --idml2hubxml-file
    Treats the input file as a Hub XML file.
    Useful for saving processing time if idml2xml-frontend has already been run on the source IDML file.

  • -o, --output <file>
    Name to assign to the output file.
    By default, output is sent to standard output (stdout).

  • -t, --typography
    Applies French typographic refinements
    (thin spaces, non-breaking spaces, etc.).

  • -l, --thin-spaces
    Uses only thin spaces for typographic refinement.
    Meant to be used together with --typography.

  • -b, --linebreaks
    Keeps <br> tags instead of replacing them with spaces.

  • -p, --prettify
    Beautify the DocBook output.
    ⚠️ May introduce unwanted spaces in the output.

  • -f, --media <path>
    Path to the folder containing media files.
    Default: Links.

  • -r, --raster <extension>
    Extension to use when replacing that of raster images.
    Example: jpg.
    Default: none.

  • -v, --vector <extension>
    Extension to use when replacing that of vector images.
    Example: svg.
    Default: none.

  • -i, --idml2hubxml-output <path>
    Path to the output from Transpect’s idml2hubxml converter.
    Default: idml2hubxml.

  • -s, --idml2hubxml-script <path>
    Path to the script of Transpect’s idml2xml-frontend converter.
    Default: idml2xml-frontend.

  • --env <file>
    Path to a .env environment file for idml2docbook.
    By default, it looks for a .env file in the current directory.
    All key/value pairs in this file override the program’s default values.

  • --version
    Displays the version of idml2docbook and exits the program.

Details of the conversion command for the Déborder Bolloré reader

The command used to compile the contributions of the Déborder Bolloré reader, whose result can be found in the deborderbollore/articles repository, is as follows:

pandoc -f docbook \
       -t markdown_phpextra \
       --lua-filter=lua-filters/roles-to-classes.lua \
       --lua-filter=lua-filters/map.lua \
       -M map=maps/db.json \
       -o output/db.md \
       <(python -m idml2docbook db.idml \
                --typography \
                --thin-spaces \
                --raster jpg \
                --vector svg \
                --media images)

In detail, the Pandoc options used are as follows:

  • -f/--from: input format; here Pandoc receives the output of idml2docbook, which is in DocBook format;

  • -t/--to: output format; the most practical choice for Déborder Bolloré was Markdown in the PHP Markdown Extra flavor, which is easily read by Kirby, the CMS of deborderbollore.fr;

  • --lua-filter: two Lua filters for Pandoc are used here; roles-to-classes.lua converts DocBook element role attributes into Pandoc classes, while map.lua applies the necessary mapping operations;

  • -M map: path to the JSON file containing the operations to apply to certain character and paragraph styles (see Style Mapping);

  • -o/--output: path of the output file (here db.md, written to the output folder).

And those for idml2docbook:

  • -t/--typography: completely redoes typographic spacing (around punctuation, etc.);

  • -l/--thin-spaces: typography uses only thin spaces;

  • -r/--raster: replaces raster image extensions with jpg in the output file URLs;

  • -v/--vector: same for vector images;

  • -f/--media: replaces the absolute media URLs in the output files with images/.