3. Usage
Preparing IDML Files
A formatting guide explains how to prepare InDesign files. While it is not mandatory to follow these recommendations to use the converter, the quality of the result may vary greatly depending on how your document is structured.
Style Mapping
Once the modified version of Pandoc is installed, Pandoc can interpret the role attributes of the elements in the DocBook files generated after conversion. These roles are then treated as classes by Pandoc using the roles-to-classes.lua filter.
role attributes correspond to the paragraph and character styles defined in InDesign. As shown in the file maps/sample.json, these styles can be mapped to specific treatments:
* role: replaces the source role with one or more classes;
* type: changes the DocBook element type (default is para for all elements) to a new type;
* level: if type is a DocBook title (title tag), level sets its heading level;
* delete: removes elements with this role attribute;
* wrap: wraps the element in another container, needed for lists defined only by a paragraph style (itemizedlist, orderedlist) and for blockquotes (blockquote);
* unwrap: unwraps the content into the parent element;
* br: adds a line break before the element;
* cut: creates a new file before each element with this role;
* empty: retains empty elements with this role (all other empty elements are removed, unless the -e/--empty option is enabled).
If the entry is empty ({}), the role attribute is removed.
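As an illustration, the sketch below builds a small mapping in Python and serializes it as JSON. Everything specific in it is an assumption: the style names (Title1, Bullet, Quote, Folio, Body) are hypothetical, the top-level keys are assumed to be role names, and the value types (strings for role, type and wrap, an integer for level, booleans for delete and cut) are guesses. maps/sample.json remains the reference for the exact format.

```python
import json

# Hypothetical InDesign style names mapped to the treatments listed above.
# Key layout and value types are assumed; check maps/sample.json for the real format.
mapping = {
    "Title1": {"type": "title", "level": 1, "cut": True},
    "Bullet": {"wrap": "itemizedlist"},
    "Quote": {"wrap": "blockquote", "role": "quote"},
    "Folio": {"delete": True},
    "Body": {},  # empty entry: the role attribute is removed
}


def dump_mapping(mapping: dict) -> str:
    """Serialize the mapping as indented JSON, keeping non-ASCII style names readable."""
    return json.dumps(mapping, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(dump_mapping(mapping))
```

Saving the printed output to a file gives a starting point that can then be checked with idml2docbook/map.py.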
The script idml2docbook/map.py helps create these JSON files. It takes a file generated by idml2xml-frontend and a mapping JSON file, and shows the style mappings that will be applied, for example using the following command:
python idml2docbook/map.py file.xml maps/map.json
Basic Commands
This converter can be used in several ways:
* Using the idml2docbook package, then running Pandoc on the output;
* Using the Lua reader idml.lua;
* Using the batch.sh script.
idml2docbook
This Python package is the main contribution of this repository. It provides an API to convert an IDML file into one or more DocBook files.
Convert an IDML file and save it to a DocBook file:
python -m idml2docbook hello_world.idml -o hello_world.dbk
It is also possible to send the conversion result directly to Pandoc’s standard input:
pandoc -f docbook -t markdown <(python -m idml2docbook hello_world.idml)
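Process substitution is a Bash/Zsh feature, so the same pipeline can also be driven from Python. The sketch below is one possible way to do it, assuming that pandoc is on the PATH and that idml2docbook is installed for the current interpreter. The helper names build_commands and idml_to_markdown are hypothetical and not part of idml2docbook.

```python
import subprocess
import sys


def build_commands(idml_path: str) -> tuple[list[str], list[str]]:
    """Return the argv lists for idml2docbook and for Pandoc (hypothetical helper)."""
    convert = [sys.executable, "-m", "idml2docbook", idml_path]
    # With no input file given, pandoc reads the document from standard input.
    pandoc = ["pandoc", "-f", "docbook", "-t", "markdown"]
    return convert, pandoc


def idml_to_markdown(idml_path: str) -> str:
    """Run idml2docbook, then feed its DocBook output to Pandoc's stdin."""
    convert, pandoc = build_commands(idml_path)
    docbook = subprocess.run(
        convert, check=True, capture_output=True, encoding="utf-8"
    ).stdout
    result = subprocess.run(
        pandoc, input=docbook, check=True, capture_output=True, encoding="utf-8"
    )
    return result.stdout


if __name__ == "__main__":
    print(idml_to_markdown("hello_world.idml"))
```

Because check=True is set, a failure in either step raises an exception instead of passing empty input to the next one.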
Display usage help with the -h/--help option:
python -m idml2docbook -h
idml.lua
This file bridges Pandoc and idml2docbook, allowing Pandoc to read the output of idml2docbook directly. Pandoc does not accept arbitrary arguments on the command line, so editing the .env file is the only way to change the behavior of idml2docbook when it runs through idml.lua.
Convert an IDML file to Markdown, using for example the roles-to-classes.lua filter:
pandoc -f idml.lua -t markdown --lua-filter=lua-filters/roles-to-classes.lua hello_world.idml
batch.sh
First, you must specify the Pandoc version to use in the .env file. You must also make this script executable:
chmod +x batch.sh
This small shell script makes it easier to link Pandoc with idml2docbook, particularly when using the -c/--cut option to split the output into multiple files (or chapters).
This command takes two arguments: an input folder and an output folder. You can then chain commands, for example:
python -m idml2docbook file.idml --output docbook_folder --cut ; ./batch.sh docbook_folder md_folder
Computing time of idml2xml-frontend
The idml2xml-frontend converter is by far the most time-consuming step of the process. Since it doesn’t need to be customized, it usually only needs to be run once per IDML file. The intermediate result obtained by running idml2xml-frontend on a hello_world.idml file is saved in the idml2hubxml folder.
The -x/--idml2hubxml-file option lets you specify an intermediary file such as idml2hubxml/hello_world.xml as the input file for the conversion. To significantly save processing time, it is possible to resume the conversion from the intermediate result:
python -m idml2docbook -x idml2hubxml/hello_world.xml -o hello_world.dbk
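A wrapper script can decide automatically whether to resume from the intermediate file. The following sketch assumes that the intermediate file is named after the stem of the IDML file (hello_world.idml gives idml2hubxml/hello_world.xml, as in the example above). conversion_args is a hypothetical helper, not part of idml2docbook.

```python
from pathlib import Path


def conversion_args(idml_path: str, output: str, hub_dir: str = "idml2hubxml") -> list[str]:
    """Build idml2docbook arguments, reusing the Hub XML file when it exists."""
    # Assumed naming: <hub_dir>/<stem of the IDML file>.xml
    hub_file = Path(hub_dir) / (Path(idml_path).stem + ".xml")
    if hub_file.is_file():
        # Resume from the intermediate result and skip idml2xml-frontend.
        return ["-m", "idml2docbook", "-x", str(hub_file), "-o", output]
    return ["-m", "idml2docbook", idml_path, "-o", output]


if __name__ == "__main__":
    import subprocess
    import sys

    args = conversion_args("hello_world.idml", "hello_world.dbk")
    subprocess.run([sys.executable, *args], check=True)
```

The sketch does not detect a stale intermediate file: if the IDML source has changed since the last run, delete the corresponding file in idml2hubxml before converting again.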
List of options
All options detailed here are also documented in the command-line tool (see the help with -h/--help).
-x, --idml2hubxml-file
  Treats the input file as a Hub XML file.
  Useful for saving processing time if idml2xml-frontend has already been run on the source IDML file.

-o, --output <file>
  Name to assign to the output file.
  By default, output is sent to standard output (stdout).

-m, --map <JSON file>
  Name of the JSON mapping file to use for role-specific processing.
  Default: maps/sample.json.

-e, --empty
  Do not remove empty elements with roles.
  Example: <para role="r"></para>
  ⚠️ May retain unwanted residual elements!

-g, --hierarchy
  Do not generate nested sections from a flat hierarchy.
  Should be used together with --map.

-c, --cut
  Splits the input file into multiple output files.
  Works with --map (to define the splits).
  If used with --output, the latter is treated as a directory.

-n, --names
  Generates output filenames from section identifiers.
  Should be used only with --cut.

-t, --typography
  Applies French typographic refinements (thin spaces, non-breaking spaces, etc.).

-l, --thin-spaces
  Use only thin spaces for typography refinement.
  Should be used together with --typography.

-b, --linebreaks
  Do not replace <br> tags with spaces.

-p, --prettify
  Beautify the DocBook output.
  ⚠️ May introduce unwanted spaces in the output.

-f, --media <path>
  Path to the folder containing media files.
  Default: Links.

-r, --raster <extension>
  Extension to use when replacing that of raster images.
  Example: jpg.
  Default: none.

-v, --vector <extension>
  Extension to use when replacing that of vector images.
  Example: svg.
  Default: none.

-i, --idml2hubxml-output <path>
  Path to the output from Transpect’s idml2hubxml converter.
  Default: idml2hubxml.

-s, --idml2hubxml-script <path>
  Path to the script of Transpect’s idml2xml-frontend converter.
  Default: idml2xml-frontend.

--env <file>
  Path to a .env environment file for idml2docbook.
  By default, it looks for a .env file in the current directory.
  All key/value pairs in this file override the program’s default values.

--version
  Displays the version of idml2docbook and exits the program.
Details of the conversion command for the Déborder Bolloré reader
The command used to compile the contributions of the Déborder Bolloré reader, whose result can be found in the deborderbollore/articles repository, is as follows:
python -m idml2docbook db.idml \
    --output docbook \
    --map maps/map.json \
    --cut \
    --typography \
    --thin-spaces \
    --names \
    --raster jpg \
    --vector svg \
    --media-folder images ;
./batch.sh docbook articles
In detail:
* -o/--output: specifies the folder where the output DocBook files will be saved;
* -m/--map: defines the operations to apply to certain character and paragraph styles, see Style Mapping;
* -c/--cut: splits the result into multiple standalone DocBook files;
* -t/--typography: completely redoes typographic spacing (around punctuation, etc.);
* -l/--thin-spaces: typography uses only thin spaces;
* -n/--names: generates filenames based on level-1 heading content;
* -r/--raster: replaces raster image extensions with jpg in the output file URLs;
* -v/--vector: does the same for vector images, using svg;
* -f/--media-folder: replaces the absolute media URLs in the output files with images/.
Finally, the batch.sh script converts the resulting DocBook files into Markdown using the markdown_phpextra flavor and the necessary Lua filters. It also transforms Markdown line breaks (\, double spaces) into HTML <br/> tags.
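The line-break step can be pictured with the short Python sketch below. It is not the code used by batch.sh; it only illustrates the idea, assuming that a hard line break is written as a trailing backslash or as two or more trailing spaces. The function name hard_breaks_to_br is hypothetical.

```python
import re

# A hard line break: a backslash, or at least two spaces, right before a newline.
HARD_BREAK = re.compile(r"(?:\\| {2,})\n")


def hard_breaks_to_br(markdown: str) -> str:
    """Replace Markdown hard line breaks with an HTML <br/> tag."""
    return HARD_BREAK.sub("<br/>\n", markdown)


if __name__ == "__main__":
    print(hard_breaks_to_br("first line  \nsecond line\\\nthird line\n"))
```

This simple substitution does not distinguish code blocks from running text, so a real implementation may need to skip them.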