3. Usage
Preparing IDML Files
A formatting guide explains how to prepare InDesign files. While it is not mandatory to follow these recommendations to use the converter, the quality of the result may vary greatly depending on how your document is structured.
Style Mapping
Once the modified version of Pandoc is installed, Pandoc can interpret the role attributes of the elements in the DocBook files generated after conversion. These roles are then treated as classes by Pandoc using the roles-to-classes.lua filter.
The role attributes correspond to the paragraph and character styles defined in InDesign. As shown in the file maps/sample.json, these styles can be mapped to specific operations:
To modify their attributes:
- role: replaces the source role with one or more classes;
- type: changes the element type (by default, all elements are Para or Span) to a new type (a header, emphasis, quote, etc.);
- level: if type is a title (Header tag), level sets its heading level;
- attrs: adds or updates an attribute on all concerned elements;
- simplify: removes all attributes and classes from elements with this paragraph or character style.
Or to modify the document structure:
- delete: removes elements with this paragraph or character style;
- wrap: wraps the element inside another element;
- unwrap: unwraps the content into the parent element;
- br: adds a line break before the element;
- empty: retains empty elements with this role (all other empty elements are removed, unless the -e/--empty option is enabled);
- cut: creates a new file before each element with this role.
Example of a JSON entry
Let’s take an entry from the style mapping file maps/sample.json as an example:
{
  "selector": ".title",
  "operation": {
    "classes": "cool-title",
    "type": "Header",
    "level": 1,
    "attrs": {
      "hey": "oh"
    }
  }
}
In this case, all paragraphs in the IDML file associated with the title paragraph style will be identified as level-1 headings, with a cool-title class and, in addition, a hey attribute with the value oh. All the operations from the previous subsection can be specified in the same way.
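To make the effect of such an entry concrete, here is a small Python sketch that applies it to a toy element. It is only an illustration: in the converter the mapping is applied by the map.lua filter, and the dictionary-based element, the apply_operation helper and the simplified selector matching below are all assumptions made for this example.

```python
import json

# The entry from maps/sample.json discussed above.
ENTRY = json.loads("""
{
  "selector": ".title",
  "operation": {
    "classes": "cool-title",
    "type": "Header",
    "level": 1,
    "attrs": {"hey": "oh"}
  }
}
""")


def apply_operation(element, entry):
    """Return a copy of a toy element with the entry's operation applied.

    Hypothetical helper: it only models the classes, type, level and attrs
    keys, and only understands selectors of the form ".class-name".
    """
    role = entry["selector"].lstrip(".")
    if role not in element["classes"]:
        return dict(element)  # the selector does not match: nothing changes

    operation = entry["operation"]
    result = {
        "type": operation.get("type", element["type"]),
        "classes": list(element["classes"]),
        "attrs": dict(element["attrs"]),
    }
    if "classes" in operation:
        # Replace the source role with the new class(es).
        result["classes"] = operation["classes"].split()
    if result["type"] == "Header" and "level" in operation:
        result["level"] = operation["level"]
    # Add or update attributes.
    result["attrs"].update(operation.get("attrs", {}))
    return result


paragraph = {"type": "Para", "classes": ["title"], "attrs": {}}
print(apply_operation(paragraph, ENTRY))
```

A paragraph whose style is not title is returned unchanged, which mirrors the idea that each entry only affects the elements its selector targets.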
idml2docbook/map.py
The script idml2docbook/map.py helps create these JSON files. It takes a file generated by idml2xml-frontend and a mapping JSON file, and shows the style mappings that will be applied, for example using the following command:
python idml2docbook/map.py file.xml maps/sample.json
Basic Commands
This converter can be used in several ways:
- using the idml2docbook package, then running Pandoc on the output;
- using the Lua reader idml.lua;
- using the batch.sh script.
idml2docbook
This Python package is the main contribution of this repository. It provides an API to convert an IDML file into one or more DocBook files.
Convert an IDML file and save it to a DocBook file:
python -m idml2docbook hello_world.idml -o hello_world.dbk
It is also possible to send the conversion result directly to Pandoc’s standard input, by specifying DocBook as an input format:
pandoc -f docbook -t markdown <(python -m idml2docbook hello_world.idml)
Display usage help with the -h/--help option:
python -m idml2docbook -h
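The same two-step conversion can also be driven from a Python script. The sketch below is one possible way to do it, not part of the package: the helper names are hypothetical, and it assumes that idml2docbook is installed for the current interpreter and that pandoc is available on the PATH. It relies only on the behaviors described above: idml2docbook writes to standard output by default, and Pandoc reads standard input when no input file is given.

```python
import subprocess
import sys


def idml2docbook_command(idml_path):
    """Command line that writes the DocBook conversion to standard output."""
    return [sys.executable, "-m", "idml2docbook", idml_path]


def pandoc_command(output_format="markdown"):
    """Pandoc command line that reads DocBook from standard input."""
    return ["pandoc", "-f", "docbook", "-t", output_format]


def convert(idml_path, output_format="markdown"):
    """Run idml2docbook, feed its output to Pandoc, return Pandoc's output."""
    docbook = subprocess.run(
        idml2docbook_command(idml_path),
        check=True, capture_output=True, text=True,
    ).stdout
    return subprocess.run(
        pandoc_command(output_format),
        input=docbook, check=True, capture_output=True, text=True,
    ).stdout


# Only convert when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(convert(sys.argv[1]))
```

With check=True, a failure in either step raises subprocess.CalledProcessError instead of silently producing an empty document.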
idml.lua
This file bridges Pandoc and idml2docbook: it allows Pandoc to use the output of idml2docbook. Pandoc does not allow arbitrary arguments on the command line, so the only way to change the behavior of idml2docbook when it is executed through idml.lua is to modify the .env file.
Convert an IDML file to Markdown, using for example the roles-to-classes.lua filter:
pandoc -f idml.lua -t markdown --lua-filter=lua-filters/roles-to-classes.lua hello_world.idml
Specify a mapping JSON file and use it to customize your conversion command with the map.lua filter:
pandoc -f idml.lua -t markdown --lua-filter=lua-filters/map.lua -M map=maps/sample.json hello_world.idml
Computing time of idml2xml-frontend
The idml2xml-frontend converter is by far the most time-consuming step of the process. Since it doesn’t need to be customized, it usually only needs to be run once per IDML file. The intermediate result obtained by running idml2xml-frontend on a hello_world.idml file is saved in the idml2hubxml folder.
The -x/--idml2hubxml-file option lets you specify an intermediate file such as idml2hubxml/hello_world.xml as the input file for the conversion. To save a significant amount of processing time, you can resume the conversion from this intermediate result:
python -m idml2docbook -x idml2hubxml/hello_world.xml -o hello_world.dbk
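When conversions are scripted, the choice between a full run and a resumed run can be automated. The following sketch assumes that the intermediate file is named after the IDML file (hello_world.idml gives hello_world.xml, as in the example above); the conversion_args and full_command helpers are hypothetical and only build the command line, they do not run it.

```python
import tempfile
from pathlib import Path


def conversion_args(idml_path, hubxml_dir="idml2hubxml"):
    """Input arguments for idml2docbook, reusing the Hub XML file if present."""
    idml_path = Path(idml_path)
    cached = Path(hubxml_dir) / (idml_path.stem + ".xml")
    if cached.is_file():
        # Skip idml2xml-frontend: resume from the intermediate result.
        return ["-x", str(cached)]
    return [str(idml_path)]


def full_command(idml_path, output_path, hubxml_dir="idml2hubxml"):
    """Complete command line, written to output_path with -o."""
    return ["python", "-m", "idml2docbook",
            *conversion_args(idml_path, hubxml_dir), "-o", str(output_path)]


# Demonstration in a temporary folder standing in for idml2hubxml/.
with tempfile.TemporaryDirectory() as tmp:
    # First run: no intermediate file yet, so the IDML file is the input.
    print(full_command("hello_world.idml", "hello_world.dbk", tmp))
    (Path(tmp) / "hello_world.xml").write_text("<hub/>")
    # Later runs: the intermediate file exists and is passed with -x.
    print(full_command("hello_world.idml", "hello_world.dbk", tmp))
```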
List of options for idml2docbook
All options detailed here are also documented in the command-line tool (see the help with -h/--help).
-x, --idml2hubxml-file
    Treats the input file as a Hub XML file.
    Useful for saving processing time if idml2xml-frontend has already been run on the source IDML file.
-o, --output <file>
    Name to assign to the output file.
    By default, output is sent to standard output (stdout).
-t, --typography
    Applies French typographic refinements (thin spaces, non-breaking spaces, etc.).
-l, --thin-spaces
    Use only thin spaces for typography refinement.
    Should be used together with --typography.
-b, --linebreaks
    Do not replace <br> tags with spaces.
-p, --prettify
    Beautify the DocBook output.
    ⚠️ May introduce unwanted spaces in the output.
-f, --media <path>
    Path to the folder containing media files.
    Default: Links.
-r, --raster <extension>
    Extension to use when replacing that of raster images.
    Example: jpg.
    Default: none.
-v, --vector <extension>
    Extension to use when replacing that of vector images.
    Example: svg.
    Default: none.
-i, --idml2hubxml-output <path>
    Path to the output from Transpect’s idml2hubxml converter.
    Default: idml2hubxml.
-s, --idml2hubxml-script <path>
    Path to the script of Transpect’s idml2xml-frontend converter.
    Default: idml2xml-frontend.
--env <file>
    Path to a .env environment file for idml2docbook.
    By default, it looks for a .env file in the current directory.
    All key/value pairs in this file override the program’s default values.
--version
    Displays the version of idml2docbook and exits the program.
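For scripted conversions, it can be convenient to keep these options in one place and build the command line from them. The sketch below is a hypothetical helper, not part of idml2docbook: the Options class and to_args function are assumptions, it covers only a subset of the flags listed above, and treating --thin-spaces without --typography as an error is a choice made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Options:
    """A subset of the idml2docbook options listed above."""
    output: Optional[str] = None
    typography: bool = False
    thin_spaces: bool = False
    media: Optional[str] = None
    raster: Optional[str] = None
    vector: Optional[str] = None


def to_args(input_file: str, options: Options) -> List[str]:
    """Translate an Options object into an idml2docbook command line."""
    args = ["python", "-m", "idml2docbook", input_file]
    if options.typography:
        args.append("--typography")
    if options.thin_spaces:
        if not options.typography:
            # Assumption of this sketch: refuse the combination outright.
            raise ValueError("--thin-spaces should be used with --typography")
        args.append("--thin-spaces")
    # Options that take a value are emitted only when they are set.
    for flag, value in (("--raster", options.raster),
                        ("--vector", options.vector),
                        ("--media", options.media),
                        ("--output", options.output)):
        if value is not None:
            args += [flag, value]
    return args


print(" ".join(to_args("hello_world.idml", Options(
    typography=True, thin_spaces=True, output="hello_world.dbk"))))
```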
Details of the conversion command for the Déborder Bolloré reader
The command used to compile the contributions of the Déborder Bolloré reader, whose result can be found on the repository deborderbollore/articles, is as follows:
pandoc -f docbook \
  -t markdown_phpextra \
  --lua-filter=lua-filters/roles-to-classes.lua \
  --lua-filter=lua-filters/map.lua \
  -M map=maps/db.json \
  -o output/db.md \
  <(python -m idml2docbook db.idml \
    --typography \
    --thin-spaces \
    --raster jpg \
    --vector svg \
    --media images)
In detail, the Pandoc options used are as follows:
- -f/--from: input format; here we provide the file output by idml2docbook, which is therefore in DocBook format;
- -t/--to: output format; the most practical for Déborder Bolloré was Markdown, in the PHP Markdown Extra flavor, to be easily read by Kirby, the CMS of deborderbollore.fr;
- --lua-filter: two Lua filters for Pandoc are used here; roles-to-classes.lua converts the role attributes of DocBook elements into Pandoc classes, while map.lua applies the necessary mapping operations;
- -M map: path to the JSON file containing operations to apply to certain character and paragraph styles (see Style Mapping);
- -o/--output: specifies the folder where the generated files will be saved (here, in the output folder, with files prefixed db).
And those for idml2docbook:
- -t/--typography: completely redoes typographic spacing (around punctuation, etc.);
- -l/--thin-spaces: typography uses only thin spaces;
- -r/--raster: replaces raster image extensions with jpg in the output file URLs;
- -v/--vector: same for vector images, here with svg;
- -f/--media: replaces the absolute media URLs in the output files with images/.