Markdown
Crawler.ID2MD is a Crawler-based software product which provides InDesign to Markdown export. Crawler.ID2MD is a temporary name; we will eventually come up with some better name.
The Markdown syntax definitions can be found here:
http://daringfireball.net/projects/markdown/syntax
A .zip file with some sample documents and the Markdown conversions can be downloaded here:
[Crawler.ID2MD.Samples.zip|http://www.docdataflow.com/wiki/images/2/2f/Crawler.ID2MD.Samples.zip]
An early product preview is available as a .zip archive. Email [email protected] if you want to test it out.
To install, you need to first decompress the .zip archive and get to a folder Crawler.ID2MD which contains a file called Export.jsxbin.
Export.jsxbin is the script you'll need to run to activate Crawler.ID2MD.
Contents
- 1 Installation
- 2 Running
- 3 Configuration
- 3.1 config.ini
- 3.1.1 MARKDOWN_FILENAME_EXTENSION
- 3.1.2 imagesFolder
- 3.1.3 headingStylesLevel<n>
- 3.1.4 minPointSizeLevel<n>
- 3.1.5 blockquoteStyles
- 3.1.6 imageExportDPI
- 3.1.7 imageExportFormat
- 3.1.8 boldFontStyles
- 3.1.9 italicFontStyles
- 3.1.10 asBitmapTextFrameLabel
- 3.1.11 ignoreFrameLabel
- 3.1.12 ignoreInvisibleLayers
- 3.1.13 ignoreLayers
- 3.1.14 overrideAllMasterPageItems
- 3.1.15 textFrameSplitting
- 3.2 Snippets
- 3.3 Constants and Formulas
- 3.4 Handling the raw text
- 3.1 config.ini
Installation
To install Crawler.ID2MD, you first need to launch InDesign, and find the Scripts panel. If it is not visible, you can make it appear by means of the Window - Utilities - Scripts menu item.
Once the Scripts panel is visible, right-click or <Control>-click the User entry and select 'Reveal in Finder' (on Mac) or 'Reveal in Explorer' (on Windows).
A window on a folder called Scripts should open.
Inside there should be a folder called Scripts Panel. Double-click its icon to enter it.
Once you're inside the Scripts Panel folder, you can drag the Crawler.ID2MD folder into it.
Now switch back to InDesign, and verify that the Crawler.ID2MD folder has appeared under the User folder on the Scripts panel:
Click the disclosure triangle - you should now see Export.jsxbin.
Running
Open an InDesign document, and double-click Export.jsxbin on the Scripts panel.
A new file with the same name as the original document should appear. The file name extension of the new file is .md by default. If your InDesign document is called MyDocument.indd you should see MyDocument.md appear next to it.
If the InDesign document has graphics in it, these will be exported into a separate folder called images by default.
Configuration
Crawler.ID2MD can be configured through a number of configuration files.
config.ini
There are two files named config.ini. One resides next to Export.jsxbin and configures the Crawler system; you will rarely, if ever need to change this file.
The second config.ini resides in the Personalities/Markdown subfolder and configures the Markdown features; this file will be where most configuration will be done.
These config.ini files can be opened with a standard text editor (e.g. TextWrangler on Macintosh: http://www.barebones.com/products/textwrangler/ or Notepad++ on Windows: http://notepad-plus-plus.org/download/ ).
One trick to quickly navigate to the config.ini file is to disclose it on the Scripts panel, right-click or <Control>-click it, and select 'Reveal in Finder' or 'Reveal in Explorer'.
Once you see the file, use a text editor to open it.
The most relevant configurations are described here:
MARKDOWN_FILENAME_EXTENSION
This setting sets the file name extension to use for the output files (default: md).
imagesFolder
This is the name of the subfolder to use for storing the exported graphic frames into (default: images).
headingStylesLevel<n>
Where <n> is 1 up to 6. This setting is a comma-separated list of paragraph style names that must be converted to a header of level 1 to 6 (defaults: headingStylesLevel1 is Heading, Title; headingStylesLevel2 - headingStylesLevel6 are empty).
minPointSizeLevel<n>
Where <n> is 1 up to 6. This is a number of points from which text will be considered to be of the corresponding heading level. (defaults: minPointSizeLevel1 is 18; minPointSizeLevel2 - minPointSizeLevel6 are empty).
For example, if minPointSizeLevel1 is 18, then any paragraph that starts with a glyph that is 18pt or larger will be considered to be a first-level heading.
These can be left empty. When defining multiple levels, it should be true that minPointSizeLevel1 > minPointSizeLevel2 > ... > minPointSizeLevel6.
In other words, if any, minPointSizeLevel6 has to be the smallest value, and minPointSizeLevel1 has to be the largest value.
blockquoteStyles
The names any paragraph styles that will be converted to blockquotes (i.e. prefixed with '> ') (default: nothing).
imageExportDPI
The image resolution to use for exporting any graphic frames (default: 72).
imageExportFormat
This is either PNG or JPEG. It tells Crawler what image format to use for the graphic frames (default: PNG).
PNG is only supported in CS6 and above.
boldFontStyles
This setting is a list of font style names separated with | characters. The listed font style names will be considered to be bold (default bold|heavy|black)
This translates to prefixing and suffixing the text with two asterisks: **.
italicFontStyles
This setting is a list of font style names separated with | characters. The listed font style names will be considered to be italic (default italic|oblique|slanted)
This translates to prefixing and suffixing the text with an underscore: _.
asBitmapTextFrameLabel
This setting is a word you can assign to individual frames in the document via the Script Label panel (default: nothing).
By entering this word onto the Script Label panel you can force selected text frames to be exported as bitmaps instead of as text.
ignoreFrameLabel
This setting is a word you can assign to individual frames in the document via the Script Label panel (default: ignore).
By entering this word onto the Script Label panel you can force the selected page item to be omitted from the output file.
ignoreInvisibleLayers
This setting is either 0 or 1 (default: 0).
Setting it to 1 will force Crawler to omit any frames that are on invisible layers.
ignoreLayers
This setting is a comma-separated list of layer names. It can be left empty (default: nothing).
Any items on a layer named here will be suppressed from the output.
overrideAllMasterPageItems
This setting is either 0 or 1 (default: 0). Setting it to 1 will force Crawler to express master page items in the export.
textFrameSplitting
This setting is either 0 or 1 (default: 0). If it is 0, the export will be done on a story-by-story basis. Setting it to 1 forces Crawler to export on a textframe-by-textframe basis.
Snippets
The folder Personalities/Markdown/markdownSnippets contains a number of template files that are used to generate the Markdown output.
These files can all be opened with a standard text editor.
For example, document.md.snippet contains:
[//]: # (Document markdown file $$MARKDOWN_FILENAME$$ generated by $$SOFTWARE$$ $$VERSION$$) $$INPUT_TEXT$$
Any text between two $$ is a placeholder which will be replaced by some calculated strings.
The various snippets represent different concentric layers of complexity. At the inner level is the text.run.md.snippet. This snippet is used to format individual style runs from the original document.
The next snippet text.paragraph.md.snippet will take the collated/concatenated output from the text.run.md.snippet snippet.
The output of each type of snippet is concatenated, and 'handed up' to the next level of snippet, until eventually, the output is passed through the document.md.snippet where it gets its final shape.
text.run.md.snippet
$$TEXT_RUN_STYLE_FORMAT_PREFIX$$$$INPUT_TEXT$$$$TEXT_RUN_STYLE_FORMAT_SUFFIX$$
The expressions $$TEXT_RUN_STYLE_FORMAT_PREFIX$$ and $$TEXT_RUN_STYLE_FORMAT_SUFFIX$$ normally are replaced with nothing, except when the style run is bold or italic, when they become **, _ or **_.
Bold style runs are prefixed and suffixed with '**', italic style runs are prefixed and suffixed with _ and bold italic style runs are prefixed and suffixed with _** and **_.
$$INPUT_TEXT$$ is replaced by the text of the style run extracted from the original document. This text will also have some special characters escaped, e.g. ! is converted to \!, # is converted to \# and so on, as to make sure these characters are not mistaken for Markdown syntax.
text.paragraph.md.snippet
$$HEADING_PREFIX$$$$LINE_BROKEN_INPUT_TEXT$$
This snippet is 'one level up' from the previous one. $$LINE_BROKEN_INPUT_TEXT$$ is replaced by the collated text of all the style runs in the paragraph, after which 'soft returns' are prefixed with two extra spaces, so Markdown preserves soft line breaks.
$$LINE_BROKEN_INPUT_TEXT$$ is calculated by a formula based on $$INPUT_TEXT$$, where $$INPUT_TEXT$$ is simply the concatenated/collated text of all text runs in the paragraph.The formula can be found in Personalities/Markdown/formulas/paragraph.jsx.snippet:
/ **************** $$LINE_BROKEN_INPUT_TEXT$$ = { var retVal = null; do { var inputText = $$INPUT_TEXT$$; if (! inputText || inputText == "") { break; } retVal = inputText.replace(/^\s+/,""); retVal = retVal.replace(/([\n\r])\s+/g,"$1"); retVal = retVal.replace(/\s*([\n\r])/g," $1"); } while (false); return retVal; } // ****************
frame.text.md.snippet
$$INPUT_TEXT$$
This snippet is again 'one level out' from the previous text.paragraph.md.snippet snippet. $$INPUT_TEXT$$ is replaced by the collated text of all the paragraphs in the text frame. This particular snippet boils down to a 'do nothing': simply pass on the data received by collating the paragraphs.
frame.graphic.md.snippet
$$FRAME_PREFIX$$![$$FRAME_IMAGE_NAME$$]($$FRAME_IMAGE_PATH$$)$$FRAME_SUFFIX$$
This snippet is on the same level - where the previous snippet is used for text frames, this one is used for graphical frames.
$$FRAME_PREFIX$$ and $$FRAME_SUFFIX$$ are replaced by nothing for anchored and inline graphics, so the graphic is displayed in-line in Markdown.
For 'floating' frames, these are instead replaced by a few extra newlines, to separate the graphic from the text.
$$FRAME_IMAGE_NAME$$ is replaced by the name of the image, and $$FRAME_IMAGE_PATH$$ becomes the relative path of the exported graphic.
document.md.snippet
This is the 'top level' snippet.
[//]: # (Document markdown file $$MARKDOWN_FILENAME$$ generated by $$SOFTWARE$$ $$VERSION$$) $$INPUT_TEXT$$
$$INPUT_TEXT$$ is replaced by the collation of all lower-level snippets.
The first line in this snippet is a Markdown comment line with some meta-info about the document.
Constants and Formulas
In the snippets, there are a lot of references to placeholders between two $$.
Many of these placeholders are calculated automatically by Crawler, but it is also possible to define custom placeholders, either as configuration constants or as calculated formulas.
Constants
Any entry made in the [appContextData] in the config.ini becomes a placeholder. For example, the entry MARKDOWN_FILENAME_EXTENSION
[appContextData] MARKDOWN_FILENAME_EXTENSION = .md
causes a placeholder $$MARKDOWN_FILENAME_EXTENSION$$ to become available for use in the snippets.
If you were to add a new line, for example
[appContextData] MARKDOWN_FILENAME_EXTENSION = .md AUTHOR_NAME = "John Doe"
you could start using a placeholder $$AUTHOR_NAME$$ in the snippets.
Formulas
These are small bits of ExtendScript (JavaScript)-like code which express how a certain placeholder can be calculated.
The code is not 'pure' ExtendScript - there is some pre-processing to handle 'placeholders' in the scripting code.
These files are stored in Personalities/Markdown/formulas
For example, the $$FRAME_PREFIX$$ placeholder in the frame.graphic.md.snippet is calculated by a formula in graphicframe.jsx.snippet.
// **************** $$FRAME_PREFIX$$ = { var retVal = null; do { var granule = $$RAW_GRANULE$$; if (! (granule instanceof G.FrameGranule)) { break; } var frame = granule.getData(); if (frame.parent instanceof Character) { retVal = ""; } else { retVal = " \n"; } } while (false); return retVal; }
I won't go into too much detail, but essentially this is expressing that if the graphic frame has a Character as its 'parent' in InDesign (which means it is inline or anchored in text), then the return is "" (nothing). In all other cases, the return is " \n" (forced line break in Markdown).
Handling the raw text
One formula in particular is of interest. In the run.jsx.snippet you'll find the following formula:
// **************** $$RAW_TEXT$$ = { var retVal = undefined; do { retVal = $$RAW_TEXT$$; if (! retVal || retVal == "") { break; } retVal = retVal.replace(/#/g,"\\#") retVal = retVal.replace(/\*/g,"\\*") retVal = retVal.replace(/_/g,"\\_"); retVal = retVal.replace(/!/g,"\\!") } while (false); return retVal; } // ****************
This formula is responsible for escaping special characters in the input. If additional characters need to be escaped, this is the place to do it.