Custom Personality Tutorial
We'll start by building a very simple personality, and then gradually extend it.
The basis of nearly all document conversion personalities is the ViewExporter adapter.
The ViewExporter combines two 'main' adapters: a disassembler (which breaks a document granule into smaller granules) and an assembler (which takes the granules coming out of the disassembler and builds the desired end result).
The disassembler is part of the default Crawler setup. When Crawler runs, it asks the currently active application to provide a disassembler, and it uses that disassembler in the ViewExporter.
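To make the disassembler/assembler split concrete, here is a minimal Python sketch of the idea. It is purely illustrative: Granule, disassemble, assemble, and export_view are hypothetical names chosen for this example, not Crawler's actual API, and the real adapters are configured through the configuration files rather than written like this.

```python
# Illustrative only: all names here are hypothetical, not Crawler's API.
from dataclasses import dataclass, field


@dataclass
class Granule:
    kind: str                 # e.g. "document" or "text.story"
    content: str = ""
    children: list = field(default_factory=list)


def disassemble(granule):
    """Break a granule into smaller granules (here: its direct children)."""
    return list(granule.children)


def assemble(granules):
    """Build the end result from the granules produced by the disassembler."""
    return "\n".join(g.content for g in granules)


def export_view(document):
    """Chain the two steps, mirroring how the ViewExporter is described."""
    return assemble(disassemble(document))


doc = Granule("document", children=[
    Granule("text.story", "First story"),
    Granule("text.story", "Second story"),
])
result = export_view(doc)
```

In this sketch the assembler simply joins the story texts with newlines; a real assembler would build whatever output format the personality is meant to produce.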
The disassembler is configured through the configuration files.
Adjusting The Top-Level config.ini
First, we'll enhance the top-level configuration file so it knows about the new personality we're going to build.
Let's call the personality 'tutorial'.
Change the config.ini so it looks similar to this (I've omitted the comments for brevity):
[conditionals]
selectors = tutorial

[main]
personalityConfig?tutorial = "./Personalities/Tutorial/config.ini"
personalityConfig?text = "./Personalities/Text/config.ini"

# ********************************************************************************

[debug]
debugMonitoring = true
monitorAdapters = inputSplitter
logLevel = 5
logFileName = Crawler.log
The selectors entry tells Crawler that we want to select 'tutorial'. Because 'tutorial' is selected, the personalityConfig entry resolves to the config.ini in the Tutorial folder inside the Personalities folder.
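The way a selector picks one of several 'key?selector' entries can be sketched in a few lines of Python. This is a model of the behaviour described above, not Crawler's own code: the resolve function is hypothetical, and the rule that a matching conditional entry takes precedence over a plain entry with the same name is an assumption made for the sketch.

```python
# Illustrative model of conditional config keys; not Crawler's implementation.
def resolve(entries, selectors):
    """Keep plain keys, plus 'key?selector' keys whose selector is active."""
    resolved = {}
    for key, value in entries.items():
        name, sep, selector = key.partition("?")
        if not sep:
            # Plain key: keep it unless a conditional entry already set it
            # (assumption: conditional entries win over plain ones).
            resolved.setdefault(name, value)
        elif selector in selectors:
            resolved[name] = value
    return resolved


main = {
    "personalityConfig?tutorial": "./Personalities/Tutorial/config.ini",
    "personalityConfig?text": "./Personalities/Text/config.ini",
}
selectors = ["tutorial"]
active = resolve(main, selectors)
```

With selectors set to 'tutorial', active holds a single personalityConfig entry pointing at the Tutorial config.ini; changing the selector to 'text' would pick the Text personality instead.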
We also switch on debug monitoring, and hook a Debug Monitor into the inputSplitter adapter inside the ViewExporter.
Creating A Tutorial Personality
The next step is to start building the new personality.
Open the Personalities folder, and create a new subfolder called Tutorial. Inside that subfolder, create a text file called config.ini.
Put the following text in this config.ini file:
[main]
views = tutorialView
nesting = document/text.story

[main:tutorialView]
fileSplitLevel = document
xmlEncode = 0
accepted = document, text.story

[flush:tutorialView]
document = text.story
With this config file, we tell the ViewExporter that we only need a single view, named tutorialView.
We'll be processing InDesign documents, which have a 'natural' hierarchy: documents contain stories, stories contain paragraphs, paragraphs contain text runs, text runs contain words.
(Remark: this is not the only hierarchy we could use in InDesign documents.
An alternate hierarchy would be documents contain spreads, spreads contain text frames, text frames contain text runs, text runs contain words.
This alternate hierarchy does not 'map' onto the first hierarchy: text frames do not map cleanly onto paragraph boundaries or vice versa.)
In this case, we're initially interested in getting the text, and we don't care too much about the lower-level granules, so all we tell the disassembler is: nesting = document/text.story.
This tells the disassembler: if you see a document granule, please disassemble it into text.story granules. Later on, we'll tell the disassembler to dig deeper than that.
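One way to picture how a nesting value turns into disassembly rules is the following Python sketch. It assumes that each level in the slash-separated value is disassembled into the level that follows it; parse_nesting is a hypothetical helper for illustration, not part of Crawler.

```python
# Illustrative only: models a nesting value as parent -> child rules.
def parse_nesting(value):
    """Split 'a/b/c' into levels and map each level to the next one down."""
    levels = [level.strip() for level in value.split("/") if level.strip()]
    return dict(zip(levels, levels[1:]))


rules = parse_nesting("document/text.story")
```

Here rules contains one entry, mapping document to text.story. Under the same assumption, a longer nesting value would yield one rule per adjacent pair of levels, which is what "digging deeper" amounts to in this model.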
The nesting entry is actually a slash-separated list of class identifiers