Revision as of 20:28, 26 December 2013
Overview
Crawler is designed along principles similar to those of the dataflow programming paradigm, in which processing is expressed as data moving through a network of connected operations.
On its own, Crawler does not perform any useful function. To become usable, it must be extended with a personality, which determines what function Crawler performs.
Personality
A personality is one of the high-level components of a Crawler-based system. It takes input data in one form and processes it into output data in another form.
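To make this division of labour concrete, here is a minimal sketch. The names `Personality`, `Crawler`, `process`, `run`, and `UppercasePersonality` are hypothetical and not part of Crawler's actual API; the point is only that the core delegates all real work to whichever personality is plugged into it.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Personality(ABC):
    """Hypothetical interface: turns input data in one form into output in another."""

    @abstractmethod
    def process(self, input_data: str) -> str:
        ...


class Crawler:
    """Hypothetical core: does nothing useful until a personality is plugged in."""

    def __init__(self, personality: Optional[Personality] = None) -> None:
        self.personality = personality

    def run(self, input_data: str) -> str:
        if self.personality is None:
            raise RuntimeError("Crawler needs a personality to do anything useful")
        # All actual processing is delegated to the personality.
        return self.personality.process(input_data)


class UppercasePersonality(Personality):
    """Toy stand-in for a real personality such as InDesign-to-EPUB."""

    def process(self, input_data: str) -> str:
        return input_data.upper()
```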
A few examples:
- InDesign-to-XHTML/CSS: takes in InDesign documents or books and outputs XHTML/CSS files.
- InDesign-to-EPUB: takes in InDesign documents or books and outputs EPUB files.
- InDesign-to-Database: takes in InDesign documents or books and updates a database with information extracted from the document(s).
Personalities are made up of simpler elements.
A personality is composed of:
- a workflow network of interconnected processing units called adapters
- a set of configuration files
- a set of template files
- a set of formula files
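The four ingredients listed above could be bundled roughly as follows. This is an illustrative sketch that assumes the adapter network can be described as a mapping from each adapter's name to the names of the adapters it feeds; `PersonalityBundle`, its field names, and `adapter_names` are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Set


@dataclass
class PersonalityBundle:
    """Hypothetical container mirroring the parts of a personality."""

    # Workflow network: adapter name -> names of the downstream adapters it feeds.
    network: Dict[str, List[str]] = field(default_factory=dict)
    config_files: List[Path] = field(default_factory=list)
    template_files: List[Path] = field(default_factory=list)
    formula_files: List[Path] = field(default_factory=list)

    def adapter_names(self) -> Set[str]:
        """Every adapter that appears in the network, as a source or as a target."""
        names = set(self.network)
        for downstream in self.network.values():
            names.update(downstream)
        return names
```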
When processing, the input document(s) are pushed through the personality's network of adapters; data flows into and out of each adapter.
[Figure: Sampleexporter.png]
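The push model can be sketched as a small graph walker. This assumes, hypothetically, that each adapter is a function taking one granule and returning zero or more granules, and that granules leaving an adapter with no downstream connections form the final output; Crawler's real scheduling may well differ.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

# Hypothetical adapter signature: one granule in, zero or more granules out.
AdapterFn = Callable[[str], Iterable[str]]


def run_network(
    adapters: Dict[str, AdapterFn],
    edges: Dict[str, List[str]],
    entry: str,
    document: str,
) -> List[str]:
    """Push `document` into adapter `entry` and propagate granules along `edges`."""
    outputs: List[str] = []
    queue = deque([(entry, document)])  # pending (adapter name, granule) pairs
    while queue:
        name, granule = queue.popleft()
        for produced in adapters[name](granule):
            downstream = edges.get(name, [])
            if not downstream:
                # No outgoing connection: this granule leaves the network.
                outputs.append(produced)
            for target in downstream:
                queue.append((target, produced))
    return outputs
```

For instance, wiring a hypothetical `split` adapter (`lambda s: s.split()`) into a `shout` adapter (`lambda w: [w.upper()]`) turns `"hello world"` into `["HELLO", "WORLD"]`.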
If the personality were a hive, then the adapters would be the worker bees.
A personality is somewhat reminiscent of a Rube Goldberg machine.
The initial adapters process the document and take it apart into ever smaller chunks of data, or collate smaller chunks back into larger ones. These chunks of data are called granules.
Some adapters take in larger granules and split them into smaller ones (e.g. they might take in a paragraph granule and split it into individual word granules).
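A splitting adapter of this kind might look like the sketch below. The `Granule` class (a kind tag plus text) and `split_paragraph` are hypothetical, and splitting on whitespace is an assumption; a real adapter would likely need to preserve spacing and punctuation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule: a chunk of data tagged with its kind."""
    kind: str
    text: str


def split_paragraph(granule: Granule) -> List[Granule]:
    """Splitting adapter: one paragraph granule in, one word granule per word out."""
    if granule.kind != "paragraph":
        return [granule]  # pass through anything this adapter does not handle
    return [Granule("word", word) for word in granule.text.split()]
```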
Other adapters collate smaller granules back into larger ones (e.g. they might take a number of word granules and concatenate them back into a paragraph granule).
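The reverse direction, collating, can be sketched the same way. Again `Granule` and `collate_words` are hypothetical, and joining with a single space is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule: a chunk of data tagged with its kind."""
    kind: str
    text: str


def collate_words(granules: List[Granule]) -> Granule:
    """Collating adapter: many word granules in, one paragraph granule out."""
    words = [g.text for g in granules if g.kind == "word"]  # ignore non-word granules
    return Granule("paragraph", " ".join(words))
```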
Some adapters process the granules they receive: they might change them, discard them, count them, reorder them, or create new granules based on earlier ones.
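Four of these operations (change, discard, count, reorder) are sketched below as small functions over lists of granules. The function names and the `Granule` class are hypothetical illustrations, not Crawler adapters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule: a chunk of data tagged with its kind."""
    kind: str
    text: str


def lowercase(granules: List[Granule]) -> List[Granule]:
    """Change: rewrite each granule's text."""
    return [Granule(g.kind, g.text.lower()) for g in granules]


def discard(granules: List[Granule], unwanted: Set[str]) -> List[Granule]:
    """Discard: drop granules whose text is in `unwanted`."""
    return [g for g in granules if g.text not in unwanted]


def count(granules: List[Granule]) -> Counter:
    """Count: tally how often each text occurs."""
    return Counter(g.text for g in granules)


def reorder(granules: List[Granule]) -> List[Granule]:
    """Reorder: sort granules by their text."""
    return sorted(granules, key=lambda g: g.text)
```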
Yet other adapters construct new granules from template snippets. For example, an adapter could take in raw text and combine it with a template snippet to produce an XML-formatted granule.
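A minimal sketch of such a template-driven adapter, using Python's standard `string.Template` as a stand-in for whatever template language Crawler's template files actually use. `PARAGRAPH_SNIPPET`, `template_adapter`, and the `style` field are hypothetical.

```python
from dataclasses import dataclass
from string import Template
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Granule:
    """Hypothetical granule: a chunk of data tagged with its kind."""
    kind: str
    text: str


# Hypothetical snippet; a real personality would load these from its template files.
PARAGRAPH_SNIPPET = Template('<p class="$style">$text</p>')


def template_adapter(raw: Granule, snippet: Template, style: str) -> Granule:
    """Combine a raw-text granule with a template snippet into an XML granule."""
    body = escape(raw.text)                 # escape &, <, > in text content
    attr = escape(style, {'"': "&quot;"})   # also escape quotes inside the attribute
    return Granule("xml", snippet.substitute(text=body, style=attr))
```

Escaping before substitution keeps the resulting granule well-formed XML even when the raw text contains markup characters.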
The general idea is that the input data is broken apart into smaller entities, which are then put back together in a different shape, possibly performing a document conversion in the process.
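As an end-to-end illustration of this break-apart-and-reassemble idea, the sketch below converts plain text into an XHTML fragment: blank-line-separated blocks become paragraph granules, paragraphs become word granules, and the words are reassembled as `<p>` elements. Plain-text input and every function name here are hypothetical simplifications; real personalities work on InDesign documents.

```python
from typing import Iterator, List
from xml.sax.saxutils import escape


def split_paragraphs(document: str) -> Iterator[str]:
    """Break apart: blank-line-separated blocks become paragraph granules."""
    for block in document.split("\n\n"):
        if block.strip():  # skip empty blocks
            yield block


def split_words(paragraph: str) -> List[str]:
    """Break apart further: a paragraph granule becomes word granules."""
    return paragraph.split()


def to_xhtml_paragraph(words: List[str]) -> str:
    """Reassemble in a different shape: word granules become an XHTML <p> element."""
    return "<p>" + escape(" ".join(words)) + "</p>"


def convert(document: str) -> str:
    """Plain text in, XHTML fragment out."""
    return "\n".join(
        to_xhtml_paragraph(split_words(p)) for p in split_paragraphs(document)
    )
```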