Granule

From DocDataFlow
Revision as of 00:58, 30 December 2013 by Kris

Overview

Granules are the 'chunks of data' that flow through the network of adapters defined by the personality.

A granule can represent any quantity of data: anything from a single bit to a complete database with all its content, or something larger still.

When granules flow through a network of adapters they are often split into smaller granules by disassemblers.

Smaller granules are often collated into larger granules by assemblers.
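The splitting and collating described above can be sketched roughly as follows. This is only an illustration: the `Granule` class, its `kind` and `content` fields, and the `disassemble` and `assemble` functions are hypothetical names invented for this example and are not taken from the Crawler code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Granule:
    """Hypothetical stand-in for a granule: a kind label plus its payload."""
    kind: str
    content: str


def disassemble(granule: Granule) -> List[Granule]:
    """Split one text granule into one smaller granule per non-empty line."""
    return [
        Granule(kind="paragraph", content=line)
        for line in granule.content.split("\n")
        if line.strip()
    ]


def assemble(granules: List[Granule]) -> Granule:
    """Collate smaller granules back into a single larger granule."""
    return Granule(kind="story", content="\n".join(g.content for g in granules))


# One larger granule is split into smaller ones, then collated again.
story = Granule(kind="story", content="First paragraph.\nSecond paragraph.")
paragraphs = disassemble(story)
rebuilt = assemble(paragraphs)
```

In a real personality, each of these steps would presumably be carried out by a separate adapter, with granules passed from one to the next.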

Predefined Base Granule Types

It is impossible to predefine every granule type that Crawler might need to handle.

As new document formats are added to the system, new granule types will need to be introduced to correctly capture the data inside those as-yet-unsupported document types.

This is accepted and expected in the Crawler system: document-type-specific disassemblers are allowed to add new granule types to the system.

When adding new granule types, care must be taken to relate them back to one of the predefined base granule types whenever possible.

For example, if a document type XYZ has a concept of a 'paragraph', the support for that document type might introduce a new granule type 'XYZ_ParagraphGranule'. This XYZ_ParagraphGranule should then be a more specialized version of the predefined ParagraphGranule. In other words, XYZ_ParagraphGranule should have all the features of ParagraphGranule, plus some XYZ-specific features.
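One way to picture this relationship is as class inheritance. The sketch below is an assumption made for illustration, not the actual Crawler implementation: the fields `text`, `style_name` and `xyz_attributes`, and the `describe` function, are invented here, and placing ParagraphGranule under TextGranule follows the listing of specialized base granule types on this page.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TextGranule:
    """Hypothetical base type for granules that carry text."""
    text: str


@dataclass
class ParagraphGranule(TextGranule):
    """Stand-in for the predefined base granule; 'style_name' is an invented field."""
    style_name: str = ""


@dataclass
class XYZ_ParagraphGranule(ParagraphGranule):
    """Paragraph granule for the XYZ format: base features plus XYZ extras."""
    xyz_attributes: Dict[str, str] = field(default_factory=dict)


def describe(granule: ParagraphGranule) -> str:
    """Generic code written against the base type only."""
    return f"{granule.style_name}: {granule.text}"


# The XYZ-specific granule is accepted wherever a ParagraphGranule is expected.
g = XYZ_ParagraphGranule(text="Hello", style_name="Body",
                         xyz_attributes={"revision": "3"})
summary = describe(g)
```

Because the new type only adds to the base type, code that knows nothing about XYZ can still process XYZ paragraphs as ordinary paragraphs.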

Some of the base granule types below will not make sense for some, or even most, document types; in that case, they should simply be ignored.

AppGranule

ColorGranule

DocumentGranule

FontGranule

FrameGranule

PageGranule

SpreadGranule

StyleGranule

Specialized Base Granule Types

FrameGranule

    GraphicFrameGranule

    TextFrameGranule

StyleGranule

    CharacterStyleGranule

    ParagraphStyleGranule

TextGranule

    ParagraphGranule

    StoryGranule

    TextRunGranule

    WordGranule