Context

From DocDataFlow

Latest revision as of 01:37, 28 December 2013

Context contents

A context is a collection of data that is relevant to a particular granule.

For example, when a text frame in some document is represented by a granule for processing inside a Crawler personality, it is accompanied by its context.

That context will include information like:

  • which page is the text frame on?
  • where is the text frame positioned on that page?
  • which document does that page belong to?
  • ...

The data relating to the text frame is split into two parts:

  • the granule itself, with its own raw data ‘inside’
  • the context, which stores any additional information about the granule and its surroundings.

The context contains all the other data that is not part of the granule, but is relevant to it.

Once created, granules remain fixed and the data in them does not change. They often directly reflect properties and information extracted from the source document, and these remain constant.

Contexts, on the other hand, are not fixed: as granules flow through various adapters, their context can accumulate additional data. It's normal for a granule to start out with an almost empty context. As it progresses through the various adapters, the context will collect more and more data, until the granule is either output or absorbed into a larger granule.

In a Crawler workflow, adapters and granules are fixed, constant entities: once created they don't change. Any changes that accumulate during the process are tracked in a context.
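The split between fixed granules and accumulating contexts can be illustrated with a minimal Python sketch. This is not Crawler's actual API: the `Granule` and `Context` classes, the two adapter functions, and the `CHAR_COUNT` key are all hypothetical names chosen for illustration (only `XPOS` appears elsewhere on this page).

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class Granule:
    """Hypothetical stand-in for a granule: fixed once created."""
    kind: str
    raw_data: str


@dataclass
class Context:
    """Hypothetical stand-in for a context: grows as adapters run."""
    values: dict = field(default_factory=dict)


def measure_adapter(granule: Granule, context: Context) -> None:
    # The adapter reads the granule but records its result in the context.
    context.values["CHAR_COUNT"] = len(granule.raw_data)


def position_adapter(granule: Granule, context: Context) -> None:
    # Illustrative value only; a real adapter would derive it from the document.
    context.values["XPOS"] = 72


granule = Granule(kind="text frame", raw_data="Hello, world")
context = Context()  # starts out empty

for adapter in (measure_adapter, position_adapter):
    adapter(granule, context)

# Attempting to change the granule fails because the dataclass is frozen.
try:
    granule.raw_data = "changed"
    granule_was_modified = True
except FrozenInstanceError:
    granule_was_modified = False
```

After both adapters have run, the granule is unchanged while the context holds two entries it did not have at the start.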

Context hierarchies

Contexts are arranged into a hierarchy.

[Figure: Context.png]

Example: when we look at a 'text frame' granule, it will probably be a sub-granule of a larger ‘page’ granule.

The 'page' granule itself is a sub-granule of a larger ‘document’ granule.

Each of those granules will have its own context. There will be a context for the document granule, and another context for the page granule.

The page context will be a subcontext of the document context: that is, the page context will include all information from the document context, plus its own specific data.

The text frame context will be a subcontext of the page context: that is, the text frame context will include all information from the page context, plus its own specific data.
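The document, page, and text frame nesting described above could be modelled as follows. This is a sketch under assumptions, not Crawler's implementation: the `Context` class, its `own` and `parent` attributes, the `all_values` method, and the `DOC_NAME` and `PAGE_NUMBER` keys are hypothetical.

```python
class Context:
    """Hypothetical context that knows its parent context."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own = {}  # data specific to this context

    def all_values(self):
        """Return inherited data with this context's own data layered on top."""
        inherited = self.parent.all_values() if self.parent else {}
        return {**inherited, **self.own}


document = Context("document")
document.own["DOC_NAME"] = "brochure"

page = Context("page", parent=document)
page.own["PAGE_NUMBER"] = 3

text_frame = Context("text frame", parent=page)
text_frame.own["XPOS"] = 72
```

In this model the text frame context sees the document name and page number as well as its own position, while the page context does not see anything stored on the text frame.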

The various adapters in a workflow will often pass information to one another by means of the context.

During the Crawler process, we'll often refer to certain information by name. For example, when a template snippet is processed, the template text contains placeholders such as $$XPOS$$.

Such placeholders are interpreted within the relevant context. A single snippet will normally be used to process many individual granules; each of the granules will come with its own context, and placeholders like $$XPOS$$ will be replaced by different values every time, depending on what the context dictates for the value of XPOS.

If a certain placeholder is not defined within a particular context, Crawler will check the parent context, then the parent's parent, and so on.
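Placeholder substitution with fallback to parent contexts might look like the sketch below. The `Context` class, `lookup`, `expand`, and the `PAGE` key are hypothetical, and leaving an undefined placeholder untouched is an assumption made here; this page does not say what Crawler does in that case.

```python
import re

# Matches placeholders of the form $$NAME$$.
PLACEHOLDER = re.compile(r"\$\$(\w+)\$\$")


class Context:
    """Hypothetical context with an optional parent."""

    def __init__(self, parent=None, **values):
        self.parent = parent
        self.values = dict(values)

    def lookup(self, name):
        # Walk up the chain: this context, its parent, the parent's parent...
        context = self
        while context is not None:
            if name in context.values:
                return context.values[name]
            context = context.parent
        raise KeyError(name)


def expand(snippet, context):
    def replace(match):
        try:
            return str(context.lookup(match.group(1)))
        except KeyError:
            # Assumption: unknown placeholders are left as they are.
            return match.group(0)

    return PLACEHOLDER.sub(replace, snippet)


page = Context(PAGE=3)
frame_a = Context(parent=page, XPOS=72)
frame_b = Context(parent=page, XPOS=300)

snippet = "frame at x=$$XPOS$$ on page $$PAGE$$"
result_a = expand(snippet, frame_a)
result_b = expand(snippet, frame_b)
```

The same snippet yields a different XPOS for each text frame, while PAGE is resolved from the shared parent context in both cases.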

There is a top-level context, the app context. This is a 'root context' which serves as the ultimate parent to all contexts that exist during the process. This app context stores system-wide information that is to be shared by all contexts.
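One way to picture a shared root context is with Python's standard `collections.ChainMap`, which searches a list of mappings in order. This is only an analogy for the app context, not how Crawler stores it; the `APP_VERSION`, `LOCALE`, `DOC_NAME`, and `PAGE` keys are hypothetical.

```python
from collections import ChainMap

# Hypothetical system-wide data held by the root (app) context.
app_context = {"APP_VERSION": "1.0"}

# Two document contexts that share the same root.
doc_a = ChainMap({"DOC_NAME": "a"}, app_context)
doc_b = ChainMap({"DOC_NAME": "b"}, app_context)

# A page context nested under document a.
page_a1 = doc_a.new_child({"PAGE": 1})

# A value added to the root becomes visible from every context below it.
app_context["LOCALE"] = "en"
```

Because every chain ends in the same `app_context` dictionary, a value stored there is visible from both documents and from the nested page, while document-specific values stay separate.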