Difference between revisions of "Assembler"

 
Latest revision as of 19:07, 29 December 2013

An assembler is an atomic adapter.

Assemblers accept granules via their input connection.

They then use some of these input granules to construct collated granules.

Typically, assemblers rely on the presence of certain 'trigger granules' in the input stream to decide when they have all the data needed to finish a constructed granule.

When the constructed granule is ready, it is released via the assembler's output connection.

Assemblers will often drop the smaller granules they used from the data flow, and only emit the newly constructed granules.

For example, an assembler could collect 'word granules' and string them together into a new 'word group' granule.

As granules arrive, the assembler needs to know when the 'word group' under construction is complete. The presence in the input stream of some other type of granule (e.g. a 'text frame' granule or a 'paragraph' granule) will typically be the trigger to release the newly constructed 'word group' granule and get ready to construct the next one.
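The word-group behaviour described above can be sketched in a few lines of Python. This is only an illustration, not DocDataFlow's actual API: the Granule type, the kind names ("Word", "Para", "TextFrame", "WordGroup") and the function assemble_word_groups are all hypothetical, and the sketch assumes that trigger granules are passed through unchanged.

```python
from typing import Iterable, Iterator, NamedTuple


class Granule(NamedTuple):
    """Hypothetical granule: a type tag plus a payload (not the real API)."""
    kind: str
    value: object


# Assumption: these granule kinds act as triggers for this assembler.
TRIGGER_KINDS = {"Para", "TextFrame"}


def assemble_word_groups(stream: Iterable[Granule]) -> Iterator[Granule]:
    """Collect Word granules and release a WordGroup when a trigger arrives."""
    words = []
    for granule in stream:
        if granule.kind == "Word":
            # Word granules are consumed: they are not passed downstream.
            words.append(granule.value)
        elif granule.kind in TRIGGER_KINDS:
            if words:
                # Release the group under construction, then start a new one.
                yield Granule("WordGroup", tuple(words))
                words = []
            # The trigger itself stays in the data flow.
            yield granule
        else:
            # Any other granule passes through untouched.
            yield granule
```

In this sketch a trigger that arrives while no words have been collected does not release an empty group; whether a real assembler should do so is a design choice the sketch leaves open.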

In a typical Crawler workflow, a disassembler normally only adds to the data flow; it does not take granules away. In other words, the larger granules that disassemblers break apart are not stripped away and remain part of the data flow.

Assemblers, on the other hand, do take granules away.

For example, when a disassembler breaks apart a 'paragraph' granule into a series of 'word' granules, the output of the disassembler will typically consist of a stream of word granules, followed by the original paragraph granule from which the word granules were extracted.

An assembler further down the data flow will ignore the content of such paragraph granules. Instead it will collect the word granules, and wait for the paragraph granule as a terminating trigger to signify the series of word granules is complete.

An example: consider the following data flow emitted by a disassembler further up the data flow:

Word: this
Word: is
Word: a
Word: paragraph
Para: this is a paragraph
Word: this
Word: is
Word: another
Word: paragraph
Para: this is another paragraph
TextFrame: pos (10, 20), width 20, height 80
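A disassembler that produces a stream of this shape could look roughly like the following Python sketch. The Granule type, the kind names and the function disassemble_paragraphs are hypothetical and chosen for illustration only; splitting on whitespace is an assumption about how words are found.

```python
from typing import Iterable, Iterator, NamedTuple


class Granule(NamedTuple):
    """Hypothetical granule: a type tag plus a payload (not the real API)."""
    kind: str
    value: object


def disassemble_paragraphs(stream: Iterable[Granule]) -> Iterator[Granule]:
    """Emit one Word granule per word, followed by the original Para granule."""
    for granule in stream:
        if granule.kind == "Para":
            # Assumption: words are separated by whitespace.
            for word in str(granule.value).split():
                yield Granule("Word", word)
        # The original granule is never dropped: a disassembler only adds.
        yield granule
```

Because the paragraph granule is emitted after its words, it can later serve as the terminating trigger for an assembler downstream.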

An assembler might be set up to count the word granules, and emit a word count granule each time it is triggered by a paragraph granule.

This example assembler might convert the data flow into the following:

WordCount: 4
Para: this is a paragraph
WordCount: 4
Para: this is another paragraph
TextFrame: pos (10, 20), width 20, height 80

That is, it drops the word granules and emits a new 'word count' granule each time it 'sees' a paragraph granule pass by.
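A minimal Python sketch of such a word-counting assembler is shown below. It is not the Crawler implementation: the Granule type, the kind names and the function count_words are hypothetical, and the sketch assumes that only paragraph granules act as triggers while all other non-word granules pass through unchanged.

```python
from typing import Iterable, Iterator, NamedTuple


class Granule(NamedTuple):
    """Hypothetical granule: a type tag plus a payload (not the real API)."""
    kind: str
    value: object


def count_words(stream: Iterable[Granule]) -> Iterator[Granule]:
    """Drop Word granules and emit a WordCount ahead of each Para trigger."""
    count = 0
    for granule in stream:
        if granule.kind == "Word":
            # Consume the word: it is counted but not passed downstream.
            count += 1
        elif granule.kind == "Para":
            # The paragraph is the trigger: release the count, then reset it.
            yield Granule("WordCount", count)
            count = 0
            yield granule
        else:
            # Other granules (e.g. TextFrame) pass through untouched.
            yield granule
```

The counter is reset after each trigger, so each count covers only the word granules seen since the previous paragraph granule.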