Partition
Short Description
Partition distributes individual input data records among different output ports.
COMPONENT | SAME INPUT METADATA | SORTED INPUTS | INPUTS | OUTPUTS | JAVA | CTL | AUTO-PROPAGATED METADATA |
---|---|---|---|---|---|---|---|
Partition | - | x | 1 | 1-n | [1] | [1] | ✓ |
[1] Partition can use either a transformation or two other attributes (Ranges and/or Partition key).
Ports
PORT TYPE | NUMBER | REQUIRED | DESCRIPTION | METADATA |
---|---|---|---|---|
Input | 0 | ✓ | For input data records | Any |
Output | 0 | ✓ | For output data records | Input 0 |
Output | 1-N | x | For output data records | Input 0 |
Metadata
Partition propagates metadata in both directions. Partition does not change priority of propagated metadata.
Partition has no metadata template.
Input and output fields can have any data types.
Metadata on input and output ports cannot differ. (Input and output records can have different names but the metadata fields of both records must be identical.)
Partition Attributes
ATTRIBUTE | REQ. | DESCRIPTION | POSSIBLE VALUES |
---|---|---|---|
BASIC | |||
Partition | [1] | Definition of how records are distributed among output ports, written in the graph in CTL or Java. | |
Partition URL | [1] | The name of an external file, including the path, that defines in CTL or Java how records are distributed among output ports. | |
Partition class | [1] | The name of an external class that defines how records are distributed among output ports. | |
Ranges | [1] [2] | Ranges expressed as a sequence of individual ranges separated by semicolons. Each individual range is a sequence of intervals, one for each field of a set of fields, written next to each other without any delimiter. Each margin of an interval is marked as included or excluded: a bracket (as in <1) means the margin is included, a parenthesis (as in 9)) means it is excluded. Example of Ranges: <1,9)(,31.12.2008);<1,9)<31.12.2008,);<9,)(,31.12.2008);<9,)<31.12.2008,) | |
Partition key | [1] [2] | Key according to which input records are distributed among different output ports. Expressed as a sequence of individual input field names separated by semicolons. Example of Partition key: first_name;last_name. | |
ADVANCED | |||
Partition source charset | | Encoding of the external file defining the transformation. The default encoding depends on DEFAULT_SOURCE_CODE_CHARSET in defaultProperties. | UTF-8 \| other encoding |
DEPRECATED | |||
Locale | | Locale to be used when internationalization is set to true. By default, the system value is used unless the value of Locale specified in the defaultProperties file is uncommented and set to the desired Locale. For more information on how Locale may be changed in defaultProperties, see Engine configuration. | system value or specified default value (default) \| other locale |
Use internationalization | | By default, no internationalization is used. If set to true, sorting according to national properties is performed. | false (default) \| true |
[1] If one of these transformation attributes is specified, both Ranges and Partition key are ignored because they have lower priority.
[2] If no transformation attribute is defined, Ranges and Partition key are used in one of the three ways described in Details.
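The Ranges notation can be easier to follow with a small worked example. The following Java sketch is illustrative only and is not the CloverDX implementation: the class name RangesSketch, the Interval type and the portFor helper are hypothetical, the intervals are built by hand rather than parsed from the attribute string, an omitted margin is treated as unbounded, dates are compared as epoch days, and returning -1 for a record that matches no range is an assumption.

```java
import java.time.LocalDate;

// Illustrative sketch only; not the CloverDX implementation.
final class RangesSketch {

    /** Interval over one field; a null bound means "unbounded on that side" (assumption). */
    static final class Interval {
        final Long min;
        final boolean minInclusive;
        final Long max;
        final boolean maxInclusive;

        Interval(Long min, boolean minInclusive, Long max, boolean maxInclusive) {
            this.min = min;
            this.minInclusive = minInclusive;
            this.max = max;
            this.maxInclusive = maxInclusive;
        }

        boolean contains(long value) {
            if (min != null && (value < min || (value == min && !minInclusive))) {
                return false;
            }
            if (max != null && (value > max || (value == max && !maxInclusive))) {
                return false;
            }
            return true;
        }
    }

    /**
     * Returns the index of the first range whose intervals all contain the
     * corresponding field value, or -1 when no range matches (assumption).
     */
    static int portFor(Interval[][] ranges, long... fieldValues) {
        for (int port = 0; port < ranges.length; port++) {
            boolean allMatch = true;
            for (int f = 0; f < fieldValues.length; f++) {
                if (!ranges[port][f].contains(fieldValues[f])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return port;
            }
        }
        return -1;
    }

    /** Dates are compared as epoch days so both fields share one interval type. */
    static long day(int year, int month, int dayOfMonth) {
        return LocalDate.of(year, month, dayOfMonth).toEpochDay();
    }

    /** Four ranges over (number, date): each row is one range, each cell one interval. */
    static Interval[][] exampleRanges() {
        long d = day(2008, 12, 31);
        Interval low = new Interval(1L, true, 9L, false);      // <1,9)
        Interval high = new Interval(9L, true, null, false);   // <9,)
        Interval before = new Interval(null, false, d, false); // (,31.12.2008)
        Interval from = new Interval(d, true, null, false);    // <31.12.2008,)
        return new Interval[][] {
            {low, before}, {low, from}, {high, before}, {high, from}
        };
    }

    public static void main(String[] args) {
        Interval[][] ranges = exampleRanges();
        System.out.println(portFor(ranges, 5, day(2008, 6, 1)));
        System.out.println(portFor(ranges, 9, day(2009, 1, 1)));
    }
}
```

In this sketch, the position of a range in the list is the output port number, so a record with the values 5 and 1.6.2008 falls into the first range and a record with the values 9 and 1.1.2009 into the fourth.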
Details
To distribute data records, a user-defined transformation, ranges of the Partition key, or the RoundRobin algorithm may be used. No mapping can be defined in this component because it does not change input data records; it only distributes them unchanged among output ports.
The transformation either uses the CTL template for Partition or implements the PartitionFunction interface. Its methods are listed below.
If no transformation attribute is defined, Ranges and Partition key are used in one of the following ways:
- Both Ranges and Partition key are set.
  Records whose field values fall within the margins of a specified range are sent to the same output port. The number of the output port corresponds to the order of the range within all values of the fields.
- Ranges are not defined. Only Partition key is set.
  Records are distributed among output ports in such a way that all records with the same values of the Partition key fields are sent to the same port.
  The output port number is determined as the hash value computed from the key fields modulo the number of output ports.
- Neither Ranges nor Partition key are defined.
  The RoundRobin algorithm is used to distribute records among output ports.
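The last two cases can be sketched in a few lines of Java. This is a minimal illustration, not the component's actual code: the class KeyAndRoundRobinSketch and its members are hypothetical, Arrays.hashCode stands in for whatever hash function the component really uses, and starting the round-robin sequence at port 0 is an assumption.

```java
import java.util.Arrays;

// Illustrative sketch only; not the CloverDX implementation.
final class KeyAndRoundRobinSketch {

    /**
     * Key-based distribution: hash of the key field values modulo the number
     * of output ports. Arrays.hashCode is a stand-in hash (assumption);
     * floorMod keeps the result non-negative even for negative hash values.
     */
    static int portByKey(int portCount, Object... keyValues) {
        return Math.floorMod(Arrays.hashCode(keyValues), portCount);
    }

    /** Round-robin distribution: each call returns the next port, wrapping around. */
    static final class RoundRobin {
        private final int portCount;
        private int next = 0; // starting at port 0 is an assumption

        RoundRobin(int portCount) {
            this.portCount = portCount;
        }

        int nextPort() {
            int port = next;
            next = (next + 1) % portCount;
            return port;
        }
    }

    public static void main(String[] args) {
        // Records with the same first_name;last_name values get the same port.
        System.out.println(portByKey(3, "John", "Smith"));
        System.out.println(portByKey(3, "John", "Smith"));

        // With no key, records simply cycle through the ports.
        RoundRobin roundRobin = new RoundRobin(3);
        for (int i = 0; i < 4; i++) {
            System.out.println(roundRobin.nextPort());
        }
    }
}
```

The key-based variant keeps all records with equal key values together but gives no control over which port they reach. The round-robin variant balances record counts across ports but ignores record content.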
Hint!
Note that you can use the Partition component as a filter, similarly to Filter. With the Partition component, you can define much more sophisticated filter expressions and distribute input data records among more than 2 outputs. Neither Partition nor Filter allows you to modify records.
Partition is a high-performance component, so you cannot modify input and output records; attempting to do so results in an error. If you need to modify records, consider using Map instead.
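The filter-like use of Partition amounts to a function that looks at a record and returns an output port number. The Java sketch below only illustrates that idea outside of CloverDX: the class MultiWayFilterSketch, the route function, its thresholds and the use of a plain amount value as the "record" are all hypothetical, and the real component would express the same decision in its CTL or Java transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Illustrative sketch only; not the CloverDX implementation.
final class MultiWayFilterSketch {

    /** Hypothetical routing rule with three outcomes instead of a filter's two. */
    static int route(double amount) {
        if (amount < 0) {
            return 0; // negative amounts
        }
        if (amount < 1000) {
            return 1; // small amounts
        }
        return 2;     // large amounts
    }

    /** Sends each record, unchanged, to the port chosen by the router. */
    static <T> List<List<T>> distribute(List<T> records, int portCount, ToIntFunction<T> router) {
        List<List<T>> ports = new ArrayList<>();
        for (int i = 0; i < portCount; i++) {
            ports.add(new ArrayList<>());
        }
        for (T record : records) {
            ports.get(router.applyAsInt(record)).add(record);
        }
        return ports;
    }

    public static void main(String[] args) {
        List<Double> amounts = Arrays.asList(-5.0, 10.0, 5000.0, 999.99);
        System.out.println(distribute(amounts, 3, MultiWayFilterSketch::route));
    }
}
```

As in the component itself, the records are only routed, never modified.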