Defining Transformations
For basic information about transformations, see Transformations.
For a brief overview of transformations, see Transformations Overview.
In this section, we explain how to create transformations that change the data flowing through components. In particular:
- Which components use transformations (see Components used in Transformation).
- Which language can be used to write transformations (see Java or CTL).
- Whether the definition can be internal or external (see Internal or External Definition).
- What the return values of transformations are (see Return Values of Transformations).
- What the Transform editor is and how to work with it (see Transform Editor).
- Which interfaces are common to most transformation-allowing components (see Common Java Interfaces).
Components used in Transformation
Transformations can be defined in the following components:
- DataGenerator, Map, and Rollup
  These components require a transformation.
  You can define the transformation in Java or in the Data Shaper transformation language.
  In these components, different data records can be sent out through different output ports using the return values of the transformation.
  To send different records to different output ports, you must both create a mapping of the record to the corresponding output port and return the corresponding integer value.
- Partition (and ClusterPartition)
  You can define the transformation in Java or in the Data Shaper transformation language.
  To send different records to different output ports or Cluster nodes, you must return the corresponding integer value, but no mapping needs to be written in this component, since all records are sent out automatically.
- DataIntersection, Denormalizer, Normalizer, ExtHashJoin, ExtMergeJoin, LookupJoin, DBJoin, and RelationalJoin
  These components require a transformation.
  You can define the transformation in Java or in the Data Shaper transformation language.
- CustomJavaReader
  This component requires a transformation.
  You can only write it in Java.
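To illustrate the rule for DataGenerator, Map, and Rollup (a mapping for the target port plus a matching return value), here is a minimal Java sketch. It does not use the Data Shaper API: records are modeled as plain Map<String, Object> objects, and the class name RoutingTransform, the transform() signature, and the "amount" routing rule are all hypothetical.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a Map-style transformation (not the Data Shaper API).
 * A record is modeled as Map<String, Object>; outputs.get(n) is the record for port n.
 */
class RoutingTransform {

    /**
     * Copies the input into the output record of the chosen port (the mapping)
     * and returns that port's number (the return value).
     */
    static int transform(Map<String, Object> input, List<Map<String, Object>> outputs) {
        // Hypothetical routing rule: negative "amount" values go to port 1, everything else to port 0.
        Object amount = input.get("amount");
        int port = (amount instanceof Number && ((Number) amount).doubleValue() < 0) ? 1 : 0;

        // Mapping: in this sketch, skipping this step would leave the selected output record empty.
        outputs.get(port).putAll(input);

        // Return value: the record is sent out through this port.
        return port;
    }
}
```

For example, an input record with amount = -5 is copied into outputs.get(1) and the method returns 1.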
Java or CTL
Transformations can be written in Java or in the Data Shaper transformation language (CTL):
- Java can be used in all transforming components.
  Transformations written in Java run faster than those written in CTL.
- CTL is a very simple scripting language that can be used in most transforming components. It can be used even without any prior knowledge of Java.
Internal or External Definition
Each transformation can be defined as internal or external:
- Internal transformation:
  An attribute such as Transform or Denormalize must be defined.
  In this case, the code is written directly in the graph and can be seen in it.
- External transformation:
  One of the following two kinds of attributes may be defined:
  - Transform URL, Denormalize URL, etc., for both Java and CTL.
    The code is written in an external file. The charset of the external file can also be specified (Transform source charset, Denormalize source charset, etc.).
    For transformations written in Java, the folder containing the transformation source code must be specified as a source for the Java compiler so that the transformation can be executed successfully.
  - Transform class, Denormalize class, etc.
    This is a compiled Java class.
    The class must be on the classpath so that the transformation can be executed successfully.
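The requirement that a Transform class be on the classpath can be illustrated with plain Java reflection. This is a hedged sketch, not a description of how Data Shaper actually resolves the attribute: SimpleTransform, NonEmptyFilter, and TransformLoader are hypothetical names, and Class.forName() is used only to show that a class missing from the classpath cannot be instantiated.

```java
/** Hypothetical transformation contract used only in this sketch. */
interface SimpleTransform {
    int transform(String input);
}

/** A compiled transformation class; it must be on the classpath to be loaded by name. */
class NonEmptyFilter implements SimpleTransform {
    public NonEmptyFilter() {
    }

    @Override
    public int transform(String input) {
        return input.isEmpty() ? -1 : 0; // -1 (SKIP) for empty input, 0 (OK) otherwise
    }
}

final class TransformLoader {
    private TransformLoader() {
    }

    /** Instantiates a transformation from its fully qualified class name. */
    static SimpleTransform load(String className) {
        try {
            // Throws ClassNotFoundException if the class is not on the classpath.
            Class<?> cls = Class.forName(className);
            return (SimpleTransform) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load transformation class " + className, e);
        }
    }
}
```

The no-argument constructor matters here: reflection-based loading of this kind needs a way to create the object without extra parameters.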
This is a brief overview:
- Transform, Denormalize, etc.
  To define a transformation in the graph itself, use the Transform editor. The transformation is then stored in the graph and visible in it. It can be written in Java or CTL, as mentioned above.
  For more detailed information about the editor or the dialog, see Transform Editor or Edit value dialog.
- Transform URL, Denormalize URL, etc.
  You can also use a transformation defined in a source file outside the graph. To locate the transformation source file, use the URL file dialog. Each of the mentioned components can use this transformation definition. The file must contain the definition of the transformation written in either Java or CTL.
  For more detailed information, see URL file dialog.
- Transform class, Denormalize class, etc.
  In all transforming components, you can use a compiled transformation class. To do that, use the Open Type wizard. In this case, the transformation is located outside the graph.
  For more detailed information, see Open type dialog.
More details about defining transformations can be found in the sections on the corresponding components, which describe both the required and optional transformation functions of the CTL templates and the Java interfaces.
The following table gives an overview of transformation-allowing components:
COMPONENT | TRANSFORMATION REQUIRED | JAVA | CTL | EACH TO ALL OUTPUTS [1] | DIFFERENT TO DIFFERENT OUTPUTS [2] | CTL TEMPLATE | JAVA INTERFACE |
---|---|---|---|---|---|---|---|
Readers | |||||||
DataGenerator | ✓ | ✓ | ✓ | x | ✓ | CTL Templates for DataGenerator | Java Interface |
MultiLevelReader | ✓ | ✓ | x | x | ✓ | - | Java Interfaces for MultiLevelReader |
Writers | |||||||
Transformers | |||||||
DataIntersection | ✓ | ✓ | ✓ | - | - | CTL Templates for DataIntersection | Java Interfaces for DataIntersection |
Map | ✓ | ✓ | ✓ | x | ✓ | CTL Templates for Map | Java Interfaces for Map |
Denormalizer | ✓ | ✓ | ✓ | - | - | CTL Templates | Java Interface |
Normalizer | ✓ | ✓ | ✓ | - | - | CTL Templates for Normalizer | Java Interface |
Rollup | ✓ | ✓ | ✓ | x | ✓ | CTL Templates for Rollup | Java Interface |
DataSampler | ✓ | x | x | - | - | - | - |
Joiners | |||||||
ExtHashJoin | ✓ | ✓ | ✓ | - | - | CTL Templates for Joiners | Java Interfaces for Joiners |
ExtMergeJoin | ✓ | ✓ | ✓ | - | - | CTL Templates for Joiners | Java Interfaces for Joiners |
LookupJoin | ✓ | ✓ | ✓ | - | - | CTL Templates for Joiners | Java Interfaces for Joiners |
DBJoin | ✓ | ✓ | ✓ | - | - | CTL Templates for Joiners | Java Interfaces for Joiners |
RelationalJoin | ✓ | ✓ | ✓ | - | - | CTL Templates for Joiners | Java Interfaces for Joiners |
[1] If marked ✓, each data record is always sent out through all connected output ports.
[2] If marked ✓, each data record can be sent out through the connected output port whose number is returned by the transformation. For more information, see Return Values of Transformations.
Return Values of Transformations
In components where transformations are defined, return values may also be defined. These are integers greater than, equal to, or less than 0.
Note that DBExecute can also return integer values less than 0 in the form of SQLExceptions.
- Positive or zero return values
  - ALL = Integer.MAX_VALUE
    The record is sent out through all output ports. This constant does not need to be declared before it is used. In CTL, ALL equals 2147483647, in other words, Integer.MAX_VALUE. Both ALL and 2147483647 can be used.
  - OK = 0
    The record is sent out through the single output port, or through output port 0 if the component has multiple output ports (e.g. Map, Rollup). This constant does not need to be declared before it is used.
  - Any other integer greater than or equal to 0
    The record is sent out through the output port whose number equals the return value. These values can be called Mapping codes.
- Negative return values
  - SKIP = -1
    An error has occurred, but the erroneous record is skipped and processing continues. This constant does not need to be declared before it is used. Both SKIP and -1 can be used.
  - STOP = -2
    An error has occurred and processing must stop. This constant does not need to be declared before it is used. Both STOP and -2 can be used.
  - Any other integer less than or equal to -1
    These values must be defined by the user as described below. They signal a fatal error. These values can be called Error codes.
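The following Java sketch models how a component might interpret these return values. The constants mirror the values listed above; the ReturnCodes class and its targetPorts() method are hypothetical and are not part of the Data Shaper API.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical model of how a component could interpret a transformation's return value. */
final class ReturnCodes {
    static final int ALL = Integer.MAX_VALUE; // 2147483647: send to all output ports
    static final int OK = 0;                  // send to the single output port / port 0
    static final int SKIP = -1;               // error: skip this record and continue
    static final int STOP = -2;               // error: stop processing

    private ReturnCodes() {
    }

    /**
     * Returns the output ports that receive the record (empty if it is skipped).
     * STOP and other values below -1 are treated as fatal errors.
     */
    static List<Integer> targetPorts(int returnValue, int portCount) {
        if (returnValue == ALL) {
            // Checked before the generic ">= 0" case, because ALL is also non-negative.
            return IntStream.range(0, portCount).boxed().collect(Collectors.toList());
        }
        if (returnValue >= 0) {
            return List.of(returnValue); // Mapping code: port number == return value
        }
        if (returnValue == SKIP) {
            return List.of();
        }
        // STOP or a user-defined Error code.
        throw new IllegalStateException("Fatal error code: " + returnValue);
    }
}
```

Checking ALL first is the important design choice: since Integer.MAX_VALUE is itself a valid non-negative integer, a naive ">= 0" branch would treat it as a port number.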
- Values greater than or equal to 0
  Return values greater than or equal to 0 allow the same data record to be sent to the specified output ports only in DataGenerator, Partition, Map, and Rollup. Do not forget to define the mapping for each connected output port in DataGenerator, Map, and Rollup. In Partition (and ClusterPartition), mapping is performed automatically. In the other components, these values have no meaning: such components have either a single output port or output ports that are strictly defined for explicit outputs.
- Values less than -1
  Returning these values does not call the corresponding optional OnError() function of the CTL template. To call an optional OnError() function, you can, for example, use the raiseError(string Arg) function. It throws an exception that calls the corresponding OnError() function, e.g. transformOnError(). Any other exception thrown by a transformation function also calls its corresponding OnError() function, if one is defined.
- Values less than or equal to -2
  If any function that returns an integer returns a value less than or equal to -2 (including STOP), the getMessage() function is called (if defined).
  Therefore, to allow getMessage() to be called, add one or more return statements with values less than or equal to -2 to the functions that return integers. For example, if a function such as transform(), append(), or count() returns -2, getMessage() is called and the message is written to the Console.
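The rules above (exceptions trigger the OnError() function, plain negative return values do not, and values less than or equal to -2 trigger getMessage()) can be modeled in a small Java sketch. ErrorAwareTransform and ErrorAwareRunner are hypothetical names. Two behaviors are assumptions of this sketch rather than statements from this page: the value returned by transformOnError() is used in place of the failed call, and a missing OnError() function makes the run fail.

```java
/** Hypothetical callbacks mirroring transform(), transformOnError() and getMessage(). */
interface ErrorAwareTransform {
    int transform(String record) throws Exception;

    /** Called only when transform() throws. Default (assumption): no OnError() defined, so the run fails. */
    default int transformOnError(Exception cause, String record) {
        throw new IllegalStateException("Transformation failed", cause);
    }

    /** Called when a function returns a value <= -2; null means "not defined". */
    default String getMessage() {
        return null;
    }
}

final class ErrorAwareRunner {
    private ErrorAwareRunner() {
    }

    /** Processes one record and appends any getMessage() text to the console buffer. */
    static int run(ErrorAwareTransform t, String record, StringBuilder console) {
        int code;
        try {
            code = t.transform(record);
        } catch (Exception e) {
            // An exception (such as one raised by raiseError() in CTL) triggers the OnError function.
            code = t.transformOnError(e, record);
        }
        // A negative value that is simply returned never reaches transformOnError().
        if (code <= -2) {
            String message = t.getMessage();
            if (message != null) {
                console.append(message).append('\n');
            }
        }
        return code;
    }
}
```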
Warning! If the graph fails with an exception or by returning any negative value less than -1, no records are written to the output file.
If you want previously processed records to be written to the output, return SKIP (-1) instead. The erroneous records are then skipped, the graph does not fail, and at least some records are written to the output.
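A toy Java model of this warning, assuming (as the warning states) that output is written only if the whole run completes: SKIP drops a single record, while any value below -1 discards everything. BatchRun and process() are hypothetical names, and the sketch ignores output ports entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Hypothetical model: records reach the output only if the whole run succeeds. */
final class BatchRun {
    private BatchRun() {
    }

    static List<String> process(List<String> records, ToIntFunction<String> transform) {
        List<String> written = new ArrayList<>();
        for (String record : records) {
            int code = transform.applyAsInt(record);
            if (code == -1) {
                continue; // SKIP: drop this record, keep processing
            }
            if (code < -1) {
                return List.of(); // the run fails: nothing is written to the output
            }
            written.add(record); // OK (or a port number): the record is kept
        }
        return written;
    }
}
```

With the input "a", "bad", "c", returning -1 for "bad" keeps "a" and "c", whereas returning -2 for "bad" leaves the output empty.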