Primeur Online Docs
Data Shaper
Data Shaper
  • 🚀GETTING STARTED
    • What is Primeur Data Shaper
      • What is the Data Shaper Designer
      • What is the Data Shaper Server
      • What is the Data Shaper Cluster
    • How does the Data Shaper Designer work
      • Designer Views and Graphs
      • Data Shaper Graphs
      • Designer Projects and Sandboxes
      • Data Shaper Designer Reference
    • How do the Data Shaper Server and Cluster work
      • Data Shaper Server and Cluster
      • Data Shaper Server Reference
    • VFS Graph Components
      • DataOneFileDescriptor (DOFD) metadata
      • Passing parameters from Data One Contract to Data Shaper graph
      • Inheriting Data One context attributes in Data Shaper graph
  • DATA SHAPER DESIGNER
    • Configuration
      • Runtime configuration
        • Logging
        • Master Password
        • User Classpath
      • Data Shaper Server Integration
      • Execution monitoring
      • Java configuration
      • Engine configuration
      • Refresh Operation
    • Designer User Interface
      • Graph Editor with Palette of Components
      • Project Explorer Pane
      • Outline Pane
      • Tabs Pane
      • Execution Tab
      • Keyboard Shortcuts
    • Projects
      • Creating Data Shaper projects
      • Converting Data Shaper projects
      • Structure of Data Shaper projects
      • Versioning of server project content
      • Working with Data Shaper Server Projects
      • Project configuration
    • Graphs
      • Creating an empty graph
      • Creating a simple graph
        • Placing Components
        • Placing Components from Palette
        • Connecting Components with Edges
    • Execution
      • Successful Graph Execution
      • Run configuration
      • Connecting to a running job
      • Graph states
    • Common dialogs
      • URL file dialog
      • Edit Value dialog
      • Open Type dialog
    • Import
      • Import Data Shaper projects
      • Import from Data Shaper server sandbox
      • Import graphs
      • Import metadata
    • Export
      • Export graphs to HTML
      • Export to Data Shaper Server sandbox
      • Export image
      • Export Project as Library
    • Graph tracking
      • Changing record count font size
    • Search functionality
    • Working with Data Shaper server
      • Data Shaper server project basic principles
      • Connecting via HTTP
      • Connecting via HTTPS
      • Connecting via Proxy Server
    • Graph components
      • Adding components
      • Finding components
      • Edit component dialog box
      • Enable/disable component
      • Passing data through disabled component
      • Common properties of components
      • Specific attribute types
      • Metadata templates
    • Edges
      • Connecting Components with Edges
      • Types of Edges
      • Assigning Metadata to Edges
      • Colors of Edges
      • Debugging Edges
      • Edge Memory Allocation
    • Metadata
      • Records and Fields
        • Record Types
        • Data Types in Metadata
        • Data Formats
        • Locale and Locale Sensitivity
        • Time Zone
        • Autofilling Functions
      • Metadata Types
        • Internal Metadata
        • External (Shared) Metadata
        • SQL Query Metadata
        • Reading Metadata from Special Sources
      • Auto-propagated Metadata
        • Sources of Auto-Propagated Metadata
        • Explicitly Propagated Metadata
        • Priorities of Metadata
        • Propagation of SQL Query Metadata
      • Creating Metadata
        • Extracting Metadata from a Flat File
        • Extracting Metadata from an XLS(X) File
        • Extracting Metadata from a Database
        • Extracting Metadata from a DBase File
        • Extracting Metadata from Salesforce
        • SQL Query Metadata
        • User Defined Metadata
      • Merging Existing Metadata
      • Creating Database Table from Metadata and Database Connection
      • Metadata Editor
        • Opening Metadata Editor
        • Basics of Metadata Editor
        • Record Pane
        • Field Name vs. Label vs. Description
        • Details Pane
      • Changing and Defining Delimiters
      • Editing Metadata in the Source Code
      • Multi-value Fields
        • Lists and Maps Support in Components
        • Joining on multivalue fields (Comparison Rules)
    • Connections
      • Database Connections
        • Internal Database Connections
        • External (Shared) Database Connections
        • Database Connections Properties
        • Encryption of Access Password
        • Browsing Database and Extracting Metadata from Database Tables
        • Windows Authentication on Microsoft SQL Server
        • Snowflake Connection
        • Hive Connection
        • Troubleshooting
      • JMS Connections
      • QuickBase Connections
      • Hadoop Connections
      • Kafka Connections
      • OAuth2 Connections
      • MongoDB Connections
      • Salesforce Connections
    • Lookup Tables
      • Lookup Tables in Cluster Environment
      • Internal Lookup Tables
      • External (Shared) Lookup Tables
      • Types of Lookup Tables
    • Sequences
      • Persistent Sequences
      • Non Persistent Sequences
      • Internal Sequences
      • External (Shared) Sequences
      • Editing a Sequence
      • Sequences in Cluster Environment
    • Parameters
      • Internal Parameters
      • External (Shared) Parameters
      • Secure Graph Parameters
      • Graph Parameter Editor
      • Secure Graph Parameters
      • Parameters with CTL2 Expressions (Dynamic Parameters)
      • Environment Variables
      • Canonicalizing File Paths
      • Using Parameters
    • Internal/External Graph Elements
    • Dictionary
      • Creating a Dictionary
      • Using a Dictionary in Graphs
    • Execution Properties
    • Notes in Graphs
      • Placing Notes into Graph
      • Resizing Notes
      • Editing Notes
      • Formatted Text
      • Links from Notes
      • Folding Notes
      • Notes Properties
    • Transformations
      • Defining Transformations
      • Transform Editor
      • Common Java Interfaces
    • Data Partitioning (Parallel Running)
    • Data Partitioning in Cluster
      • High Availability
      • Scalability
      • Graph Allocation Examples
      • Example of Distributed Execution
      • Remote Edges
    • Readers
      • Common Properties of Readers
      • ComplexDataReader
      • DatabaseReader
      • DataGenerator
      • DataOneVFSReader
      • EDIFACTReader
      • FlatFileReader
      • JSONExtract
      • JSONReader
      • LDAPReader
      • MultiLevelReader
      • SpreadsheetDataReader
      • UniversalDataReader
      • X12Reader
      • XMLExtract
      • XMLReader
      • XMLXPathReader
    • Writers
      • Common Properties of Writers
      • DatabaseWriter
      • DataOneVFSWriter
      • EDIFACTWriter
      • FlatFileWriter
      • JSONWriter
      • LDAPWriter
      • SpreadsheetDataWriter
      • HIDDEN StructuredDataWriter
      • HIDDEN TableauWriter
      • Trash
      • UniversalDataWriter
      • X12Writer
      • XMLWriter
    • Transformers
      • Common Properties of Transformers
      • Aggregate
      • Concatenate
      • DataIntersection
      • DataSampler
      • Dedup
      • Denormalizer
      • ExtSort
      • FastSort
      • Filter
      • Map
      • Merge
      • MetaPivot
      • Normalizer
      • Partition
      • Pivot
      • Rollup
      • SimpleCopy
      • SimpleGather
      • SortWithinGroups
      • XSLTransformer
    • Joiners
      • Common Properties of Joiners
      • Combine
      • CrossJoin
      • DBJoin
      • ExtHashJoin
      • ExtMergeJoin
      • LookupJoin
      • RelationalJoin
    • Others
      • Common Properties of Others
      • CheckForeignKey
      • DBExecute
      • HTTPConnector
      • LookupTableReaderWriter
      • WebServiceClient
    • CTL2 - Data Shaper Transformation Language
    • Language Reference
      • Program Structure
      • Comments
      • Import
      • Data Types in CTL2
      • Literals
      • Variables
      • Dictionary in CTL2
      • Operators
      • Simple Statement and Block of Statements
      • Control Statements
      • Error Handling
      • Functions
      • Conditional Fail Expression
      • Accessing Data Records and Fields
      • Mapping
      • Parameters
      • Regular Expressions
    • CTL Debugging
      • Debug Perspective
      • Importing and Exporting Breakpoints
      • Inspecting Variables and Expressions
      • Examples
    • Functions Reference
      • Conversion Functions
      • Date Functions
      • Mathematical Functions
      • String Functions
      • Mapping Functions
      • Container Functions
      • Record Functions (Dynamic Field Access)
      • Miscellaneous Functions
      • Lookup Table Functions
      • Sequence Functions
      • Data Service HTTP Library Functions
      • Custom CTL Functions
      • CTL2 Appendix - List of National-specific Characters
      • HIDDEN Subgraph Functions
    • Tutorial
      • Creating a Transformation Graph
      • Filtering the records
      • Sorting the Records
      • Processing Speed-up with Parallelization
      • Debugging the Java Transformation
  • DATA SHAPER SERVER
    • Introduction
    • Administration
      • Monitoring
    • Using Graphs
      • Job Queue
      • Execution History
      • Job Inspector
    • Cluster
      • Sandboxes in Cluster
      • Troubleshooting
  • Install Data Shaper
    • Install Data Shaper
      • Introduction to Data Shaper installation process
      • Planning Data Shaper installation
      • Data Shaper System Requirements
      • Data Shaper Domain Master Configuration reference
      • Performing Data Shaper initial installation and master configuration
        • Creating database objects for PostgreSQL
        • Creating database objects for Oracle
        • Executing Data Shaper installer
        • Configuring additional firewall rules for Data Shaper
Powered by GitBook
On this page
  • Short Description
  • Ports
  • Metadata
  • Denormalizer Attributes
  • Details
  • Key
  • Sort order
  • CTL Interface
  • CTL Templates
  • Access to input and output fields
  • Java Interface
  • Examples
  • Best Practices
  • See also
  1. DATA SHAPER DESIGNER
  2. Transformers

Denormalizer

Short Description

Denormalizer creates a single output record from one or more input records. Input records should be sorted.

COMPONENT
SAME INPUT METADATA
SORTED INPUTS
INPUTS
OUTPUTS
JAVA
CTL
AUTO-PROPAGATED METADATA

Denormalizer

-

x

1

0-1

✓

✓

x

Ports

PORT TYPE
NUMBER
REQUIRED
DESCRIPTION
METADATA

Input

0

✓

For input data records

Any

Output

0

✓

For denormalized data records

Any

Metadata

Denormalizer does not propagate metadata. Denormalizer does not have metadata templates. Denormalizer does not require any specific metadata fields.

Denormalizer Attributes

ATTRIBUTE

REQ

DESCRIPTION

POSSIBLE VALUES

BASIC

Key

[1]

A key that creates groups of input data records according to its value. Adjacent input records with the same value of Key are considered to be members of one group. One output record is composed from members of such group. For more information, see Key below.

Group size

[1]

A group may be defined by exact number of its members. E.g. each five records form a single group. The input record count must be a multiple of group size (see Allow incomplete last group). This is mutually exclusive with the key attribute.

a number

Denormalize

[2]

Definition of how to denormalize records, written in the graph in CTL or Java.

Denormalize URL

[2]

The name of an external file, including the path, containing the definition of how to denormalize records, written in CTL or Java.

Denormalize class

[2]

The name of an external class defining how records should be normalized.

Equal NULL

By default, records with null values of key fields are considered to be equal. If false, they are considered to be different.

true (default) | false

Denormalize source charset

Encoding of the external file defining the transformation. The default encoding depends on DEFAULT_SOURCE_CODE_CHARSET in defaultProperties.

E.g. UTF-8

ADVANCED

Allow incomplete last group

In case input records grouping is specified by the Group size attribute, the number of input records must be a multiple of group size. Using this attribute, this condition can be suppressed. The last group does not need to be complete.

true | false (default)

DEPRECATED

Sort order

Order in which groups of input records are expected to be sorted. See Sort order below.

Auto (default) | Ascending | Descending | Ignore

Error actions

The definition of an action that should be performed when the specified transformation returns some Error code. See Return Values of Transformations.

Error log

A URL of the file to which error messages for specified Error actions should be written. If not set, they are written to Console.

[1] group size has higher priority than key. If neither of these attributes is specified, all records will form a single group. [2] One of them must specified.

Details

Denormalizer receives sorted data through a single input port, checks Key values and creates one output record from one or more adjacent input records with the same Key value.

Denormalizer requires transformation. The transformation can be defined in CTL (see CTL Interface below) or in Java (see Java Interface below) or using existing .class file (Denormalize class attribute).

To define transformation, use one of the three transformation attributes: Denormalize, Denormalize URL or Denormalize class.

Diagram below describes flow of function calls in Denormalizer.

The function append() is called once for each input record. The function transform() is called once for each group of input records.

If you do not define any of the optional functions init(), preExecute(), clean() or postExecute(), the execution flow continues with the next function according to the diagram.

If you do not specify the appendOnError() or transformOnError() functions and an error occurs, the execution of graph fails.

The transformation uses a CTL template for Denormalizer, implements a RecordDenormalize interface or inherits from a DataRecordDenormalize superclass. The interface methods are listed in CTL Interface and Java Interface below.

Key

Key is expressed as a sequence of field names separated from each other by a semicolon, colon, or pipe.

Example 40. Key for Denormalizer

first_name;last_name

In this Key, first_name and last_name are fields of metadata on input port.

Sort order

If the records are denormalized by the Key, i.e. not by the Group size, the input records must be grouped according to the Key field value. Then, depending on the sorting order of the groups, select the proper Sort order:

  • Auto - the sorting order of the groups of input records is guessed from the first two records with different value in the key field, i.e. from the first records of the first two groups.

  • Ascending - if the groups of input records with the same key field value(s) are sorted in ascending order.

  • Descending - if the groups of input records with the same key field value(s) are sorted in descending order.

  • Ignore - if the groups of input records with the same key field value(s) are not sorted.

CTL Interface

The transformation written in CTL uses a CTL template for Denormalizer. Only the functions append() and transform() are mandatory. Once you have written your transformation, you can also convert it to Java language code by clicking the corresponding button at the upper right corner of the tab. You can open the transformation definition as another tab of the graph (in addition to the Graph and Source tabs of Graph Editor) by clicking the corresponding button at the upper right corner of the tab.

CTL Templates

CTL TEMPLATE FUNCTIONS

boolean init()

Required

No

Description

Initializes the component, sets up the environment and global variables

Invocation

Called before processing the first record

Returns

true | false if false, graph fails)

CTL TEMPLATE FUNCTIONS

integer append()

Required

Yes

Input Parameters

None

Returns

Integer numbers. Negative value lower than -1 aborts processing. Any non-negative value means a successful pass.

Invocation

Called repeatedly, once for each input record

Description

For the group of adjacent input records with the same Key values, it appends the information from which the resulting output record is composed. If append() fails and the user has not defined any appendOnError(), the whole graph will fail. If any of the input records causes fail of the append() function, and if the user has defined appendOnError() function, processing continues in this appendOnError() at the place where append() failed. The append() passes to the appendOnError() error message and stack trace as arguments.

Example

See here below

function integer append() {
    CustomersInGroup++;
    myLength = length(errorCustomers);
    if(!isInteger($in.0.OneCustomer)) {
        errorCustomers = errorCustomers + iif(myLength > 0 ," - ","") + $in.0.OneCustomer;
    }
    customers = customers + iif(length(customers) > 0 ," - ","") + $in.0.OneCustomer;
    groupNo = $in.0.GroupNo;
    return OK;
}

CTL TEMPLATE FUNCTIONS

integer transform()

Required

Yes

Input Parameters

None

Returns

Integer numbers. For detailed information, see Return Values of Transformations.

Invocation

Called repeatedly, once for each output record.

Description

It creates output records. If transform() fails and the user has not defined any transformOnError(), the whole graph will fail. If any part of the transform() function for some output record causes fail of the transform() function, and if the user has defined the transformOnError() function, processing continues in the transformOnError() at the place where transform() failed. The transformOnError() function gets the information gathered by transform() that was get from previously successfully processed code. Also error message and stack trace are passed to transformOnError().

Example

See here below

function integer transform() {
    $out.0.CustomersInGroup = CustomersInGroup;
    $out.0.CustomersOnError = errorCustomers;
    $out.0.Customers = customers;
    $out.0.GroupNo = groupNo;
    return OK;
}

CTL TEMPLATE FUNCTIONS

void clean()

Required

No

Input Parameters

None

Returns

Void

Invocation

Called repeatedly, once for each output record. The clean() function is called after the transform() function.

Description

Returns the component to the initial settings.

Example

See here below

function void clean(){
    customers = "";
    errorCustomers = "";
    groupNo = 0;
    CustomersInGroup = 0;
}

CTL TEMPLATE FUNCTIONS

integer appendOnError(string errorMessage, string stackTrace)

Required

No

Input Parameters

string errorMessage string stackTrace

Returns

Integer numbers. Positive integer numbers are ignored, meaning of 0 and negative values is described in Return Values of Transformations.

Invocation

Called if append() throws an exception.

Description

The function handles errors which occurred in the append() function. If any of the input records causes fail of the append() function, and if the user has defined the appendOnError() function, processing continues in this appendOnError() at the place where append() failed. The appendOnError() function gets the information gathered by append() that was get from previously successfully processed input records. The error message and stack trace are passed to appendOnError(), as well.

Example

See here below

function integer appendOnError(
                  string errorMessage,
                  string stackTrace) {
    printErr(errorMessage);
    return CustomersInGroup;
}

CTL TEMPLATE FUNCTIONS

integer transformOnError(Exception exception, stackTrace)

Required

No

Input Parameters

string errorMessage string stackTrace

Returns

Invocation

Called if transform() throws an exception.

Description

The function handles errors which occurred in transform() function. If any part of the transform() function fails, and if the user has defined the transformOnError() function, processing continues in the transformOnError() at the place where transform() failed. The transformOnError() function gets the information gathered by transform() that was get from previously successfully processed code. The error message and stack trace are passed to transformOnError(), as well. The function transformOnError() creates output records.

Example

See here below

function integer transformOnError(
                  string errorMessage,
                  string stackTrace) {
    $out.0.CustomersInGroup = CustomersInGroup;
    $out.0.ErrorFieldForTransform = errorCustomers;
    $out.0.CustomersOnError = errorCustomers;
    $out.0.Customers = customers;
    $out.0.GroupNo = groupNo;
    return OK;
}

CTL TEMPLATE FUNCTIONS

string getMessage()

Required

No

Description

Prints the error message specified and invoked by the user.

Invocation

Called in any time specified by the user (called only when either append(), transform(), appendOnError() or transformOnError() returns value less than or equal to -2).

Returns

string

CTL TEMPLATE FUNCTIONS

void preExecute()

Required

No

Input Parameters

None

Returns

void

Description

May be used to allocate and initialize resources required by the transform. All resources allocated within this function should be released by the postExecute() function.

Invocation

Called during each graph run before the transform is executed.

CTL TEMPLATE FUNCTIONS

void postExecute()

Required

No

Input Parameters

None

Returns

void

Description

Should be used to free any resources allocated within the preExecute() function.

Invocation

Called during each graph run after the entire transform was executed.

Access to input and output fields

Input records or fields Input records or fields are accessible within the append() and appendOnError() functions only.

Output records or fields Output records or fields are accessible within the transform() and transformOnError() functions only.

Warning: All of the other CTL template functions allow to access neither inputs nor outputs. Remember that if you do not hold these rules, NPE will be thrown.

Java Interface

  • boolean init(Properties parameters, DataRecordMetadata sourceMetadata, DataRecordMetadata targetMetadata) Initializes denormalize class/function. This method is called only once at the beginning of denormalization process. Any object allocation/initialization should happen here.

  • int append(DataRecord inRecord) Passes one input record to the composing class.

  • int appendOnError(Exception exception, DataRecord inRecord) Passes one input record to the composing class. Called only if append(DataRecord) throws an exception.

  • int transform(DataRecord outRecord) Retrieves composed output record. For detailed information about return values and their meaning, see Return Values of Transformations. In Denormalizer, only ALL, 0, SKIP, and Error codes have some meaning.

  • int transformOnError(Exception exception, DataRecord outRecord) Retrieves composed output record. Called only if transform(DataRecord) throws an exception.

  • void clean() Finalizes current round/clean after current round. Called after the transform method was called for the input record.

Examples

Converting multiple having same key records to one Input records acquired from relational database contain fields companyName and product.

Denormalizer Limited |chocolate
Denormalizer Limited |coffee
Denormalizer Limited |pizza
ZXCV International   |coffee

Convert the records to following form: **companyName **is followed by list of products separated by commas.

Solution Use the Key and Normalize attributes.

ATTRIBUTE
VALUE

Key

companyName

Normalize

See the code below

//#CTL2

string[] products;
string companyName;

function integer append() {
    append(products, $in.0.product);
    companyName = $in.0.companyName;
    return OK;
}

function integer transform() {
    $out.0.companyName = companyName;
    $out.0.products = join(",", products);
    return OK;
}

function void clean() {
    clear(products);
}

Denormalizer returns following records:

Denormalizer Limited |chocolate,coffee,pizza
ZXCV International   |coffee

Note: Records with the same Key have to be available in input data all at once. Otherwise you will get a new output record for each several subsequent records having the same key. The best solution is to have input records sorted by Key.

Converting fixed number of records to one records Given a list of students.

Charlie
Daniel
Agatha
Henry
Oscar
Kate
Romeo
Jane

Convert the list to groups of 3. Each group (one output record) has a number and names of its members. The names are separated by comma.

Each output record contains groupNumber and members.

Solution Use the Group size and Normalize attributes. To be able to process the number of record not being divisible by 3, you need the Allow incomplete last group attribute.

ATTRIBUTE
VALUE

Group size

3

Normalize

See the code below

Allow incomplete last group

true

//#CTL2

integer groupNumber;
string[] names;

function integer append() {
    append(names, $in.0.name);
    return OK;
}

function integer transform() {
    $out.0.groupNo = groupNumber;
    $out.0.members = join(",", names);
    groupNumber++;

    return OK;
}

function boolean init() {
    groupNumber = 1;
    return true;
}

function void clean() {
    clear(names);
}

Denormalizer returns following records:

1|Charlie,Daniel,Agatha
2|Henry,Oscar,Kate
3|Romeo,Jane

Best Practices

If the transformation is specified in an external file (with Denormalize URL), we recommend users to explicitly specify Denormalize source charset.

See also

PreviousDedupNextExtSort

Last updated 1 month ago

Integer numbers. For detailed information, see .

The transformation implements methods of the RecordDenormalize interface and inherits other common methods from the Transform interface. See . Following are the methods of the RecordDenormalize interface:

Common Java Interfaces
Normalizer
Rollup
Common Properties of Components
Specific attribute types
Common Properties of Transformers
Return Values of Transformations