Primeur Online Docs
Data Shaper

Creating a Transformation Graph

This chapter explains the basics of Data Shaper projects and shows you how to create a simple graph that reads records from a CSV file and writes them to an .xlsx file.

Terminology

Before creating a transformation graph, we explain some terms used in this tutorial.

  • A workspace is a directory on your computer where you save your projects. It also contains per-workspace configuration. You chose it when you started Designer.

  • A project is a directory in the workspace. It is the location where you place data transformations and data.

  • A graph, or a transformation graph, is a recipe for a data transformation. The graph consists of components which are connected by edges.

Creating a Project

We assume that you have downloaded and installed Data Shaper Designer.

Now create a new project.

  1. Select File > New > Data Shaper Project from the main menu.

  2. Type the name of the project, e.g. Project_01.

Creating a New Data File

Now you need a data file. You probably have some. If not, you can create an example file as shown below.

The best practice is to place your input data into data-in.

Right-click the data-in item in the Project Explorer pane and select New > File from the context menu. Type a file name, e.g. input.dat. The file will be created and stored in the highlighted data-in subfolder.

[Image: https://files.readme.io/362a204-DS-CreateNewFile.png]

The file will be created and opened. Enter some data records in this file; for example, copy and paste the lines below (make sure there is an empty line at the end):

John;Smith;25000
Peter;Brown;30000
George;Hardy;20000
Richard;Gordon;22000
Mark;Taylor;40000
Michael;Lester;18000
George;Smith;30000
Albert;Brown;30000

[Image: https://files.readme.io/4873af9-DS-Inputdat.png]
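If you prefer to generate the example file outside Designer, the following is a minimal Python sketch, not part of Data Shaper. It writes the sample records into a data-in folder under a project directory that you pass in. The function name write_sample_file and the project path are hypothetical; adjust them to your own workspace.

```python
from pathlib import Path

# The sample records from this tutorial, one record per line.
SAMPLE_RECORDS = [
    "John;Smith;25000",
    "Peter;Brown;30000",
    "George;Hardy;20000",
    "Richard;Gordon;22000",
    "Mark;Taylor;40000",
    "Michael;Lester;18000",
    "George;Smith;30000",
    "Albert;Brown;30000",
]


def write_sample_file(project_dir: str, name: str = "input.dat") -> Path:
    """Write the sample records to <project_dir>/data-in/<name> (hypothetical helper)."""
    data_in = Path(project_dir) / "data-in"
    data_in.mkdir(parents=True, exist_ok=True)
    target = data_in / name
    # The trailing "\n" terminates the last record, which leaves
    # an empty line at the end of the file when it is opened in an editor.
    target.write_text("\n".join(SAMPLE_RECORDS) + "\n", encoding="utf-8")
    return target
```

For example, write_sample_file("/path/to/workspace/Project_01") would create data-in/input.dat in that project. If the file is created while Designer is open, you may need to refresh the Project Explorer pane before it appears.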

Now you will create your graph.

Creating a Graph

After creating a new project, create a new graph: select File > New > Graph from the main menu. The graph is the recipe for your data transformation. Give the graph a name and choose a directory for it. In this tutorial, we use graph as the graph name. Data Shaper Designer adds the .grf extension automatically.

Data Shaper Designer offers the graph subfolder. It is the recommended place for graphs.

Placing Components in the Graph Editor Pane

To create a graph, select the components from the Palette of Components and place them in the Graph Editor pane. The Palette of Components is located on the right side of the Graph Editor pane.

If the Palette is not displayed, click the arrow at the top right of the Graph Editor pane. The Palette will then remain open until you fold it.

In the Palette, find the FlatFileReader label among Readers. Drag FlatFileReader from the Palette into the Graph Editor pane.

Do the same with the SpreadsheetDataWriter component from Writers. Put these components in the Graph Editor from left to right.

If you know the component name, you can add the component using the Add Component dialog window. Press Shift+Space within the Graph Editor and start typing the name.

Connecting Components by an Edge

Click the first output port of FlatFileReader.

An edge appears connected to the output port of the component. Now click inside the SpreadsheetDataWriter component near its input port.

The edge is still red and dashed since no metadata is assigned to it.

If you miss the port, a dialog for adding a new component appears.

In the next step you will assign metadata to the edge.

Extracting Metadata from the Input File

Metadata is data describing the data structure.

You can extract metadata from your flat data file or create it on your own. To extract it from the input file, right-click the edge and select New metadata > Extract from flat file.

A wizard for metadata extraction opens. Use the Browse button to open the dialog to specify a file. Select the input.dat file in the data-in directory and click the OK button. The Metadata Editor fills up:

Click Next to specify metadata fields.

As you can see, the wizard guessed that the records consist of three fields, and it also recognized that the values of the third field are integers.

You can replace the three default field names (Field1, Field2 and Field3) with more descriptive ones: FirstName, LastName and Salary.

To do that, click the Field1 item and enter the new field name.

Do the same with the other two field names. The result will look like this:

Now click Finish. You have created the metadata, and it has been assigned to the edge.

You can extract metadata on edges and on input components.
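To make the wizard's guess concrete (three fields, the third one an integer), here is a minimal Python sketch of that kind of inference. It is an illustration only and is not the algorithm Data Shaper uses; the function names and the two type labels "string" and "integer" are assumptions. The default names Field1, Field2 and Field3 follow the naming shown by the wizard.

```python
from typing import List, Tuple


def _is_integer(value: str) -> bool:
    """Return True if the text parses as a whole number."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def guess_metadata(lines: List[str], delimiter: str = ";") -> List[Tuple[str, str]]:
    """Guess (field name, field type) pairs from delimited sample lines."""
    # Ignore blank lines, such as the empty line at the end of the file.
    rows = [line.split(delimiter) for line in lines if line.strip()]
    if not rows:
        return []
    field_count = len(rows[0])
    if any(len(row) != field_count for row in rows):
        raise ValueError("records have differing numbers of fields")
    fields = []
    for index in range(field_count):
        values = [row[index].strip() for row in rows]
        # A field is an integer only if every sampled value parses as one.
        field_type = "integer" if all(_is_integer(v) for v in values) else "string"
        fields.append((f"Field{index + 1}", field_type))
    return fields


sample = ["John;Smith;25000", "Peter;Brown;30000", "George;Hardy;20000", ""]
print(guess_metadata(sample))
# [('Field1', 'string'), ('Field2', 'string'), ('Field3', 'integer')]
```

Renaming the fields in the wizard corresponds to replacing the generated names with FirstName, LastName and Salary while keeping the guessed types.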

Assigning Metadata to the Edges

If you assigned metadata to the edge in the previous step, you do not have to assign it again.

If an edge has no metadata yet and you would like to assign some, right-click the edge and select the Select Metadata item from the context menu.

Select the desired metadata by clicking its item. The edge with assigned metadata becomes solid.

Setting Up Readers (FlatFileReader)

To set up the FlatFileReader, double-click this component in the Graph Editor pane. The component editor opens. Click the File URL attribute row in the component editor. A small red button appears in the row. Click it to open the File URL dialog and select the input file in the data-in directory.

Setting Up Writers (SpreadsheetDataWriter)

When you set up writers, the most important thing is to specify the output files to which data should be written.

Double-click the SpreadsheetDataWriter component. Click the File URL attribute row in the component editor. A button appears in this row; click it.

In the File URL dialog, select the output directory and enter the file name, for example output.xlsx in the data-out directory.

Click OK to use the new component configuration.

You have created a (transformation) graph. Press Ctrl+S to save it. The graph is ready to be run.

Running the Graph

To run the graph, right-click anywhere inside the Graph Editor pane and select Run Data Shaper Graph from the context menu. The graph will run.

In the Console tab below the Graph Editor pane, you can see the graph run report. If everything is OK, the graph execution will be successful.

You should see the following window with the numbers of parsed records shown below the edges:

If you would like to see more detailed information about the graph run, double-click the Console tab. The tab will cover the whole window. You can restore the original size of the tab by double-clicking it again.
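Conceptually, running this graph passes each record from the reader over the edge to the writer, and the number displayed at the edge is the count of records that crossed it. The Python sketch below mimics that flow under simplifying assumptions; it is not Data Shaper's engine. The Edge class and the read_flat_file and write_rows functions are hypothetical stand-ins, and the writer only collects rows in memory instead of producing an .xlsx file.

```python
import csv
import io
from typing import Iterable, Iterator, List

SAMPLE = (
    "John;Smith;25000\nPeter;Brown;30000\nGeorge;Hardy;20000\n"
    "Richard;Gordon;22000\nMark;Taylor;40000\nMichael;Lester;18000\n"
    "George;Smith;30000\nAlbert;Brown;30000\n\n"
)


class Edge:
    """Stand-in for a graph edge: passes records through and counts them."""

    def __init__(self) -> None:
        self.record_count = 0

    def carry(self, records: Iterable[List[str]]) -> Iterator[List[str]]:
        for record in records:
            self.record_count += 1
            yield record


def read_flat_file(text: str, delimiter: str = ";") -> Iterator[List[str]]:
    """Stand-in for the reader: yields one list of fields per non-empty line."""
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if row:  # csv.reader yields [] for empty lines; skip them
            yield row


def write_rows(records: Iterable[List[str]]) -> List[List[str]]:
    """Stand-in for the writer: collects the rows it receives."""
    return list(records)


edge = Edge()
written = write_rows(edge.carry(read_flat_file(SAMPLE)))
print(edge.record_count)  # 8
```

With the eight sample records from this tutorial, the counter on the edge ends at 8, which is the kind of number you would expect to see next to the edge after a successful run with that input.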

Opening the Output File

After running a graph, the file structure of the Project Explorer pane refreshes automatically. Expand the data-out item to see the output.xlsx file.

Double-click the file to open it with an appropriate spreadsheet editor.

Summary

We have learned to:

  • create a transformation graph

  • place components in a graph

  • assign metadata to an edge

  • run a graph

  • read data from a CSV file

  • write data to an Excel spreadsheet

What to do next

You can also try the built-in examples: Help > Data Shaper Examples.

You can continue with Filtering the records or Sorting the Records.