DATA SHAPER SERVER

Introduction


What is Data Shaper Server?

The Data Shaper Server is an enterprise runtime, monitoring and automation platform. It is a Java application built to J2EE standards, with HTTP and SOAP Web Services APIs that provide additional automation control for integration into existing application portfolios and processes.
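As a sketch of what this automation control can look like, the hypothetical Java client below calls an HTTP endpoint of the Server to start a graph. The host, path and parameter names are illustrative placeholders, not the documented Data Shaper Server API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RunGraphExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and parameter names - consult the Server API
        // reference for the real paths and authentication requirements.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://server-host:8080/dataShaper/api/graph_run?graphID=my.grf"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```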

The Data Shaper Server provides the necessary tools to deploy, monitor, schedule, integrate and automate data integration processes in large-scale and complex projects. It supports a wide range of application servers: Apache Tomcat, VMware tc Server and Red Hat JBoss Web Server.

The Data Shaper Server simplifies the process of:

  • Operation - The Data Shaper Server allows you to monitor the status of the Server and jobs.

To learn more about the architecture of the Data Shaper Server, see the Data Shaper Server Architecture section below.

Data Shaper Server Architecture

The Data Shaper Server is a Java application distributed as a web application archive (.war) for easy deployment on various application servers. It is compatible with Windows and Unix-like operating systems.

The Data Shaper Server requires the Java Development Kit (JDK) to run. We do not recommend running on the Java Runtime Environment (JRE) alone, since compiling some transformations requires the JDK to function properly.
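A quick, generic way to check whether a Java installation includes the compiler is to ask for the system compiler, which is present in a JDK but null on a plain JRE. This is an illustrative check, not a Data Shaper tool:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class JdkCheck {
    public static void main(String[] args) {
        // ToolProvider.getSystemJavaCompiler() returns null when no compiler
        // is available, i.e. on a JRE-only installation.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        System.out.println(compiler != null
                ? "JDK detected - transformation compilation will work"
                : "No compiler found - this looks like a JRE-only installation");
    }
}
```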

The Server requires space on the file system to store persistent data (transformation graphs) and temporary data (temporary files, debugging data, etc.). It also requires an external relational database to store run records, permissions, user data, etc.

The Data Shaper Server architecture consists of the Core and the Worker.

Data Shaper Core

The Data Shaper Core is the central point of the Data Shaper Server: it manages and monitors the Workers that run the jobs, checks permissions and serves the UI. For more information, see the Data Shaper Core section below.

Data Shaper Worker

The Worker is a separate process that executes graphs. The purpose of the Worker is to provide a sandboxed execution environment. For more information, see the Data Shaper Worker section below.

Dependencies on External Services

The Server requires a database to store its configuration, user accounts, execution history, etc. It comes bundled with an Apache Derby database to ease evaluation. To use the Data Shaper Server in a production environment, an external relational database is needed.

The Server needs a connection to an SMTP server to be able to send you notification emails.

User and group data can be stored in the database or read from an LDAP server.

Server Core - Worker Communication

The Server Core receives the Worker’s stdout and stderr. The processes communicate via TCP connections.
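The following minimal Java sketch illustrates the general pattern described above - a parent process capturing a child's stdout and stderr - using a stand-in child process; it is not the actual Server Core implementation:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class WorkerOutputCapture {
    public static void main(String[] args) throws Exception {
        // Stand-in child process; the real Server Core launches the Worker JVM.
        ProcessBuilder builder = new ProcessBuilder("java", "-version");
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process child = builder.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("[worker] " + line); // forward captured output
            }
        }
        System.out.println("exit code: " + child.waitFor());
    }
}
```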

Data Shaper Core

The Data Shaper Core is the visible part of the Server with a web-based user interface.

The Data Shaper Core connects to the system database, in which it stores its configuration and service records; the system database is required. If configured, the Core also connects to an SMTP server to send notification emails and to an LDAP server to authenticate users against an existing LDAP database.

Data Shaper Worker

The Worker is a standalone JVM that runs separately from the Server Core. This isolates the Server Core from executed jobs (e.g. graphs); an issue caused by a job in the Worker will therefore not affect the Server Core.

The Worker does not require any additional installation - it is started and managed by the Server. The Worker runs on the same host as the Server Core, i.e. it is not used for parallel or distributed processing. In a Cluster, each node has its own Worker.

The Worker is a relatively lightweight and simple executor of jobs. It handles job execution requests from the Server Core but does not perform any high-level job management. For more complex activities, e.g. requesting the execution of other jobs or checking file permissions, it communicates with the Server Core via an API.

Configuration

General Configuration

The Worker is started by the Server Core as a standalone JVM process. The following default settings of the Worker can be changed in the Setup:

  • Heap memory limits

  • Port ranges

  • Additional command line arguments

The settings are stored in the usual Server configuration file; the Worker is configured via dedicated configuration properties. The full command line of the Worker is available in the Monitoring section.
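As an illustration, such settings could look like the following entries in the Server configuration file. The property names below are placeholders chosen for readability, not the documented keys; use the Setup to change the actual values:

```properties
# Illustrative placeholder keys - the actual property names may differ.
# Heap memory limit for the Worker JVM:
worker.maxHeapSize=4096m
# Range of TCP ports used for Server Core - Worker communication:
worker.portRange=10500-10600
# Additional command line arguments passed to the Worker JVM:
worker.jvmOptions=-Duser.timezone=UTC
```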

Cluster-specific configuration

All nodes in the Cluster should use a single portRange, i.e. identical portRange values. This is the preferred configuration, although different ranges for individual nodes are possible.

Management

The Server manages the runtime of the Worker, i.e. it is able to start, stop or restart the Worker. Users do not need to install and start the Worker manually. The status of the Worker and its actions are available in the Monitoring section.

Job Execution

By default, all jobs are executed in the Worker, yet the Server Core still keeps the capability to execute jobs. Specific jobs or whole sandboxes can be set to run in the Server Core via the worker_execution property on the job or sandbox. It is also possible to disable the Worker completely, in which case all jobs are executed in the Server Core.

Executing jobs in the Server Core should be an exception. To see where a job was executed, open the run details in Execution History and check the Executor field. Jobs started in the Worker also log a message in their log, e.g. Job is executed on Worker:[worker0@node01:10500].

Job Configuration

The following areas of the Worker configuration affect job executions:

  • JNDI - Graphs running in the Worker cannot use JNDI as defined in the application container of the Server Core, because the Worker is a separate JVM process. The Worker provides its own JNDI configuration (see the sketch after this list).

  • Classpath - The classpath is not shared between the Server Core and the Worker.
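Since the Worker is a separate JVM, a JNDI lookup performed by a graph running there is resolved against the Worker's own JNDI tree, never against the Server Core's application container. A minimal Java sketch of such a lookup follows; the resource name jdbc/myDataSource is illustrative, not a name defined by Data Shaper:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class WorkerJndiLookup {
    public static void main(String[] args) {
        try {
            // The lookup is resolved by the JNDI provider of the JVM it runs in.
            // Inside the Worker, that is the Worker's own JNDI configuration,
            // not the application container hosting the Server Core.
            InitialContext ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup("jdbc/myDataSource"); // illustrative name
            System.out.println("Data source found: " + ds);
        } catch (NamingException e) {
            System.out.println("Name not bound in this JVM's JNDI tree: " + e.getMessage());
        }
    }
}
```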

Data Shaper Cluster

The Data Shaper Cluster allows multiple instances of the Data Shaper Server to run on different hardware nodes and form a computer Cluster. In this distributed environment, data transfer between Data Shaper Server instances is performed by Remote Edges.

The Data Shaper Cluster offers several advantages for Big Data processing:

  • High Availability - All nodes are virtually equal; therefore, almost all requests can be processed by any Cluster node. This means that if one node is disabled, another node can substitute for it. To achieve high availability, it is recommended to use an independent HTTP load balancer.

  • Scalability - Adding more nodes increases performance. There are two independent levels of scalability implemented: scalability of transformation requests and data scalability. For general information about the Data Shaper Cluster, see Data Partitioning (Parallel Running) and Data Partitioning in Cluster. In a Cluster environment, you can use several types of sandboxes, see Sandboxes in Cluster.

The Data Shaper Cluster requires a special license.
