
Common Properties of Readers



  • Readers allow you to specify the location of input data. See the examples of the File URL attribute below for reading from local and remote files, through a proxy, from an input port and from a dictionary.

  • Readers allow you to view the source data. See below.

  • Readers can read data from the input port, e.g. URLs of files to be read. See below.

  • Readers can read only new records. See below.

  • Readers can skip a specific number of initial records or set a limit on the number of records to be read. See below.

  • Readers allow you to configure a policy for parsing incomplete or invalid data records. See below.

  • Some Readers can log information about errors.

  • XML-reading components allow you to configure the parser. See below.

  • In some Readers, a transformation can be or must be defined. For information about transformation templates for transformations written in CTL, see below.

  • Similarly, for information about transformation interfaces that must be implemented in transformations written in Java, see below.

| COMPONENT | DATA SOURCE | INPUT PORTS | OUTPUT PORTS | EACH TO ALL OUTPUTS [1] | DIFFERENT TO DIFFERENT OUTPUTS [2] | TRANSFORMATION | TRANSF. REQ. | JAVA | CTL | AUTO-PROPAGATED METADATA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ComplexDataReader | flat file | 1 | 1-n | x | ✓ | ✓ | ✓ | ✓ | ✓ | x |
| DatabaseReader | database | 0 | 1-n | ✓ | x | x | x | x | x | x |
| DataGenerator | none | 0 | 1-n | x | ✓ | ✓ | ✓ | ✓ | ✓ | x |
| EDIFACTReader | EDIFACT files | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| FlatFileReader | flat file | 0-1 | 1-2 | x | ✓ | x | x | x | x | x |
| JSONExtract | JSON file | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| JSONReader | JSON file | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| LDAPReader | LDAP directory tree | 0 | 1-n | x | x | x | x | x | x | x |
| MultiLevelReader | flat file | 1 | 1-n | x | ✓ | ✓ | ✓ | ✓ | x | x |
| SpreadsheetDataReader | XLS(X) file | 0-1 | 1-2 | x | x | x | x | x | x | x |
| UniversalDataReader | flat file | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| X12Reader | X12 files | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| XMLExtract | XML file | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| XMLReader | XML file | 0-1 | 1-n | x | ✓ | x | x | x | x | x |
| XMLXPathReader | XML file | 0-1 | 1-n | x | ✓ | x | x | x | x | x |

[1] The component sends each data record to all of the connected output ports.

[2] The component sends different data records to different output ports using return values of the transformation (DataGenerator and MultiLevelReader). For more information, see Return Values of Transformations. XMLExtract, XMLReader and XMLXPathReader send data to ports as defined in their Mapping or Mapping URL attribute.

Supported File URL Formats for Readers

Warning!

To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).
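
For example (the path below is illustrative only, not taken from this documentation): on Microsoft Windows, a local file would be referenced as C:/data/input/customers.txt rather than C:\data\input\customers.txt, so that the same graph remains portable across operating systems.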

The File URL attribute may be defined using the URL File Dialog. Below are examples of possible URLs for Readers:

Reading of Local Files

  • /path/filename.txt Reads a specified file.

  • /path1/filename1.txt;/path2/filename2.txt Reads two specified files.

  • /path/filename?.txt Reads all files satisfying the mask.

  • /path/* Reads all files in a specified directory.

  • zip:(/path/file.zip) Reads the first file compressed in the file.zip file.

  • zip:(/path/file.zip)#innerfolder/filename.txt Reads a specified file compressed in the file.zip file.

  • gzip:(/path/file.gz) Reads the first file compressed in the file.gz file.

  • tar:(/path/file.tar)#innerfolder/filename.txt Reads a specified file archived in the file.tar file.

  • zip:(/path/file??.zip)#innerfolder?/filename.* Reads all files from the compressed zip file(s) that satisfy the specified mask. Wild cards (? and *) may be used in the compressed file names, inner folder and inner file names.

  • tar:(/path/file????.tar)#innerfolder??/filename*.txt Reads all files from the archive file(s) that satisfy the specified mask. Wild cards (? and *) may be used in the compressed file names, inner folder and inner file names.

  • gzip:(/path/file*.gz) Reads the files that have been gzipped into the .gz files satisfying the specified mask. Wild cards may be used in the compressed file names.

  • tar:(gzip:/path/file.tar.gz)#innerfolder/filename.txt Reads a specified file compressed in the file.tar.gz file.

Note: Although Data Shaper can read data from a .tar file, writing to .tar files is not supported.

  • tar:(gzip:/path/file??.tar.gz)#innerfolder?/filename*.txt Reads all files from the gzipped tar archive file(s) that satisfy the specified mask. Wild cards (? and *) may be used in the compressed file names, inner folder and inner file names.

  • zip:(zip:(/path/name?.zip)#innerfolder/file.zip)#innermostfolder?/filename*.txt Reads all files satisfying the file mask from all paths satisfying the path mask from all compressed files satisfying the specified zip mask. Wild cards (? and *) may be used in the outer compressed files, innermost folder and innermost file names. They cannot be used in the inner folder and inner zip file names.

Reading of Remote Files

  • ftp://username:password@server/path/filename.txt Reads a specified filename.txt file on a remote server connected via an FTP protocol using username and password.

  • sftp://username:password@server/path/filename.txt Reads a specified filename.txt file on a remote server connected via an SFTP protocol using a username and password. If certificate-based authentication is used, certificates are placed in the ${PROJECT}/ssh-keys/ directory; see the SFTP Certificate in Data Shaper section. Note that only certificates without a password are currently supported. With certificate-based authentication, the URL contains no password: sftp://username@server/path/filename.txt

  • http://server/path/filename.txt Reads a specified filename.txt file on a remote server connected via an HTTP protocol.

  • https://server/path/filename.txt Reads a specified filename.txt file on a remote server connected via an HTTPS protocol.

  • zip:(ftp://username:password@server/path/file.zip)#innerfolder/filename.txt Reads a specified filename.txt file compressed in the file.zip file on a remote server connected via an FTP protocol using username and password.

  • zip:(http://server/path/file.zip)#innerfolder/filename.txt Reads a specified filename.txt file compressed in the file.zip file on a remote server connected via an HTTP protocol.

  • tar:(ftp://username:password@server/path/file.tar)#innerfolder/filename.txt Reads a specified filename.txt file archived in the file.tar file on a remote server connected via an FTP protocol using username and password.

  • zip:(zip:(ftp://username:password@server/path/name.zip)#innerfolder/file.zip)#innermostfolder/filename.txt Reads a specified filename.txt file compressed in the file.zip file that is also compressed in the name.zip file on a remote server connected via an FTP protocol using username and password.

  • gzip:(http://server/path/file.gz) Reads the first file compressed in the file.gz file on a remote server connected via an HTTP protocol.

  • http://server/filename*.dat Reads all files from a WebDAV server which satisfy the specified mask (only * is supported).

  • s3://access_key_id:secret_access_key@s3.amazonaws.com/bucketname/filename*.out Reads all objects which satisfy the specified mask from an Amazon S3 web storage service from a given bucket using an access key ID and a secret access key. It is recommended to connect to S3 via a region-specific S3 URL: s3://s3.eu-central-1.amazonaws.com/bucket.name/. The region-specific URL has much better performance than the generic one (s3://s3.amazonaws.com/bucket.name/).

  • az-blob://account:account_key@account.blob.core.windows.net/containername/path/filename*.txt Reads all objects matching the specified mask from the specified container in the Microsoft Azure Blob Storage service. Connects using the specified Account Key. See the Azure Blob Storage section for other authentication options.

  • hdfs://CONN_ID/path/filename.dat Reads a file from the Hadoop distributed file system (HDFS). The HDFS NameNode to connect to is defined in a Hadoop connection with ID CONN_ID. This example file URL reads a file with the /path/filename.dat absolute HDFS path.

  • smb://domain%3Buser:password@server/path/filename.txt Reads files from a Windows share (Microsoft SMB/CIFS protocol) version 1. The URL path may contain wildcards (both * and ? are supported). The server part may be a DNS name, an IP address or a NetBIOS name. The userinfo part of the URL (domain%3Buser:password) is not mandatory, and any URL reserved character it contains should be escaped using %-encoding, as the semicolon ; is escaped with %3B in the example (the semicolon is escaped because it collides with the default Data Shaper file URL separator). The SMB protocol is implemented in the JCIFS library, which may be configured using Java system properties. See Setting Client Properties in the JCIFS documentation for the list of all configurable properties.

  • smb2://domain%3Buser:password@server/path/filename.txt Reads files from Windows share (Microsoft SMB/CIFS protocol) version 2 and 3. The SMB2 protocol is implemented in the SMBJ library.

Reading from Input Port

  • port:$0.FieldName:discrete Each data record field from input port represents one particular data source.

  • port:$0.FieldName:source Each data record field from an input port represents a URL to be loaded in and parsed.

  • port:$0.FieldName:stream Input port field values are concatenated and processed as input file(s); null values are replaced by EOF.

Using Proxy in Readers

  • http:(direct:)//seznam.cz Without proxy.

  • http:(proxy://user:password@212.93.193.82:443)//seznam.cz Proxy setting for HTTP protocol.

  • ftp:(proxy://user:password@proxyserver:1234)//seznam.cz Proxy setting for FTP protocol.

  • sftp:(proxy://66.11.122.193:443)//user:password@server/path/file.dat Proxy setting for SFTP protocol.

  • s3:(proxy://user:password@66.11.122.193:443)//access_key_id:secret_access_key@s3.amazonaws.com/bucketname/filename*.dat Proxy setting for S3 protocol.

Reading from Dictionary

  • dict:keyName:discrete[1] Reads data from dictionary.

  • dict:keyName:source[1] Reads data from dictionary in the same way as the discrete processing type, but expects that the dictionary values are input file URLs. The data from this input passes to the Reader.

[1] The Reader finds out the type of the source value from the dictionary and creates a readable channel for the parser. The Reader supports the following types of sources: InputStream, byte[], ReadableByteChannel, CharSequence, CharSequence[], List, List<byte[]>, ByteArrayOutputStream.
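
As a small illustration (the dictionary entry name inputData is made up for this example), a graph whose dictionary contains an InputStream entry named inputData could let a Reader consume it directly by setting File URL to:

dict:inputData:discrete

With the source processing type instead (dict:inputData:source), the dictionary value would be expected to hold the URL of the file to read.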

Sandbox Resource as Data Source

A sandbox resource, whether it is in a shared, local or partitioned sandbox, is specified in the graph under the File URL attribute as a so-called sandbox URL, like this: sandbox://data/path/to/file/file.dat

where data is the code of the sandbox and path/to/file/file.dat is the path to the resource from the sandbox root. The URL is evaluated by Data Shaper Server during graph execution, and the component (a Reader or Writer) obtains an opened stream from the Server. This may be a stream to a local file or to some other remote resource. Thus, the graph does not have to run on the node which has local access to the resource. There may be several sandbox resources used in the graph, and each of them may be on a different node. In such cases, Data Shaper Server chooses the node with the most local resources to minimize remote streams.

The sandbox URL has a specific use for parallel data processing. When the sandbox URL with the resource in a partitioned sandbox is used, that part of the graph/phase runs in parallel, according to the node allocation specified by the list of partitioned sandbox locations. Thus, each worker has its own local sandbox resource. Data Shaper Server evaluates the sandbox URL on each worker and provides an open stream to a local resource to the component.

Viewing Data on Readers

To view data on a Reader, click the Reader; the records are displayed in the Data Inspector tab. If Data Inspector is not open, right-click the desired component and select Inspect Data from the context menu. See Data Inspector. The same can be done in some Writers, but only after the output file has been created; see Viewing Data on Writers.

Input Port Reading

Input port reading allows you to read file names or data from an optional input port. This feature is available in most Readers. To use input port mapping, connect an edge to the input port and assign metadata to the edge. In the Reader, edit the File URL attribute. The attribute value has the syntax port:$0.FieldName[:processingType]. You can enter the value directly or with the help of the URL File Dialog. The processingType part is optional and defines whether the data is processed as plain data or as URL addresses. It can be discrete, source, or stream. If not set explicitly, discrete is applied by default.

Processing Type

  • discrete Each data record field from an input port represents one particular data source.

  • source Each data record field from an input port represents a URL to be loaded in and parsed.

  • stream All data fields from an input port are concatenated and processed as one input file. If a null value of the field is met, it is replaced by EOF. The following data record fields are parsed as another input file in the same way, i.e. until the next null value is met. The Reader starts parsing data as soon as the first bytes arrive on the port and processes them progressively until EOF comes. For more information about writing with the stream processing type, see Output Port Writing.

Input Port Metadata

In input port reading, only metadata fields of particular data types can be used: the type of the FieldName input field can only be string, byte or cbyte.

Processing of Input Port Record

When the graph runs, data is read from the original data source (according to the metadata of the edge connected to the Reader's optional input port) and received by the Reader through that port. Each record is read independently of the other records. The specified field of each record is processed by the Reader according to the output metadata.

Readers with Input Port Reading

Remember that port reading can also be used by DBExecute for receiving SQL commands; the Query URL is then as follows: port:$0.fieldName:discrete. An SQL command can also be read from a file. Its name, including the path, is then passed to DBExecute from an input port and the Query URL attribute should be: port:$0.fieldName:source.
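
As a sketch (the field name fileUrl and the upstream component are illustrative only, not taken from this page): suppose an upstream component sends records whose string field fileUrl contains one file URL per record; a FlatFileReader can then read every one of those files by setting its File URL to:

port:$0.fileUrl:source

With the discrete processing type, the value of the fileUrl field itself, rather than the file it points to, would be parsed as the input data.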

Incremental Reading

Incremental reading is a way to read only the records that are new since the last graph run. This way, you avoid reading already processed records. Incremental reading can pick up new records from a single file as well as from multiple files; if the File URL matches files that did not exist before, records from those new files are read as well.

Incremental reading is set up with the Incremental file and Incremental key attributes. The Incremental key is a named string that holds the information about the records/files already read, and its value is stored in the file specified by the Incremental file attribute. This way, the component reads only the records or files that have not yet been marked in the incremental file.

Readers with incremental reading are FlatFileReader and SpreadsheetDataReader. The component which reads data from databases performs incremental reading in a different way, as described below.

Incremental Reading in Database Components

Unlike the other incremental readers, a database component can evaluate several database columns and use them as key fields. The Incremental key is a sequence of individual expressions of the form keyname=FUNCTIONNAME(db_field)[!InitialValue], separated by semicolons (e.g. key01=MAX(EmployeeID);key02=FIRST(CustomerID)!20). The functions that can be selected are FIRST, LAST, MIN and MAX.

When you define an Incremental key, you also need to reference its key parts in the Query. A condition in the where clause refers to them, e.g. where db_field1 > #key01 and db_field2 < #key02. This way, you limit which records will be read next time, depending on the values of db_field1 and db_field2.

Only the records that satisfy the condition specified by the query will be read. The values of these key fields are stored in the incremental file. To define the Incremental key, click its attribute row and, using the Plus or Minus buttons in the Define incremental key dialog, add or remove key names and select database field names and function names; the last two are selected from combo boxes of possible values.
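
A hedged example (the table and column names are made up): suppose the database Reader should pick up only the orders added since the previous run, based on the highest OrderID already processed.

Incremental key: key01=MAX(OrderID)!0
Query: select * from Orders where OrderID > #key01

The optional !0 supplies the starting value for the first run; on subsequent runs, #key01 is replaced by the value stored in the incremental file, so only new rows are returned.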

Selecting Input Records

Some readers allow you to limit the number of records that should be read. You can set up:

  • Maximum number of records to be read

  • Number of records to be skipped

  • Maximum number of records to be read per source

  • Number of records to be skipped per source

The last two constraints can be defined only in Readers that allow reading multiple files at the same time. In these Readers, you can define the records that should be read for each input file separately and for all of the input files in total.

Per Reader Configuration

The maximum number of records to be read is defined by the Max number of records attribute. The number of records to be skipped is defined by the Number of skipped records attribute. The records are skipped and/or read continuously throughout all input files, independently of the values of the per source file attributes.

Per Source File Configuration

In some components, you can also specify how many records should be skipped and/or read from each input file. To do this, set up the following two attributes: Number of skipped records per source and/or Max number of records per source.

Combination of per File and per Reader Configuration

If you set up both per file and per reader limits, the per file limits are applied first and the per reader limits are applied afterwards.

For example, there are two files. The first file contains the records:

1 | Alice
2 | Bob
3 | Carolina
4 | Daniel
5 | Eve

The second file contains the records:

6 | Filip
7 | Gina
8 | Henry
9 | Isac
10 | Jane

And the reader has the following configuration:

| ATTRIBUTE | VALUE |
| --- | --- |
| Number of skipped records | 2 |
| Max number of records | 5 |
| Number of skipped records per source | 1 |
| Max number of records per source | 3 |

The files are read in the following way:

  1. From each file, the first record is skipped and the next three records are read.

  2. Of the records read in the previous step, the first two are skipped. The remaining records (four in this example, which is within the Max number of records limit of five) are sent to the output.

The records read by the Reader are:

4 | Daniel
7 | Gina
8 | Henry
9 | Isac

The example shows that you can read even fewer records than the number specified in the Max number of records attribute. The total number of records that are skipped equals Number of skipped records per source multiplied by the number of source files, plus Number of skipped records. The total number of records that are read is limited both by Max number of records per source multiplied by the number of source files and by Max number of records. The Readers that allow limiting the records both for each individual input file and for all input files in total are FlatFileReader, MultiLevelReader and SpreadsheetDataReader.

The following two Readers allow you to limit the total number of records by using the Number of skipped mappings and/or Max number of mappings attributes (what is called a mapping here is a subtree which should be mapped and sent out through the output ports):

  • XMLExtract — in addition, this component allows the use of the skipRows and/or numRecords attributes of individual XML elements.

  • XMLXPathReader — in addition, this component allows the use of the XPath language to limit the number of mapped XML structures.

The following Readers allow limiting the numbers in a different way:

  • DatabaseReader — this component can use the SQL query or Query URL attribute to limit the number of records.

Other Readers, for example LDAPReader, do not allow limiting the number of records that should be read; they read them all.

Data Policy

Data Policy affects processing (parsing) of incorrect or incomplete records. This can be specified by the Data Policy attribute. There are three options:

  • Strict. This data policy is set by default. Data parsing stops if a record field with an incorrect value or format is read, and further processing is aborted.

  • Lenient. This data policy means that incorrect records are only skipped and data parsing continues.

  • Controlled. This data policy means that every error is logged, but incorrect records are skipped and data parsing continues. Incorrect records with error information are generally logged to stdout; only FlatFileReader, JSONReader and SpreadsheetDataReader can send them out through an optional second output port (see the error metadata description of those Readers).

Data policy can be set in the following Readers: ComplexDataReader, DatabaseReader, FlatFileReader, JSONReader, MultiLevelReader, XMLReader and XMLXPathReader.

XML Features

In XMLExtract, XMLReader and XMLXPathReader, you can configure the validation of input XML files by specifying the Xml features attribute. The Xml features attribute configures the validation of the XML in more detail by enabling or disabling specific checks (see Parser Features). It is expressed as a sequence of individual expressions of the form nameM:=true or nameN:=false, where each nameM is an XML feature to be validated; the expressions are separated from each other by semicolons.

The options for validation are the following:

  • Custom parser setting

  • Default parser setting

  • No validations

  • All validations

You can define this attribute using a dialog in which you can add features using the Plus button, select their true or false values, etc.
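
As an illustration of the syntax only (the feature names below are standard SAX parser features and may or may not apply to your parser configuration; see Parser Features), the Xml features attribute could look like this:

http://xml.org/sax/features/namespaces:=true;http://xml.org/sax/features/validation:=false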

CTL Templates for Readers

DataGenerator requires a transformation, which can be written in both CTL and Java. For more information about the transformation template, see CTL Templates for DataGenerator. Remember that this component allows each record to be sent through the connected output port whose number equals the value returned by the transformation (see Return Values of Transformations). Mapping must be defined for such a port.

Java Interfaces for Readers

DataGenerator requires a transformation, which can be written in both CTL and Java. For more information about the interface, see the Java Interface of DataGenerator. Remember that this component allows each record to be sent through the connected output port whose number equals the value returned by the transformation (see Return Values of Transformations). Mapping must be defined for such a port.

MultiLevelReader requires a transformation which can only be written in Java. For more information, see Java Interfaces for MultiLevelReader.
