FlatFileReader


Short Description

FlatFileReader reads data from flat files: delimited text files such as CSV (comma-separated values), fixed-length files, or mixed files. The component can read a single file as well as a collection of files placed on a local disk or remotely; remote files are accessible via the HTTP, HTTPS, FTP, or SFTP protocols. ZIP and TAR archives of flat files can also be read, and reading data from an input port or a dictionary is supported as well. FlatFileReader has an alias, UniversalDataReader.

| COMPONENT | DATA SOURCE | INPUT PORTS | OUTPUT PORTS | EACH TO ALL INPUTS | DIFFERENT TO DIFFERENT OUTPUTS | TRANSFORMATION | TRANSF. REQ. | JAVA | CTL | AUTO-PROPAGATED METADATA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FlatFileReader | flat file | 0-1 | 1-2 | x | ✓ | x | x | x | x | ✓ |

Ports

| PORT TYPE | NUMBER | REQUIRED | DESCRIPTION | METADATA |
| --- | --- | --- | --- | --- |
| Input | 0 | x | For Input Port Reading. | Include a specific byte/cbyte/string field. |
| Output | 0 | ✓ | For correct data records. | Any |
| Output | 1 | x | For incorrect data records. | Specific structure, see the table below. |

Metadata

FlatFileReader does not propagate metadata. The component has metadata templates available. The optional logging port for incorrect records requires the following metadata structure: the record contains exactly five fields (named arbitrarily) of the given types, in the following order:

| FIELD NUMBER | FIELD NAME | DATA TYPE | DESCRIPTION |
| --- | --- | --- | --- |
| 0 | recordNo | long | The position of the erroneous record in the dataset (record numbering starts at 1). |
| 1 | fieldNo | integer | The position of the erroneous field in the record (1 stands for the first field, i.e. that of index 0). |
| 2 | originalData | string \| byte \| cbyte | The erroneous record in raw form (including all field and record delimiters). |
| 3 | errorMessage | string \| byte \| cbyte | The error message describing why the record was rejected. |
| 4 | fileURL | string | The source file in which the error occurred. |

Metadata on output port 0 can use Autofilling Functions. The source_timestamp and source_size functions work only when reading from a file directly (if the file is an archive or is stored at a remote location, the timestamp will be empty and the size will be 0).
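For illustration, a matching definition of the error-port metadata might look like the following sketch. It uses the CloverETL-style metadata XML suggested by the org.jetel parser classes; the exact schema and the field names are assumptions and may differ in your Data Shaper version:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical error-port metadata: exactly five fields in the prescribed order. -->
<Record name="readerErrors" type="delimited" fieldDelimiter="|" recordDelimiter="\n">
  <Field name="recordNo" type="long"/>       <!-- position of the erroneous record -->
  <Field name="fieldNo" type="integer"/>     <!-- position of the erroneous field -->
  <Field name="originalData" type="string"/> <!-- raw record, delimiters included -->
  <Field name="errorMessage" type="string"/> <!-- parser error message -->
  <Field name="fileURL" type="string"/>      <!-- source file of the error -->
</Record>
```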

FlatFileReader Attributes

| ATTRIBUTE | REQ. | DESCRIPTION | POSSIBLE VALUES |
| --- | --- | --- | --- |
| Basic |  |  |  |
| File URL | ✓ | The path to a data source (flat file, input port, dictionary) to be read. See Supported File URL Formats for Readers. |  |
| Charset |  | Character encoding of input records (character encoding does not apply to byte fields if the record type is fixed). The default encoding depends on DEFAULT_CHARSET_DECODER in defaultProperties. | UTF-8 \| other encoding |
| Data policy |  | Specifies the handling of misformatted or incorrect data. See Data Policy. | strict (default) \| controlled \| lenient |
| Trim strings |  | Specifies whether leading and trailing whitespace should be removed from strings before they are set to data fields. See Trimming Data below. | default \| true \| false |
| Quoted strings |  | Fields containing a special character (comma, newline, or double quote) have to be enclosed in quotes. Only a single or double quote is accepted as the quote character. If set to true, special characters are removed when read by the component (they are not treated as delimiters). Example: to read the input data "25"\|"John", switch Quoted strings to true and set Quote character to ". This produces two fields: 25\|John. By default, the value of this attribute is inherited from metadata on output port 0. See also Record Details. | false \| true |
| Quote character |  | Specifies which kind of quotes is permitted in Quoted strings. By default, the value of this attribute is inherited from metadata on output port 0. See also Record Details. | both \| " \| ' |
| Advanced |  |  |  |
| Skip leading blanks |  | Specifies whether to skip leading whitespace (e.g. blanks) before setting input strings to data fields. If not explicitly set (i.e. left at the default value), the value of the Trim strings attribute is used. See Trimming Data below. | default \| true \| false |
| Skip trailing blanks |  | Specifies whether to skip trailing whitespace (e.g. blanks) before setting input strings to data fields. If not explicitly set (i.e. left at the default value), the value of the Trim strings attribute is used. See Trimming Data below. | default \| true \| false |
| Number of skipped records |  | The number of records/rows to be skipped from the source file(s). See Selecting Input Records. | 0 (default) - N |
| Max number of records |  | The number of records to be read from the source file(s); all records are read by default. See Selecting Input Records. | 1 - N |
| Number of skipped records per source |  | The number of records/rows to be skipped from each source file. By default, the value of the Skip source rows record property of output port 0 metadata is used. If the value in metadata differs from the value of this attribute, the Number of skipped records per source value is applied, having the higher priority. See Selecting Input Records. | 0 (default) - N |
| Max number of records per source |  | The number of records/rows to be read from each source file; all records from each file are read by default. See Selecting Input Records. | 1 - N |
| Missing column handling |  | Specifies the handling of records with fewer fields (columns) than required by metadata. If set to Treat as error, the record is considered invalid and the Data Policy setting is applied. If set to Fill with null, the record is accepted and the missing fields are set to null, or to the default value if specified. | Treat as error (default) \| Fill with null |
| Max error count |  | The maximum number of tolerated error records in the input file(s); applicable only if the controlled Data Policy is set. | 0 (default) - N |
| Treat multiple delimiters as one |  | If set to true, a field delimited by a repeated delimiter character is interpreted as if it were delimited by a single delimiter. | false (default) \| true |
| Incremental file | [1] | The name of the file storing the incremental key, including its path. See Incremental Reading. |  |
| Incremental key | [1] | The variable storing the position of the last read record. See Incremental Reading. |  |
| Verbose |  | By default, a less comprehensive error notification is provided and performance is slightly higher. If set to true, more detailed information is provided at the cost of lower performance. | false (default) \| true |
| Parser |  | By default, the most appropriate parser is applied. Alternatively, the parser for processing data may be set explicitly; if an improper one is set, an exception is thrown and the graph fails. See Data Parsers below. | auto (default) \| a specific parser class |

[1] Either both or neither of these attributes must be specified.
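To make the table concrete, a FlatFileReader node with a few Basic attributes set might be serialized as in the sketch below. The CloverETL-style graph XML is implied by the org.jetel class names, but the component type and the exact attribute names (fileURL, charset, dataPolicy, quotedStrings) are assumptions that may differ in your Data Shaper version:

```xml
<!-- Hypothetical example: read a quoted CSV file, routing bad records
     to the error port via the controlled data policy. -->
<Node id="FLAT_FILE_READER0" type="FLAT_FILE_READER"
      fileURL="${DATAIN_DIR}/customers.csv"
      charset="UTF-8"
      dataPolicy="controlled"
      quotedStrings="true"/>
```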

Details

Parsed data records are sent to the first output port. The component has an optional output logging port for getting detailed information about incorrect records. Only if Data Policy is set to controlled and a proper writer (Trash or FlatFileWriter) is connected to port 1, all incorrect records, together with the information about the incorrect value, its location, and the error message, are sent out through this error port.

Trimming Data

  1. If the Trim strings attribute is left at its default value, input strings are implicitly processed before being converted to a value of the field data type, as follows:

    • Whitespace is removed from both the start and the end in the case of boolean, date, decimal, integer, long, or number fields.

    • The input string is set to the field including leading and trailing whitespace in the case of byte, cbyte, or string fields.

    For example, the input string " 42 " is parsed into an integer field as 42, while a string field receives it verbatim, including the spaces.

  2. If the Trim strings attribute is set to true, all leading and trailing whitespace characters are removed. A field composed solely of whitespace is transformed to null (a zero-length string). The value false preserves all leading and trailing whitespace characters. Remember that an input string representing a numeric or boolean data type cannot be parsed if it includes whitespace, so use the value false carefully.

  3. Both the Skip leading blanks and Skip trailing blanks attributes have a higher priority than Trim strings, so the trimming of input strings is determined by the true or false values of these attributes, regardless of the Trim strings value.

Data Parsers

  1. org.jetel.data.parser.SimpleDataParser - a very simple but fast parser with limited validation, error handling, and functionality. The following attributes are not supported:

    • Trim strings

    • Skip leading blanks

    • Skip trailing blanks

    • Incremental reading

    • Number of skipped records

    • Max number of records

    • Quoted strings

    • Missing column handling

    • Treat multiple delimiters as one

    • Skip rows

    • Verbose

On top of that, you cannot use metadata containing at least one field with any of these properties:

  • the field is fixed-length

  • the field has no delimiter or, conversely, has more than one

  • Shift is not null (see Details Pane)

  • Autofilling is set to true

  • the field is byte-based

  2. org.jetel.data.parser.DataParser - an all-round parser working with any reader settings.

  3. org.jetel.data.parser.CharByteDataParser - can be used whenever metadata contain byte-based fields mixed with char-based ones. A byte-based field is a field of one of these types: byte, cbyte, or any other field whose format property starts with the "BINARY:" prefix. See Numeric Format > Binary Formats.

  4. org.jetel.data.parser.FixLenByteDataParser - used for metadata with byte-based fields only. It parses sequences of records consisting of a fixed number of bytes.

Note: Choosing org.jetel.data.parser.SimpleDataParser while using Quoted strings will cause the Quoted strings attribute to be ignored.
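If the automatic choice is not what you want, the Parser attribute can pin a parser class explicitly. A minimal sketch assuming the CloverETL-style graph XML implied by the org.jetel class names; the component type and the parser attribute name are assumptions:

```xml
<!-- Hypothetical example: force the all-round DataParser instead of auto-selection. -->
<Node id="FLAT_FILE_READER1" type="FLAT_FILE_READER"
      fileURL="${DATAIN_DIR}/orders.dat"
      parser="org.jetel.data.parser.DataParser"/>
```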

Tips & Tricks

Handling records with large data fields: FlatFileReader can process input strings of even hundreds or thousands of characters when you adjust the field and record buffer sizes. Just increase the following properties according to your needs: Record.MAX_RECORD_SIZE for record serialization, DataParser.FIELD_BUFFER_LENGTH for parsing, and DataFormatter.FIELD_BUFFER_LENGTH for formatting. Finally, don't forget to increase the DEFAULT_INTERNAL_IO_BUFFER_SIZE variable to at least 2 * MAX_RECORD_SIZE. For information on how to change these property variables, see Engine Configuration.

Examples

Processing files with headers

If the first rows of your input file do not represent real data but field labels instead, set the Number of skipped records attribute. If a collection of input files with headers is read, set the Number of skipped records per source.

Handling typist’s error when creating the input file manually

If you wish to ignore accidental errors in delimiters (such as two semicolons instead of the single one defined in metadata when the input file is typed manually), set the Treat multiple delimiters as one attribute to true. Any run of repeated delimiter characters is then interpreted as a single delimiter; for example, with ; as the delimiter, the line a;;b is read as the two fields a and b.

Incremental reading

Incremental Reading allows you to read only new records from a file. This can be done by setting the Incremental key and Incremental file attributes.

Let us have a list of customers in a file customers.dat. Each record in the file consists of a date, first name and last name:

```
2018-02-01 23:58:02|Rocky|Whitson
2018-02-01 23:59:56|Marisa|Callaghan
2018-03-01 00:03:12|Yaeko|Gonzale
2018-03-01 00:15:41|Jeana|Rabine
2018-03-01 00:32:22|Daniele|Hagey
```

Read the file, then add a new record and run the graph again reading only the new record.

Solution: In the output metadata, create the date, firstName, and lastName fields. Set their data types to date, string, and string, respectively. Use the Incremental key and Incremental file attributes.

Note: If the first row of the input file is not a header, make sure the Number of skipped records attribute is set to 0.

| ATTRIBUTE | VALUE |
| --- | --- |
| Incremental key | date |
| Incremental file | ${DATATMP_DIR}/customers_inc_key |

After the first read, the output file contains five records.

Now, add a new record to the file, for example: 2018-03-01 00:51:31|Nathalie|Mangram and run the graph again.

This time, only the new record is written to the output file, ignoring the previously processed records.
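In the graph source, the two incremental attributes might be serialized as below. This is a sketch assuming the CloverETL-style graph XML implied by the org.jetel class names; the component type and the attribute names incrementalFile and incrementalKey are assumptions that may differ in your Data Shaper version:

```xml
<!-- Hypothetical example: remember the last read position between graph runs. -->
<Node id="FLAT_FILE_READER0" type="FLAT_FILE_READER"
      fileURL="${DATAIN_DIR}/customers.dat"
      incrementalKey="date"
      incrementalFile="${DATATMP_DIR}/customers_inc_key"/>
```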

Best Practices

We recommend explicitly specifying the encoding of the input file (via the Charset attribute). It ensures better portability of the graph across systems with different default encodings. The recommended encoding is UTF-8.

Troubleshooting

With the default charset (UTF-8), FlatFileReader cannot parse CSV files containing binary data. To parse CSV files with binary data, change the Charset attribute accordingly.

See also

  • UniversalDataReader

  • FlatFileWriter

  • Common Properties of Components

  • Specific Attribute Types

  • Common Properties of Readers