Common Properties of Writers



Writers are the final components of a transformation graph. They write data to files on disk, send data over an FTP, LDAP or JMS connection, or insert data into database tables. The Trash component, which discards all records it receives, is also categorized as a Writer, since it can be configured to store the received records in a debug file.

Each Writer must have at least one input port through which data flows into the component from other components of the graph.

Writers can either append data to an existing file, sheet or database table, or replace the existing content with new data. For this purpose, Writers that write to files have the Append attribute. By default, the attribute is set to false, meaning "do not append data, replace it". Replacing a database table is available in some bulk loaders.

You can also write data to a single file or database table with several Writers of the same graph; in such a case, the Writers should write the data in different phases.

Most Writers let you view part of the resulting data. Right-click the Writer and select the View data option. You will be prompted with the same View data dialog as when debugging edges; for more details, see Viewing Debug Data. This dialog lets you view the written data; it can only be used after the graph has been run.

Below is a brief overview of these options:

  • Supported File URL Formats for Writers gives examples of the File URL attribute for writing to local and remote files, through a proxy, to an output port and to a dictionary.

  • As shown in Defining Transformations, some Writers allow you to define a transformation. For information about the transformation interfaces that must be implemented in transformations written in Java, see Common Java Interfaces.

| Component | Data output | Input ports | Output ports | Transformation | Transf. required | Java | CTL | Auto-propagated metadata |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DatabaseWriter | database | 1 | 0-2 | x | x | x | x | x |
| EDIFACTWriter | EDIFACT file | 1-n | 0-1 | x | x | x | x | x |
| FlatFileWriter | flat file | 1 | 0-1 | x | x | x | x | x |
| JSONWriter | JSON file | 1-n | 0-1 | x | x | x | x | x |
| LDAPWriter | LDAP directory tree | 1 | 0-1 | x | x | x | x | x |
| SpreadsheetDataWriter | XLS(X) file | 1 | 0-1 | x | x | x | x | x |
| Trash | none | 1 | 0 | x | x | x | x | x |
| UniversalDataWriter | flat file | 1 | 0-1 | x | x | x | x | x |
| X12Writer | X12 file | 1-n | 0-1 | x | x | x | x | x |
| XMLWriter | XML file | 1-n | 0-1 | x | x | x | x | x |

Supported File URL Formats for Writers

The File URL attribute lets you type the file URL directly or open the URL file dialog. The URLs shown below can also contain placeholders: a dollar sign or a hash sign.

Dollar and hash signs serve for different purposes.

  • Dollar sign should be used when each of multiple output files contains only a specified number of records based on the Records per file attribute.

  • Hash sign should be used when each of multiple output files only contains records corresponding to the value of the specified Partition key.

Note: the hash signs in the URL examples in this section that separate a compressed file (zip, gz) from its contents are not placeholders.

To ensure graph portability, forward slashes must be used when defining the path in URLs (even on Microsoft Windows).

Below are examples of possible URLs for Writers:

Writing to Local Files

  • /path/filename.out Writes specified file on disk.

  • /path1/filename1.out;/path2/filename2.out Writes two specified files on disk.

  • /path/filename$.out Writes a number of files on disk. The dollar sign represents one digit, so the output files can be named filename0.out through filename9.out. The dollar sign is used when Records per file is set.

  • /path/filename$$.out Writes a number of files on disk. Two dollar signs represent two digits, so the output files can be named filename00.out through filename99.out. The dollar sign is used when Records per file is set.

  • /path/filename#.out Writes a number of files on disk. If Partition file tag is set to Key file tag, the hash sign in the file name is replaced with the Partition key field value. Otherwise, the hash sign is replaced with a number.

  • zip:(/path/file$.zip) Writes a number of compressed files on disk. The dollar sign represents one digit, so the compressed output files can be named file0.zip through file9.zip. The dollar sign is used when Records per file is set.

  • zip:(/path/file$.zip)#innerfolder/filename.out Writes the specified file inside the compressed files on disk. The dollar sign represents one digit, so the compressed output files containing the specified filename.out file can be named file0.zip through file9.zip. The dollar sign is used when Records per file is set.

  • gzip:(/path/file$.gz) Writes a number of compressed files on disk. The dollar sign represents one digit, so the compressed output files can be named file0.gz through file9.gz. The dollar sign is used when Records per file is set.

Note: Although Data Shaper can read data from a .tar file, writing to a .tar file is not supported.
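The naming rule for the dollar-sign placeholder can be illustrated with a few lines of plain Java. This is only a sketch of the scheme described above (one digit per $ sign, zero-padded), not the product's implementation; the pattern and file index are made up for the example.

```java
public class DollarPlaceholderDemo {
    // Expand a single run of $ signs into a zero-padded file number,
    // mirroring the naming scheme described above (illustration only).
    static String expand(String pattern, int fileIndex) {
        int digits = pattern.length() - pattern.replace("$", "").length();
        String number = String.format("%0" + digits + "d", fileIndex);
        return pattern.replace("$".repeat(digits), number);
    }

    public static void main(String[] args) {
        System.out.println(expand("/path/filename$$.out", 7)); // /path/filename07.out
    }
}
```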

Writing to Remote Files

  • ftp://user:password@server/path/filename.out Writes a specified filename.out file on a remote server connected via an FTP protocol using username and password.

  • sftp://user:password@server/path/filename.out Writes a specified filename.out file on a remote server connected via an SFTP protocol using a username and password. If a certificate-based authentication is used, certificates are placed in the ${PROJECT}/ssh-keys/ directory. For more information, see SFTP Certificate in Data Shaper. Note, that only certificates without a password are currently supported. The certificate-based authentication has a URL without a password: sftp://username@server/path/filename.txt

  • zip:(ftp://username:password@server/path/file.zip)#innerfolder/filename.txt Writes a specified filename.txt file compressed in the file.zip file on a remote server connected via an FTP protocol using username and password.


  • zip:(zip:(ftp://username:password@server/path/name.zip)#innerfolder/file.zip)#innermostfolder/filename.txt Writes a specified filename.txt file compressed in a file.zip file that is also compressed in a name.zip file on a remote server connected via an FTP protocol using username and password.

  • gzip:(ftp://username:password@server/path/file.gz) Writes the first file compressed in a file.gz file on a remote server connected via an FTP protocol.

  • http://username:password@server/filename.out Writes a specified filename.out file on a remote server connected via the WebDAV protocol, using a username and password.

  • s3://access_key_id:secret_access_key@s3.amazonaws.com/bucketname/path/filename.out Writes to the path/filename.out object located in the Amazon S3 web storage service, in the bucket bucketname, using an access key ID and a secret access key. It is recommended to connect to S3 via a region-specific S3 URL: s3://s3.eu-central-1.amazonaws.com/bucket.name/. A region-specific URL has much better performance than a generic one (s3://s3.amazonaws.com/bucket.name/). See the recommendation in Amazon S3 URL.

  • az-blob://account:account_key@account.blob.core.windows.net/containername/path/filename.txt Writes to the path/filename.txt object located in the Azure Blob Storage service, in the specified container, connecting with the specified account key. See Azure Blob Storage for other authentication options.

  • hdfs://CONN_ID/path/filename.dat Writes a file on a Hadoop distributed file system (HDFS). The HDFS NameNode to connect to is defined in a Hadoop connection with the ID CONN_ID. This example file URL writes a file with the /path/filename.dat absolute HDFS path.

  • smb://domain%3Buser:password@server/path/filename.txt Writes a file to a Windows share (Microsoft SMB version 1/CIFS protocol). The server part may be a DNS name, an IP address or a NetBIOS name. The userinfo part of the URL (domain%3Buser:password) is not mandatory, and any URL reserved character it contains should be escaped using %-encoding, as the semicolon ; is escaped with %3B in the example (the semicolon collides with the default Data Shaper file URL separator). Also note that the dollar sign $ in the URL path (e.g. when writing to an administrative share) is reserved for the file partitioning feature, so it needs to be escaped as well (with %24). The SMB protocol is implemented in the JCIFS library, which may be configured using Java system properties; for a list of all configurable properties, see Setting Client Properties in the JCIFS documentation.

  • smb2://domain%3Buser:password@server/path/filename.txt Writes a file to a Windows share (Microsoft SMB version 2 and 3). The SMB version 2 and 3 protocol is implemented in the SMBJ library.

Writing to Output Port

  • port:$0.FieldName:discrete If this URL is used, the output port of the Writer must be connected to another component. The output metadata must contain a field named FieldName of one of the following data types: string, byte or cbyte. Each data record received by the Writer through the input port is processed according to the input metadata, sent out through the optional output port, and written as the value of the specified field of the output edge's metadata. Subsequent records are processed in the same way.

Using Proxy in Writers

  • http:(direct:)//seznam.cz Without proxy.

  • ftp:(proxy://user:password@proxyserver:1234)//seznam.cz Proxy setting for ftp protocol.

  • ftp:(proxy://proxyserver:443)//server/path/file.dat Proxy setting for FTP protocol.

  • sftp:(proxy://66.11.122.193:443)//user:password@server/path/file.dat Proxy setting for SFTP protocol.

  • http:(proxy://user:password@212.93.193.82:443)//seznam.cz Proxy setting for HTTP protocol.

  • s3:(proxy://user:password@66.11.122.193:443)//access_key_id:secret_access_key@s3.amazonaws.com/bucketname/path/filename.out Proxy setting for S3 protocol.

Writing to Dictionary

  • dict:keyName:source Writes data to a file URL specified in the dictionary; the target file URL is retrieved from the specified dictionary entry.

  • dict:keyName:discrete [1] Writes data to the dictionary. Creates an ArrayList<byte[]>.

  • dict:keyName:stream [2] Writes data to the dictionary. Creates a WritableByteChannel.

Sandbox Resource as Data Source

A sandbox resource, whether it is a shared, local or partitioned sandbox, is specified in the graph in the File URL attribute as a so-called sandbox URL, for example:

sandbox://data/path/to/file/file.dat

where data is the code of the sandbox and path/to/file/file.dat is the path to the resource from the sandbox root. The URL is evaluated by Data Shaper Server during graph execution, and the component (Reader or Writer) obtains an opened stream from the Server. This may be a stream to a local file or to some other remote resource; a graph therefore does not have to run on the node that has local access to the resource. There may be multiple sandbox resources used in the graph, and each of them may be on a different node. In such cases, Data Shaper Server chooses the node with the most local resources to minimize remote streams.

The sandbox URL has a specific use for parallel data processing. When the sandbox URL with the resource in a partitioned sandbox is used, that part of graph/phase runs in parallel, according to the node allocation specified by the list of partitioned sandbox locations. Thus, each worker has its own local sandbox resource. Data Shaper Server evaluates the sandbox URL on each worker and provides an open stream to a local resource to the component.

[1] The discrete processing type uses a byte array for storing data.

[2] The stream processing type uses an output stream that must be created before running a graph (from Java code).
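For the stream processing type, the channel has to exist before the graph starts. The following is a minimal plain-Java sketch of preparing such a channel with the standard NIO API; how the channel is then registered under the dictionary key depends on the embedding code and the engine API, so that call is only hinted at in a comment and is hypothetical.

```java
import java.io.FileOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class PrepareDictionaryStream {
    public static void main(String[] args) throws Exception {
        // Wrap any OutputStream in a WritableByteChannel using standard Java NIO.
        WritableByteChannel channel =
                Channels.newChannel(new FileOutputStream("exported-data.bin"));

        // Registering the channel under the dictionary key "keyName" is done by the
        // embedding code before the graph runs; the exact engine call is not shown
        // here (a hypothetical example: dictionary.setValue("keyName", channel)).
    }
}
```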

Viewing Data on Writers

After an output file has been created, you can view its data from the Writer using the context menu: right-click the desired component and select Inspect data. The same can be done in some of the Readers; see Viewing Data on Readers.

Output Port Writing

Some Writers allow you to write data to the optional output port.

Below is the list of Writers allowing output port writing:

  • FlatFileWriter
  • JSONWriter
  • SpreadsheetDataWriter
  • XMLWriter

Set the File URL attribute of the Writer to port:$0.FieldName[:processingType].

Here, processingType is optional and can be set to one of the following: discrete or stream. If it is not set explicitly, it is discrete by default.

  • discrete The file content is stored into a field (of one record). The data should be small enough to fit into this one field. If the data is partitioned into multiple files, multiple output records are sent out. Each output record contains input data of one partition.

  • stream The file content is written to a stream, which is split into chunks. The chunks are written into a user-specified output field. One chunk goes to one output record, therefore your data does not have to fit into a single data field. The stream is terminated with another record with null in the field (as a sentinel). If the data is partitioned into multiple files, null also serves as a delimiter between the files. The count of output records depends on the value of the PortReadingWriting.DATA_LENGTH parameter. The default value is 2,048 B.
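To make the stream protocol above concrete, here is a small, self-contained Java sketch (independent of the Data Shaper API) of how a downstream consumer could reassemble the chunked output. The hard-coded chunks stand in for the values arriving in the output field, with null acting as the end-of-file sentinel described above.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkReassembly {
    public static void main(String[] args) {
        // Chunks as they would arrive in the output field, one per record;
        // null marks the end of one written file.
        List<byte[]> chunks = Arrays.asList(
                "Hello, ".getBytes(), "world".getBytes(), null,  // first file
                "second file".getBytes(), null);                 // second file

        List<byte[]> files = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            if (chunk == null) {              // sentinel: the current file is complete
                files.add(current.toByteArray());
                current = new ByteArrayOutputStream();
            } else {
                current.write(chunk, 0, chunk.length);
            }
        }
        System.out.println(files.size() + " files reassembled");
    }
}
```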

If you connect the optional output port of any Writer to another component with an edge, the metadata of that edge must contain the specified FieldName of a string, byte or cbyte data type.

When a graph runs, data is read through the input port according to the input metadata, processed by the Writer according to the specified processing type and then sent on to the other component through the optional output port of the Writer. The attributes for the output port writing in these components may be defined using the URL file dialog.

Appending or Overwriting

If the target file exists, there are two options:

  1. the existing file can be replaced;

  2. the records can be appended to the existing content.

Appending or replacing is configured with the Append attribute.

If Append is set to true, records are appended to the file.

If Append is set to false, the file is overwritten. The default value is false.

You can also append data to files in local (non-remote) zip archives. In a server environment, this means that [use_local_context_url] has to be set to true.

Append is available in the following Writers:

Creating Directories

If you specify a non-existing directory in the File URL, set the Create directories attribute to true and the directory will be created. Otherwise, the graph fails.

The default value of Create directories is false.

The Create directories attribute is available in the following Writers:

Selecting Output Records

Writers let you limit the records that are written: you can restrict the number of records to be written and skip a specified number of records. If you need to apply a filter on the output records, use Filter before the Writer.

The limit on the number of written records is set with the Max number of records attribute.

The number of records to be skipped is set up with Number of skipped records.
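For example, setting Number of skipped records to 10 and Max number of records to 100 skips the first 10 input records and writes the following 100, assuming the skip is applied before the limit is counted.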

The following components let you set up Max number of records or Number of skipped records:

  • EDIFACTWriter
  • FlatFileWriter
  • JSONWriter
  • SpreadsheetDataWriter
  • X12Writer
  • XMLWriter

Partitioning Output into Different Output Files

Some Writers let you split the incoming data flow and distribute the records among different output files. These components are:

  • EDIFACTWriter
  • FlatFileWriter
  • JSONWriter
  • SpreadsheetDataWriter
  • X12Writer
  • XMLWriter

Partitioning Criteria

You can split the data according to the number of records, or classify the records according to the values of specified fields.

Partitioning by Number of Records

Partitioning by number of records saves at most N records into one file; further records are saved into another file until the limit is reached again, and so on. Use the Records per file attribute to set the limit N.

Example: split 450 records into output files, each containing at most 100 records.

Solution: the File URL value should contain dollar sign(s); the $ signs will be replaced with digits.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/output_$$.txt |
| Records per file | 100 |
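With these settings, the 450 input records are split into five output files: the first four hold 100 records each and the fifth holds the remaining 50. The two dollar signs in output_$$.txt reserve two digits for the file number.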

Partitioning according to Data Field Value

Records can be split into multiple output files according to a data field value. The field is specified with the Partition key attribute.

The placeholder # in the output file name is replaced with a field value or with an integer: if Partition file tag is set to Number file tag, the placeholder is replaced with an integer; if it is set to Key file tag, the placeholder is replaced with the field value. The default value is Number file tag.

The partition key is a list of the incoming record fields to partition by, written as a sequence of field names separated by semicolons.

Example: split the data according to the field1 field and use the field value as part of the output file name.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/output_#.txt |
| Partition key | field1 |
| Partition file tag | Key file tag |

If you use two or more fields for partitioning, use the placeholder # in only one place in the file URL, e.g. ${DATAOUT_DIR}/output_#.txt. Do not use a separate placeholder for each key field.

Partitioning using Lookup Table

Partitioning using a lookup table lets you split records based on input field values. The values of Partition key serve as a key to be looked up in the lookup table; the value corresponding to the key defines a group.

The name of a group can be formed from a number or from a value taken from the lookup table.

Each group is written to its own output file.

The difference between partitioning according to a data field value and partitioning using a lookup table is that in the first case one unique Partition key value creates one group, whereas in the latter case a single group can correspond to multiple different Partition key values.

Example: the input data contains the field city along with other fields. The lookup table maps city to country. Split the data into files so that each file contains the records of one country. Records whose city is not found in the lookup table should be written to a file named unmatched instead of a country file.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/output_#.txt |
| Partition key | field1 |
| Partition lookup table | TheLookupTable |
| Partition file tag | Key file tag |
| Partition output fields | country |
| Partition unassigned file name | unmatched |
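For example, if the lookup table maps both Rome and Milan to Italy, records with either city form a single group and are written to the same output file (output_Italy.txt), whereas partitioning directly on the city field would have produced two separate files.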

Remember that if all incoming records match values in the lookup table, the file for unassigned records will be empty (even if it is defined).

Filtering Records using Lookup Table

You can use partitioning with a lookup table to write only a subset of the input records. For example, you can write only the records corresponding to some of the countries (from the previous example). To restrict the records, define the values of the desired fields in the lookup table (key fields) and leave Partition unassigned file name blank.

Combining Ways of Partitioning

You can combine partitioning by number of records and partitioning according to data field value.

Example: split the data according to the field1 field, use the field value as part of the output file name and write at most 100 records into one file.

| Attribute | Value |
| --- | --- |
| File URL | ${DATAOUT_DIR}/output_#_$.txt |
| Records per file | 100 |
| Partition key | field1 |
| Partition file tag | Key file tag |

The # sign is replaced with the field1 value. The $ sign is replaced with an integer numbering the files that contain records with the same field1 value.
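For example, if 150 records have field1 = A and 80 records have field1 = B, the A records are written to two files (100 and 50 records) and the B records to one file; in each file name the # is replaced with A or B and the $ with the number of the file within that group.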

Limits of Partitioning

The partitioning algorithm keeps all output files open at once. This can lead to an undesirable memory footprint when there are many output files (thousands). Moreover, Unix-based operating systems usually impose a strict limit on the number of simultaneously open files per process (typically 1,024).

If you run into one of these limitations, consider sorting the data according to the partition key using one of the standard sorting components and setting the Sorted input attribute to true. The partitioning algorithm then does not need to keep all output files open; only the last one is open at any time.

Name for Partitioned File

The File URL value only serves as a base name for the output file names. The base name should contain placeholders - dollar sign or hash sign.

The dollar sign is replaced with a number. If you use more dollar signs, each $ is replaced with one digit, so leading zeros can be inserted. Use $ if you partition by the number of records.

The hash sign is replaced with a number, a field value, or a value from a lookup table. Leading zeros can be created with more hash signs. Use # if you partition by field value or using a lookup table.

Hash Sign versus Dollar Sign

You should differentiate between hash sign and dollar sign usage.

  • Hash sign A hash sign should be used when each of multiple output files only contains records corresponding to the value of specified Partition key.

  • Dollar sign A dollar sign should be used when each of multiple output files contains only a specified number of records based on the Records per file attribute.

The hash sign(s) can be placed anywhere in the file name part of the File URL, even in the middle. For example: path/output#.xls (in the case of an output XLS file).

If Partition file tag is set to Number file tag, output files are numbered and the number of hash signs used in the File URL determines the number of digits of these distinguishing numbers. This is the default value of Partition file tag. Thus, ### can go from 000 to 999.

If Partition file tag is set to Key file tag, at most a single hash sign should be used in the File URL. Distinguishing names are used instead of numbers.

These distinguishing names will be created as follows:

If the Partition key attribute (or the Partition output fields attribute) has the form field1;field2;...;fieldN and the values of these fields are valueofthefield1, valueofthefield2, ..., valueofthefieldN, all the field values are converted to strings and concatenated. The resulting strings have the form valueofthefield1valueofthefield2...valueofthefieldN. Each such string is used as a distinguishing name and is inserted into the File URL in the place marked with the hash sign, or appended to the end of the File URL if no hash sign is used.

For example, if firstname;lastname is the Partition key (or Partition output fields), you can have the output files as follows:

  • path/outjohnsmith.xls, path/outmarksmith.xls, path/outmichaelgordon.xls, etc. (if File URL is path/out#.xls and Partition file tag is set to Key file tag).

  • Or path/out01.xls, path/out02.xls, etc. (if File URL is path/out##.xls and Partition file tag is set to Number file tag).

Excluding Fields

Some components without output mapping let you omit particular fields from the results. Use the Exclude fields attribute to specify the metadata fields that should not be written to the output. The attribute has the form of a sequence of field names separated by semicolons. The field names can be typed manually or created using a key dialog.

The Exclude fields attribute is available in:

If you partition data and Partition file tag is set to Key file tag, the values of Partition key form the names of the output files, and the values are written to the corresponding files as well. To avoid saving the same information twice, you can select the fields that will be excluded from writing.

Use the Exclude fields attribute to specify fields that should not be written into output files. The fields will only be a part of file or sheet names, but will not be written to the contents of these files.

When you read these files back, you can acquire the values with the autofilling function source_name.

Example: when you have files created using Partition key set to City and the output files are London.txt, Stockholm.txt, etc., you can get these values (London, Stockholm, etc.) from the file names. The City field values do not need to be contained in the files.

Note: If you want to use the value of a field as the path to an existing file, type //# as the File URL attribute in the Writer. This way, if the value of the field used for partitioning is path/to/my/file/filename.txt, it is assigned to the output file as its name; the output file will be located in path/to/my/file and its name will be filename.txt.

See also

Supported File URL Formats for Readers

Viewing Data on Readers