Primeur Online Docs
Data Shaper

URL file dialog


Last updated 1 month ago

The URL File Dialog is used to navigate through the file system and select input or output files.

Many components ask you to specify the URL of a file. Such a file can be a source of data to be read, a target to which data is written, a file used to transform the data flowing through a component, or some other file. To specify the URL of such a file, you can use the URL File Dialog.

To access the URL File Dialog, double-click a component, click the File URL row (or another attribute that takes a file URL) in the component editor, and then click the three-dot button on the right.

The URL File Dialog has several tabs on it.

Local Files

Use the Local files tab to locate files on a local file system. The combo box lists local file system places and parameters. The tab can be used to select files in Data Shaper projects as well as any other local files.

Note:

The best practice is to specify file paths in the Workspace view rather than in the Local files tab. Combined with parameters, the Workspace view makes your graphs more portable.

Workspace View

The Workspace view tab serves to locate files in the workspace of a local Data Shaper project.

Data Shaper Server

The Data Shaper Server tab serves to locate files in all open Data Shaper Server projects. It is available only for Data Shaper Server projects.

Hadoop HDFS

Use the Hadoop HDFS tab to locate files on Hadoop Distributed File System.

Remote Files

The Remote files tab serves to locate files on a remote computer or on the Internet. You can specify properties of connection, proxy settings, and HTTP properties.

Edit URL Dialog

The Edit URL Dialog lets you specify a connection to a remote server in an easy way. Choose the protocol and specify a host name, port, credentials, and path.

The dialog lets you specify the connection using the following protocols:

  • HTTP

  • HTTPS

  • FTP

  • SFTP - FTP over SSH

  • Amazon S3

  • Azure Blob Storage

  • WebDav

  • WebDav over SSL

  • Windows Share - SMB1/CIFS

  • Windows Share - SMB 2.x, SMB 3.x

Click Save to save the connection settings. Click OK to use it.

The Load button serves to load a session from the list for subsequent editing.

The Delete button serves to delete the session from the list.

HTTP(S), (S)FTP, WebDav, and SMB

If the protocol is HTTP, HTTPS, FTP, SFTP - FTP over SSH, WebDav, WebDav over SSL, Windows Share - SMB1/CIFS or Windows Share - SMB 2.x or 3.x, the dialog allows you to specify the host name, port, username, password, and path on the server. It allows you to connect anonymously, as well.
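As a rough illustration of how the fields in the dialog map onto a URL, the sketch below assembles a URL from a protocol, host, port, credentials, and path. It follows general URL conventions only; the function name, host name, and credentials are hypothetical, and the exact formats accepted by Data Shaper are the ones described in the Supported File URL Formats sections.

```python
from urllib.parse import quote


def build_remote_url(scheme, host, path, port=None, username=None, password=None):
    """Assemble scheme://[user[:password]@]host[:port]/path (illustrative only)."""
    userinfo = ""
    if username is not None:
        # Percent-encode reserved characters such as '@', ':' and '/' in credentials.
        userinfo = quote(username, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    netloc = userinfo + host + (f":{port}" if port is not None else "")
    return f"{scheme}://{netloc}/{quote(path.lstrip('/'), safe='/')}"


# Hypothetical SFTP server with credentials.
print(build_remote_url("sftp", "files.example.com", "/data/in/orders.csv",
                       port=22, username="etl", password="p@ss:word"))

# Anonymous connection: no user information in the URL.
print(build_remote_url("ftp", "ftp.example.com", "pub/readme.txt"))
```

Encoding the credentials keeps characters such as @ and : from being read as URL delimiters.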

SFTP Certificate in Data Shaper

If you read from or write to remote files over the SFTP protocol using certificate-based (private key) authentication, do one of the following:

  • Option 1: Create an OpenSSH configuration file and specify the path to it in the Preferences (in the Designer go to Window > Preferences) as per the screenshot below. The configuration file can hold multiple configurations for different hosts.

Hint!

If you want to explicitly select a certificate for a specific location, the best way is to use the name with the highest priority, i.e. username@hostname.key. In such a case, if the connection succeeds, other keys are ignored.
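The naming priority for private keys stored in the ssh-keys directory (username@hostname.key first, then hostname.key, then any other *.key file in alphabetical order) can be sketched as a simple sort. This is only a model of the documented order, not the Designer's actual implementation; the function name, user name, and host name are hypothetical.

```python
def order_key_files(filenames, username, hostname):
    """Return the *.key files in the order in which they would be tried."""
    keys = [name for name in filenames if name.endswith(".key")]
    exact = f"{username}@{hostname}.key"   # highest priority
    host_only = f"{hostname}.key"          # second priority

    def priority(name):
        if name == exact:
            return (0, name)
        if name == host_only:
            return (1, name)
        return (2, name)                   # remaining keys, alphabetical

    return sorted(keys, key=priority)


files = ["zeta.key", "sftp.example.com.key", "alpha.key",
         "etl@sftp.example.com.key", "notes.txt"]
print(order_key_files(files, "etl", "sftp.example.com"))
```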

The figure below shows the format of the OpenSSH private key generated by ssh-keygen.

URL Syntax for FTP Proxy

Data Shaper can connect through an FTP proxy using the following URL syntax:

ftp://username%40proxyuser%40ftphost:password%40proxypassword@proxyhost

where:

username - Your login on the FTP server.

proxyuser - Your login on the proxy server.

ftphost - The hostname of the FTP server.

password - Your FTP password.

proxypassword - Your proxy password.

proxyhost - The hostname of the proxy server.
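The syntax above can be assembled programmatically. The sketch below joins the parts with %40 (an encoded @) exactly as the pattern shows. Percent-encoding each individual value as well is an assumption made here to keep stray delimiters out of the URL, and all names and hosts are hypothetical.

```python
from urllib.parse import quote


def ftp_proxy_url(username, password, ftphost, proxyuser, proxypassword, proxyhost):
    """Build ftp://username%40proxyuser%40ftphost:password%40proxypassword@proxyhost."""
    def enc(value):
        # Assumption: encode reserved characters inside each value.
        return quote(value, safe="")

    user_part = "%40".join([enc(username), enc(proxyuser), enc(ftphost)])
    password_part = "%40".join([enc(password), enc(proxypassword)])
    return f"ftp://{user_part}:{password_part}@{proxyhost}"


print(ftp_proxy_url("alice", "s3cret", "ftp.example.com",
                    "proxyalice", "pr0xy", "proxy.example.com"))
```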

Amazon S3

For the Amazon S3 protocol, the dialog allows you to fill in the access key, secret key, bucket, and path. For better performance, you should also fill in the corresponding region.

Once the connection is specified, you can choose the particular file(s).

Amazon S3 URL

It is recommended to connect to S3 via an endpoint-specific S3 URL: s3://s3.eu-central-1.amazonaws.com/bucket.name/. The endpoint in the URL should be the endpoint corresponding to the bucket.

  • The URL with a specific endpoint performs much better than the generic one (s3://s3.amazonaws.com/bucket.name/), but you can only access the buckets of the specific region.

  • The endpoint affects the signature version that will be used. If you connect to the generic endpoint, the signature version may not match the endpoint being used. The request is then sent twice, and you can see messages like the following in the error log:

    DEBUG [main] - Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256. (Service: null; Status Code: 400; Error Code: InvalidRequest; Request ID: 2D7C4933BD5ED2F8), S3 Extended Request ID: 9wmejqgrZ0jRpgqvw43RXUBZOzm9rnd5/wVN19kSe0dHAF/k5rxq34jvRhy8bHd5JnqBcQTBwkM=

    WARN [main] - Attempting to re-send the request to cloverdx.example.test.s3.eu-central-1.amazonaws.com with AWS V4 authentication. To avoid this warning in the future, please use region-specific endpoint to access buckets located in regions that require V4 signing.
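A small helper can make the endpoint-specific form harder to get wrong. The sketch below only formats the URL pattern shown in this section; the function name is hypothetical, and the region passed in must be the region of the bucket.

```python
def s3_regional_url(region, bucket, path=""):
    """Format an endpoint-specific S3 URL: s3://s3.<region>.amazonaws.com/<bucket>/<path>."""
    if not region:
        raise ValueError("a region is required for an endpoint-specific URL")
    return f"s3://s3.{region}.amazonaws.com/{bucket}/{path.lstrip('/')}"


print(s3_regional_url("eu-central-1", "bucket.name"))
print(s3_regional_url("eu-central-1", "bucket.name", "in/orders.csv"))
```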

When the S3 URL does not contain Secret Key + Access Key (e.g. s3://s3.eu-central-1.amazonaws.com/bucket.name/path), Data Shaper automatically searches for credentials in the following sources (in this order):

  1. Environment variables

    • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: recommended, since they are recognized by all the AWS SDKs and the CLI except for .NET

    • AWS_ACCESS_KEY and AWS_SECRET_KEY: only recognized by the Java SDK

  2. Java system properties: aws.accessKeyId and aws.secretKey

  3. Credential profiles file at the default location (~/.aws/credentials), shared by all AWS SDKs and the AWS CLI

  4. Credentials delivered through the Amazon EC2 container service; the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable must be set and the security manager must have permission to access the variable

  5. Instance profile credentials delivered through the Amazon EC2 metadata service
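The lookup order above can be modeled as a chain of checks in which the first source that provides credentials wins. The sketch below is a simplified model for reasoning about which source would be picked, not the actual lookup code. The function name and return labels are hypothetical, Java system properties are passed in as a plain dictionary, and the profile file is assumed to use the standard [default] section with aws_access_key_id and aws_secret_access_key entries.

```python
import configparser


def find_credentials_source(env, java_props, credentials_file_text=None):
    """Return a label for the first source in the documented order that has credentials."""
    # 1. Environment variables (either of the two documented pairs).
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    if env.get("AWS_ACCESS_KEY") and env.get("AWS_SECRET_KEY"):
        return "environment"
    # 2. Java system properties.
    if java_props.get("aws.accessKeyId") and java_props.get("aws.secretKey"):
        return "java-system-properties"
    # 3. Credential profiles file (content of ~/.aws/credentials, if any).
    if credentials_file_text:
        parser = configparser.ConfigParser()
        parser.read_string(credentials_file_text)
        if (parser.has_option("default", "aws_access_key_id")
                and parser.has_option("default", "aws_secret_access_key")):
            return "profile-file"
    # 4. Container credentials.
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        return "container"
    # 5. Fall back to the EC2 instance profile.
    return "instance-profile"


profile = "[default]\naws_access_key_id = EXAMPLEKEY\naws_secret_access_key = EXAMPLESECRET\n"
print(find_credentials_source({}, {}, profile))
```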

Note:

These sources of credentials may be used for graph development in a local project; for example, set aws.accessKeyId and aws.secretKey Java system properties (for Data Shaper Runtime) and add them to CloverDXDesigner.ini (for File URL dialog) so that graphs work in local projects when using S3 URLs without credentials.

Azure Blob Storage

Microsoft Azure Blob Storage is a cloud object storage service, similar to Amazon S3. Data Shaper has supported Azure Blob Storage since version 5.11.

Several authentication schemes are supported:

  • Storage Shared Key. Use one of the following URL formats:

    • az-blob://[account]:[key]@[account].blob.core.windows.net/container/path

    • az-blob://AccountName=[account]:AccountKey=[key]@[account].blob.core.windows.net/container/path (this form avoids confusion with Client Secret authentication)

    Note that the key must be URL-encoded before you can use it in the URL. The Edit URL dialog encodes the key automatically.

    Example

    Plain key: XFqGQY9/FRBucrRKldxykYUp9WmnzFHR9to/w2sP9+fXoDAKoTfWvdUOAzcaS3Wnon9mIgRbPcudtlwsNPtwzQ==

    Encoded key: XFqGQY9%2FFRBucrRKldxykYUp9WmnzFHR9to%2Fw2sP9%2BfXoDAKoTfWvdUOAzcaS3Wnon9mIgRbPcudtlwsNPtwzQ%3D%3D

  • Client Secret. The Client Secret is in the Certificates & secrets section of your application. Create a new secret and copy the Value, not the Secret ID. Use one of the following URL formats:

    • az-blob://TenantId=[TenantId]:ClientId=[ClientId]:ClientSecret=[ClientSecret]@[account].blob.core.windows.net

    • az-blob://[TenantId]:[ClientId]:[ClientSecret]@[account].blob.core.windows.net

  • Environment Variables. Instead of putting the authentication information into the URL, you can configure the connection using the environment variables below. The URL then contains only the storage account as part of the host name: az-blob://[account].blob.core.windows.net/container/path

    • Connection String

      • AZURE_STORAGE_CONNECTION_STRING

      Example: export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=[account];AccountKey=XFqGQY9/FRBucrRKldxykYUp9WmnzFHR9to/w2sP9+fXoDAKoTfWvdUOAzcaS3Wnon9mIgRbPcudtlwsNPtwzQ==;EndpointSuffix=core.windows.net"

    • Client Secret (see Client Secret authentication above)

      • AZURE_CLIENT_ID

      • AZURE_CLIENT_SECRET

      • AZURE_TENANT_ID

    • Client Certificate. You can also set up certificates in the Certificates & secrets section of your application in Azure Active Directory.

      • AZURE_CLIENT_ID

      • AZURE_TENANT_ID

      • AZURE_CLIENT_CERTIFICATE_PATH

    • Username and Password

      • AZURE_CLIENT_ID

      • AZURE_USERNAME

      • AZURE_PASSWORD

  • Anonymous. If none of the above applies, Data Shaper attempts to connect anonymously. Anonymous access must be explicitly enabled on the container. Clients can then read data from the container without authorization. The URL is: az-blob://[account].blob.core.windows.net/container/path
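If you type a Storage Shared Key URL by hand instead of using the Edit URL dialog, the key has to be URL-encoded first. The sketch below reproduces the plain and encoded key pair from the example in this section with Python's standard library and places the encoded key into the documented URL pattern. The function name, account name, container, and path are hypothetical.

```python
from urllib.parse import quote


def shared_key_url(account, key, container, path):
    """Build az-blob://[account]:[key]@[account].blob.core.windows.net/container/path."""
    encoded_key = quote(key, safe="")  # encodes '/', '+' and '=' among others
    return (f"az-blob://{account}:{encoded_key}"
            f"@{account}.blob.core.windows.net/{container}/{path.lstrip('/')}")


plain_key = "XFqGQY9/FRBucrRKldxykYUp9WmnzFHR9to/w2sP9+fXoDAKoTfWvdUOAzcaS3Wnon9mIgRbPcudtlwsNPtwzQ=="
print(quote(plain_key, safe=""))
print(shared_key_url("myaccount", plain_key, "container", "path/data.csv"))
```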

Port

The Port tab serves to specify fields and the processing type for port reading or writing. It opens only in components that allow such a data source or target.

Dictionary

The Dictionary tab serves to specify the dictionary key value and the processing type for dictionary reading or writing. It opens only in components that allow such a data source or target.

Filtering Files and Tips

If the File URL Dialog is configured to display only files with certain extensions, the File Extension is shown below the File URL.

Warning!

To ensure graph portability, forward slashes are used for defining the path in URLs (even on Microsoft Windows).
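When paths are generated outside the dialog, for example by a script that prepares parameter values on Windows, they can be normalized to forward slashes before being used in a URL. A minimal sketch with Python's standard library follows; the function name and sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def to_portable_path(path):
    """Convert a Windows-style path to the forward-slash form used in URLs."""
    return PureWindowsPath(path).as_posix()


print(to_portable_path(r"data-in\orders\2024.csv"))
print(to_portable_path(r"C:\data\in.csv"))
```

Paths that already use forward slashes pass through unchanged.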

Note: The New Directory action is available on the toolbar of the Workspace View and Local Files tabs. You can use the F7 key as a shortcut for the action. The newly created directory is selected in the dialog and its name can be edited in-line. Press F2 to rename the directory and DEL to delete it.

More detailed information about the URLs for each of the tabs described above is provided in the sections Supported File URL Formats for Readers and Supported File URL Formats for Writers.

Hadoop HDFS: You need a working Hadoop Connection to choose the particular files.

Remote Files: You can type the URL directly in the format described in Supported File URL Formats for Readers or Supported File URL Formats for Writers, or you can specify it with the help of the Edit URL Dialog. The Edit URL Dialog is accessible via its icon.

SFTP Certificate in Data Shaper, Option 2: Create a directory named ssh-keys in your project, put the private key files into this directory, and give each file a suitable name with the .key suffix. When keys are resolved, the private key file names are tried in the following order, from the highest to the lowest priority:

  a. username@hostname.key

  b. hostname.key

  c. *.key (the files are resolved in alphabetical order)

Amazon S3: For the list of regions and endpoints, see AWS Regions and Endpoints (Amazon S3). For detailed information, see the Walkthrough: Using IAM roles for EC2 instances.

Azure Blob Storage authentication schemes:

  • Storage Shared Key: This authentication is the easiest to set up. It is similar to username/password authentication. You use the name of the storage account as the username and the Access Key as the password. The disadvantage is that all applications that use the Access Key have the same permissions. You can find the key in the Azure Portal: Storage accounts, then your storage account, then Access keys. See also: https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key

  • Client Secret: This authentication scheme allows fine-grained access control, because you can set different permissions for each application that uses your storage. First, create an "application" for your Data Shaper processing in your Azure Active Directory: Azure Portal - Azure Active Directory - App registrations. The authentication scheme uses three values: Tenant ID, Client ID (also called Application ID), and Client Secret. You can find the Tenant ID and Client ID in the Overview of your application. See also: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication#service-principal

  • Connection String: You can find the connection string next to your Access Key in the Azure Portal: Storage accounts, then your storage account, then Access keys. See also: https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string

  • Managed Identity: If the application is deployed to an Azure host with Managed Identity enabled, Data Shaper will authenticate with that account. The URL is: az-blob://[account].blob.core.windows.net/container/path

Port: See also Input Port Reading or Output Port Writing.

Dictionary: See also Using a Dictionary in Graphs.