XMLXPathReader
Last updated
XMLXPathReader reads data from XML files.
Which XML Component?
Generally, use XMLExtract. It is fast, has a GUI for mapping elements to records, and is based on SAX. XMLReader can use more complex XPath expressions than XMLExtract; for example, it allows you to reference siblings. On the other hand, XMLReader is slower and needs more memory than XMLExtract. XMLReader is based on DOM. XMLReader supersedes the original XMLXPathReader, which can also use more complex XPath expressions than XMLExtract and also uses DOM.
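Component | Data source | Input ports | Output ports | Each to all outputs | Different to different outputs [1] | Transformation | Transf. req. | Java | CTL | Auto-propagated metadata
XMLXPathReader | XML file | 0-1 | 1-n | x | ✓ | x | x | x | x | x
[1] The component sends different data records to different output ports using return values of the transformation. For more information, see Return Values of Transformations. XMLExtract and XMLXPathReader send data to ports as defined in their Mapping or Mapping URL attribute.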
Port type | Number | Required | Description | Metadata
Input | 0 | x | | One field (byte, cbyte, string).
Output | 0 | ✓ | For correct data records. | Any [1]
Output | 1-n | [2] | Error port | Any [1] (each port can have different metadata)
[1] Metadata on each output port does not need to be the same. Metadata can use Autofilling Functions. Note: the source_timestamp and source_size functions work only when reading from a file directly (if the file is an archive or is stored in a remote location, the timestamp will be empty and the size will be 0).
[2] Other output ports are required if the mapping requires them.
Basic
- File URL: required.
- Charset: Encoding of records that are read. The default encoding depends on DEFAULT_CHARSET_DECODER in defaultProperties. Possible values: UTF-8 | <other encodings>
- Data Policy: Possible values: Strict (default) | Controlled [1] | Lenient
- Mapping [2]: Mapping of the input XML structure to output ports. For more information, see XMLXPathReader Mapping Definition below.
- Mapping URL [2]: An external text file containing the mapping definition. For more information, see XMLXPathReader Mapping Definition below.
- Implicit mapping: If true, element values are mapped to the fields with the same name in the record. Example: an element (salary) is automatically mapped onto the field of the same name (salary).
Advanced
- XML features
- Number of skipped mappings: Possible values: 0-N
- Max number of mappings: Possible values: 0-N
[1] Controlled data policy in XMLXPathReader does not send error records to an edge; the records are written to the log.
[2] One of these has to be specified. If both are specified, Mapping URL has higher priority.
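To make Implicit mapping concrete, here is a minimal sketch. The input elements and the output metadata (fields name and salary on port 0) are hypothetical. Assuming Implicit mapping is set to true, the <Context> below contains no <Mapping> tags and relies only on the same-name rule.

```xml
<!-- Hypothetical input:
<employees>
  <employee><name>Ann</name><salary>5000</salary></employee>
</employees>
-->
<!-- Assumed output metadata on port 0: fields "name" and "salary" (hypothetical) -->
<!-- With Implicit mapping = true, <name> and <salary> should fill the same-named fields -->
<Context xpath="/employees/employee" outPort="0"/>
```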
Example 23. Mapping in XMLXPathReader
Every mapping definition (both the contents of the file specified in the Mapping URL attribute and the value of the Mapping attribute) consists of <Context> tags, which also contain some attributes and allow mapping of element names to Data Shaper fields.
Each <Context> tag can surround a series of nested <Mapping> tags. These map (rename) XML elements to Data Shaper fields.
The <Context> and <Mapping> tags contain the XMLXPathReader Context Tag Attributes and the XMLXPathReader Mapping Tag Attributes, respectively.
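As an illustration of this structure, the sketch below shows a minimal mapping definition. The input document, the element names, and the output fields (NAME, SALARY) are hypothetical; it assumes that port 0 carries metadata with those two fields. The outer <Context> should produce one record per matching element, and each nested <Mapping> copies one child element into a field.

```xml
<!-- Hypothetical input:
<employees>
  <employee>
    <name>Ann</name>
    <salary>5000</salary>
  </employee>
</employees>
-->
<!-- One output record on port 0 per <employee> element -->
<Context xpath="/employees/employee" outPort="0">
  <!-- Copy the <name> child into the (hypothetical) NAME field -->
  <Mapping nodeName="name" cloverField="NAME"/>
  <!-- Copy the <salary> child into the (hypothetical) SALARY field -->
  <Mapping nodeName="salary" cloverField="SALARY"/>
</Context>
```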
XMLXPathReader Context Tags and Mapping Tags
Empty Context Tag (Without a Child)
<Context xpath="xpathexpression" XMLXPathReader Context Tag Attributes />
Non-Empty Context Tag (Parent with a Child)
<Context xpath="xpathexpression" XMLXPathReader Context Tag Attributes>
(nested Context and Mapping elements: only children, parents with one or more children, etc.)
</Context>
Empty Mapping Tag (Renaming Tag)
- If xpath is used:
<Mapping xpath="xpathexpression" XMLXPathReader Mapping Tag Attributes />
- If nodeName is used:
<Mapping nodeName="elementname" XMLXPathReader Mapping Tag Attributes />
XMLXPathReader Context Tag and Mapping Tag Attributes
a) XMLXPathReader Context Tag Attributes
xpath
Required
The xpath expression can be any XPath query.
Example: xpath="/tagA/…/tagJ"
outPort
Optional
The number of the output port to which data is sent. If not defined, no data from this level of Mapping is sent out.
Example: outPort="2"
parentKey
Both parentKey and generatedKey must be specified.
The sequence of metadata fields on the next parent level, separated by a semicolon, colon, or pipe. The number and data types of these fields must be the same as in the generatedKey attribute, or all values are concatenated to create a unique string value. In such a case, the key has only one field.
Example: parentKey="first_name;last_name"
Equal values of these attributes ensure that such records can be joined later.
generatedKey
Both parentKey and generatedKey must be specified.
The sequence of metadata fields on the specified level, separated by a semicolon, colon, or pipe. The number and data types of these fields must be the same as in the parentKey attribute, or all values are concatenated to create a unique string value. In such a case, the key has only one field.
Example: generatedKey="f_name;l_name"
Equal values of these attributes ensure that such records can be joined later.
sequenceId
When a pair of parentKey and generatedKey does not ensure unique identification of records, a sequence can be defined and used.
ID of the sequence.
Example: sequenceId="Sequence0"
sequenceField
When a pair of parentKey and generatedKey does not ensure unique identification of records, a sequence can be defined and used.
A metadata field on the specified level to which the sequence values are written. It can serve as parentKey for the next nested level.
Example: sequenceField="sequenceKey"
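The key and sequence attributes above can be combined as in the following sketch. All element names, field names, and port numbers are hypothetical, and Sequence0 is assumed to be a sequence already defined in the graph. Under our reading of the attribute descriptions, the child level copies the parent's empID into its own parentID field, so records on ports 0 and 1 can later be joined on empID = parentID. The project level additionally takes a value from Sequence0, which the task level then uses as its parentKey.

```xml
<!-- All element names, field names, and port numbers are hypothetical -->
<Context xpath="/employees/employee" outPort="0">
  <Mapping nodeName="id" cloverField="empID"/>
  <Mapping nodeName="lastName" cloverField="lastName"/>
  <!-- Each record on port 1 gets the parent's empID in its parentID field -->
  <Context xpath="child" outPort="1" parentKey="empID" generatedKey="parentID">
    <Mapping nodeName="name" cloverField="childName"/>
  </Context>
  <!-- lastName alone may not be unique, so each project record also receives
       the next value of the (assumed) sequence Sequence0 in projectSeq -->
  <Context xpath="project" outPort="2" parentKey="lastName" generatedKey="empLastName"
           sequenceId="Sequence0" sequenceField="projectSeq">
    <Mapping nodeName="title" cloverField="title"/>
    <!-- projectSeq serves as parentKey for the next nested level -->
    <Context xpath="task" outPort="3" parentKey="projectSeq" generatedKey="projectSeq">
      <Mapping nodeName="description" cloverField="description"/>
    </Context>
  </Context>
</Context>
```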
b) XMLXPathReader Mapping Tag Attributes
xpath
Either xpath or nodeName must be specified in the <Mapping> tag.
XPath query.
Example: xpath="tagA/…/salary"
nodeName
Either xpath or nodeName must be specified in the <Mapping> tag. Using nodeName is faster than using xpath.
The XML node that should be mapped to a Clover field.
Example: nodeName="salary"
cloverField
Required
The Clover field to which the XML node should be mapped.
The name of the field in the corresponding level.
Example: cloverField="SALARY"
trim
Optional
Specifies whether leading and trailing white space should be removed. By default, both leading and trailing white space is removed.
Example: trim="false" (white space will not be removed)
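The Mapping tag attributes can be compared side by side in a sketch like the following. The element and field names are hypothetical, and we assume that nodeName matches a child element of the context node by name, while xpath holds a query evaluated relative to the context node.

```xml
<!-- Hypothetical element and field names -->
<Context xpath="/employees/employee" outPort="0">
  <!-- nodeName: match a child element by name (documented as faster than xpath) -->
  <Mapping nodeName="salary" cloverField="SALARY"/>
  <!-- xpath: a query, here assumed to be relative to the context element -->
  <Mapping xpath="address/city" cloverField="CITY"/>
  <!-- trim="false": keep leading and trailing white space of <note> -->
  <Mapping nodeName="note" cloverField="NOTE" trim="false"/>
</Context>
```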
Attribute details
- Charset: We recommend specifying Charset explicitly.
- Input port 0: For port reading. See Reading from Input Port in .
- File URL: Specifies which data source(s) will be read (XML file, input port, dictionary). See .
- Data Policy: Determines what should be done when an error occurs. For more information, see .
- XML features: A sequence of individual true/false expressions related to XML features that should be validated. The expressions are separated from each other by a semicolon. For more information, see .
- Number of skipped mappings: The number of mappings to be skipped continuously throughout all source files. See .
- Max number of mappings: The maximum number of records to be read continuously throughout all source files. See .
XMLXPathReader reads data from XML files (using the DOM parser). It can also read data from compressed files, an input port, and a dictionary.
This component is slower and needs more memory than XMLExtract, which can read XML files too. XMLReader supersedes XMLXPathReader.
namespacePaths (Context tag attribute)
Optional
Default namespaces that should be used for the xpath attribute specified in the <Context> tag.
Pattern: namespacePaths='prefix1="URI1";…;prefixN="URIN"'
Example: namespacePaths='n1="http://
.
Note: If the input XML file contains a default namespace, namespacePaths must be specified in the corresponding place of the Mapping attribute. In addition, namespacePaths is inherited from the <Context> element and used by the <Mapping> elements.
namespacePaths (Mapping tag attribute)
Optional
Default namespaces that should be used for the xpath attribute specified in the <Mapping> tag.
Pattern: namespacePaths='prefix1="URI1";…;prefixN="URIN"'
Example: namespacePaths='n1="http://
Note: If the input XML file contains a default namespace, namespacePaths must be specified in the corresponding place of the Mapping attribute. In addition, namespacePaths is inherited from the <Context> element and used by the <Mapping> elements.
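A sketch of namespace handling, assuming a hypothetical input whose root declares the default namespace http://example.com/hr and one element from a second namespace http://example.com/finance (both URIs are placeholders). In XPath 1.0 an unprefixed name only matches elements with no namespace, so the default namespace is given a prefix (here n1) in namespacePaths on the <Context>; the nested <Mapping> tags inherit it. A prefix needed by a single mapping can be declared on that <Mapping> instead.

```xml
<!-- Hypothetical input:
<employees xmlns="http://example.com/hr"
           xmlns:f="http://example.com/finance">
  <employee>
    <salary>5000</salary>
    <f:bonus>300</f:bonus>
  </employee>
</employees>
-->
<!-- Bind the default namespace to the prefix n1 for use in xpath attributes -->
<Context xpath="/n1:employees/n1:employee" outPort="0"
         namespacePaths='n1="http://example.com/hr"'>
  <!-- n1 is inherited from the enclosing Context -->
  <Mapping xpath="n1:salary" cloverField="SALARY"/>
  <!-- A prefix needed only by this Mapping is declared on the Mapping itself -->
  <Mapping xpath="f:bonus" cloverField="BONUS"
           namespacePaths='f="http://example.com/finance"'/>
</Context>
```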
The XMLXPathReader component does not support reading of multivalue fields. See . If you need to read multivalue fields from XML, use or .