XMLReader
XMLReader reads data from XML files using DOM technology. It can also read data from compressed files, an input port, or a dictionary.
Which XML Component?
Generally, use XMLExtract. It is fast and has a GUI for mapping elements to records. It is based on SAX. XMLReader can use more complex XPath expressions than XMLExtract; for example, it allows you to reference siblings. On the other hand, XMLReader is slower and needs more memory than XMLExtract. XMLReader is based on DOM. XMLReader supersedes the original XMLXPathReader. XMLXPathReader can use more complex XPath expressions than XMLExtract. XMLXPathReader uses DOM.
XMLReader
XML file
0-1
1-n
x
✓
x
x
x
x
x
XMLReader, XMLExtract and XMLXPathReader send data to ports as defined in their Mapping or Mapping URL attribute.
Input
0
x
One field (byte, cbyte, string).
Output
0 ... n-1
✓
For correct data records. Connect more than one output port if your mapping requires it.
Any
1-n
x
Error port
Restricted format. See Metadata here below.
XMLReader does not propagate metadata.
XMLReader has metadata templates on the error port. There are two templates: XMLReader_TreeReader_ErrPortWithoutFile and XMLReader_TreeReader_ErrPortWithFile.
Field number | Field name | Data type | Description
0 | port | integer | The number of the output port where errors occurred.
1 | recordNumber | integer | Record number (per source and port).
2 | fieldNumber | integer | Field number.
3 | fieldName | string | Field name.
4 | value | string | The value which caused the error.
5 | message | string | Error message.
6 | file | string | Source name. This field is optional.
ATTRIBUTE
REQ.
DESCRIPTION
POSSIBLE VALUES
Basic
File URL
Yes
Charset
Encoding of records that are read. When reading from files, the charset is detected automatically (unless you specify it yourself). Important: when reading from a port or dictionary, always set Charset explicitly. There is no autodetection in that case, so errors will occur otherwise.
ISO-8859-1 (default) |
Data Policy
Strict (default) | Controlled | Lenient
Mapping
[1]
The mapping of the input XML structure to output ports. For more information, see Defining the Mapping below.
Mapping URL
[1]
An external text file containing the mapping definition. For more information, see Defining the Mapping below.
Implicit mapping
If true, element values are mapped to the fields with the same name in the record. Example: An element (salary) is automatically mapped onto the field of the same name (salary).
Advanced
XML features
[1] One of these must be specified. If both are specified, Mapping URL has higher priority.
Records and fields to be sent to the output ports are specified using XML elements and attributes. Each Context element corresponds to one attached output port. Each Mapping element defines a mapping to one field. See the example below.
Example 21. Mapping in XMLReader
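A minimal sketch of such a mapping, assuming a hypothetical input file with /employees/employee elements and output metadata with the fields name and salary (these names are illustrative only):

```xml
<!-- Hypothetical mapping: one record per employee element is sent to output port 0. -->
<Context xpath="/employees/employee" outPort="0">
  <!-- Each Mapping element fills one field of the output record. -->
  <Mapping xpath="name" cloverField="name"/>
  <Mapping nodeName="salary" cloverField="salary"/>
</Context>
```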
The nested structure of <Context> tags is similar to the nested structure of XML elements in input XML files.
However, the Mapping attribute does not need to copy the whole XML structure; it can start at a specified level inside the XML file.
The Mapping definition is specified in the Mapping URL attribute or in the Mapping attribute.
Every Mapping definition consists of <Context> tags. Each <Context> tag defines a mapping of a particular XML subtree to a record sent to the specified output port.
Each <Context> tag can surround a series of nested <Mapping> tags. These allow you to map XML elements or attributes to Data Shaper fields.
Each of these <Context> and <Mapping> tags contains some Context Tag Attributes and Mapping Tag Attributes, respectively.
Empty Context Tag (Without a Child)
<Context xpath="xpathexpression" />
See Context Tag Attributes.
Non-Empty Context Tag (Parent with a Child)
<Context xpath="xpathexpression">
(nested Context and Mapping elements: only children, parents with one or more children, etc.)
</Context>
See Context Tag Attributes.
Empty Mapping Tag (Renaming Tag)
- xpath is used:
<Mapping xpath="xpathexpression" />
- nodeName is used:
<Mapping nodeName="elementname" />
See Mapping Tag Attributes.
xpath
Required
The xpath expression can be any XPath query.
Example: xpath="/tagA/…/tagJ"
outPort
Optional
The number of the output port to which data is sent. If not defined, no data is sent out from this level of the mapping.
Example: outPort="2"
parentKey
Both parentKey and generatedKey must be specified.
The sequence of metadata fields on the next parent level, separated by a semicolon, colon, or pipe. The number and data types of these fields must be the same as in the generatedKey attribute; otherwise, all values are concatenated to create a unique string value, in which case the key has only one field.
Example: parentKey="first_name;last_name"
Equal values of these attributes ensure that such records can be joined in the future.
generatedKey
Both parentKey and generatedKey must be specified.
The sequence of metadata fields on the specified level, separated by a semicolon, colon, or pipe. The number and data types of these fields must be the same as in the parentKey attribute; otherwise, all values are concatenated to create a unique string value, in which case the key has only one field.
Example: generatedKey="f_name;l_name"
Equal values of these attributes ensure that such records can be joined in the future.
sequenceId
When a pair of parentKey and generatedKey does not ensure unique identification of records, a sequence can be defined and used.
The ID of the sequence.
Example: sequenceId="Sequence0"
sequenceField
When a pair of parentKey and generatedKey does not ensure unique identification of records, a sequence can be defined and used.
A metadata field on the specified level to which the sequence values are written. It can serve as parentKey for the next nested level.
Example: sequenceField="sequenceKey"
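The Context tag attributes can be combined as in the following sketch. The element names, the field names and the sequence Sequence0 are assumptions; the sketch also assumes that the sequence is defined in the graph and that the metadata on both output ports contains a customerId field.

```xml
<!-- Hypothetical: customers go to output port 0, their orders to output port 1. -->
<Context xpath="/customers/customer" outPort="0"
         sequenceId="Sequence0" sequenceField="customerId">
  <Mapping xpath="name" cloverField="name"/>
  <!-- parentKey names a field of the parent level, generatedKey a field of this level.
       Equal values in both let the two outputs be joined later. -->
  <Context xpath="order" outPort="1"
           parentKey="customerId" generatedKey="customerId">
    <Mapping xpath="total" cloverField="total"/>
  </Context>
</Context>
```

Here the sequence field written on the customer level serves as the parentKey of the nested order level, which is the use described for sequenceField.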
xpath
Either xpath or nodeName must be specified in the <Mapping> tag.
XPath query.
Example: xpath="tagA/…/salary"
nodeName
Either xpath or nodeName must be specified in the <Mapping> tag. Using nodeName is faster than using xpath.
The XML node that should be mapped to the Clover field.
Example: nodeName="salary"
cloverField
Required
The Clover field to which the XML node should be mapped.
The name of the field in the corresponding level.
Example: cloverField="SALARY"
trim
Optional
Specifies whether leading and trailing whitespace should be removed. By default, both leading and trailing whitespace is removed.
Example: trim="false" (whitespace will not be removed)
cloverField
Required
The output Clover field to which the input field should be mapped.
Example: cloverField="SALARY"
inputField
Required
Input field to be used.
Example: inputField="SALARY"
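The Mapping tag attributes for XML nodes can be sketched together as follows. The element and field names are hypothetical.

```xml
<!-- Hypothetical element and field names. -->
<Context xpath="/employees/employee" outPort="0">
  <!-- nodeName addresses a node by its name; it is described above as faster than xpath. -->
  <Mapping nodeName="salary" cloverField="SALARY"/>
  <!-- xpath takes an XPath query, here an attribute of a child element. -->
  <Mapping xpath="address/@city" cloverField="CITY"/>
  <!-- trim="false" keeps leading and trailing whitespace of the value. -->
  <Mapping xpath="note" cloverField="NOTE" trim="false"/>
</Context>
```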
Reading Multivalue Fields
Note that maps are read as plain strings, regardless of the data type of the map's values.
Example 22. Reading lists with XMLReader
An example input file containing these elements (just a code snippet):
can be read back by the component with this mapping:
where attendanceList is a field of your metadata. The metadata has to be assigned to the component's output edge. After you run the graph, the field is populated with XML data like this (as seen in View data):
[John,Vicky,Brian]
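A sketch of an input and a mapping that could produce the value above. Only attendanceList and the three names come from the example; the element names root and attendees are assumptions, as is the behavior that each matching element becomes one list item.

```xml
<!-- Assumed input (element names are hypothetical):
     <root>
       <attendees>John</attendees>
       <attendees>Vicky</attendees>
       <attendees>Brian</attendees>
     </root>
-->
<Context xpath="/root" outPort="0">
  <!-- attendanceList is a list field; repeated attendees elements are collected into it. -->
  <Mapping xpath="attendees" cloverField="attendanceList"/>
</Context>
```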
If you read from an input port in discrete or source mode, you can map particular input fields to output fields using the inputField attribute.
This example shows the basic usage of XMLReader.
You have a retail.xml file with data about your retail sales.
Create a list containing the order_id, customer first name, surname and email(s).
Solution: Create metadata with four fields: order_id (integer), name (string), surname (string), email (string[]). Set the File URL, Implicit mapping and Mapping attributes.
File URL
${DATAIN_DIR}/retail.xml
Mapping
See the xml below
Implicit mapping
true
If you set Implicit mapping to true, fields name and surname are populated by values of corresponding elements. Content of the Mapping attribute:
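A possible shape of this mapping, as a sketch only. It assumes that each order is an order element with an id attribute and with name, surname and repeated email child elements; the actual structure of retail.xml may differ.

```xml
<!-- Hypothetical structure of retail.xml: /retail/order with an id attribute
     and name, surname and email child elements. -->
<Context xpath="/retail/order" outPort="0">
  <!-- name and surname are filled by implicit mapping, so only the remaining fields are mapped. -->
  <Mapping xpath="@id" cloverField="order_id"/>
  <!-- email is a list field (string[]), so several email elements can be collected. -->
  <Mapping xpath="email" cloverField="email"/>
</Context>
```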
XMLReader sends the following two records to its first output port.
This example shows how to read an input file while mapping some input fields to the output. You are given a list of customers and paths to the files with their orders.
Each file can contain one or more products:
Create a list with customers and products:
Solution: Use the File URL, Charset and Mapping attributes.
File URL
port:$0.filename:source
Charset
UTF-8
Mapping
See the code below
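A sketch of such a mapping. The field filename comes from the File URL above; the input field customer, the output fields customer and product, and the element names order, product and name are assumptions.

```xml
<!-- Hypothetical structure of each order file: /order/product/name -->
<Context xpath="/order/product" outPort="0">
  <!-- Copy the customer field of the incoming record to the output record. -->
  <Mapping inputField="customer" cloverField="customer"/>
  <!-- Take the product name from the XML file referenced by the filename field. -->
  <Mapping xpath="name" cloverField="product"/>
</Context>
```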
This example shows how to read an input file with nested elements. Nested elements on different levels are sent to different output ports.
The input file countries-and-counties.xml contains a list of countries. Each country has a name and contains several counties. Each county has a name.
Make a list of countries, and a list of counties with their corresponding countries.
Solution: Assign the metadata country with the field countryName to the edge on the first output port. Assign the metadata county with the fields countryName and countyName to the edge on the second output port. Use the File URL, Charset and Mapping attributes.
File URL
${DATAIN_DIR}/countries-and-counties.xml
Charset
UTF-8
Mapping
See the code below
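A sketch of a mapping for this task. It assumes a root element countries with country elements that each have a name child and nested county elements; the real element names of the file may differ. It also assumes that the generatedKey field on the county level receives the value of the parentKey field from the country level.

```xml
<!-- Hypothetical structure: /countries/country/name and /countries/country/county/name -->
<Context xpath="/countries/country" outPort="0">
  <Mapping xpath="name" cloverField="countryName"/>
  <!-- Nested level: one record per county goes to the second output port. -->
  <Context xpath="county" outPort="1"
           parentKey="countryName" generatedKey="countryName">
    <Mapping xpath="name" cloverField="countyName"/>
  </Context>
</Context>
```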
The records sent to the first output port are:
The records sent to the second output port are:
This example shows you how to read XML that contains different namespaces.
A web page contains SVG graphics and links to other web pages. The links (<a>) belong to two namespaces: xhtml and svg. Get the URLs of the links from the SVG image.
Solution: Use the File URL, Charset and Mapping attributes.
File URL
${DATAIN_DIR}/page.xhtml
Charset
UTF-8
Mapping
See the code below
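A sketch of a namespace-aware mapping. It assumes that the SVG links carry their target in an xlink:href attribute and that the output metadata has a field named url; both are assumptions. The namespace URIs are the standard SVG and XLink ones.

```xml
<!-- Select only the links in the svg namespace; xhtml links are not matched. -->
<Context xpath="//svg:a" outPort="0"
         namespacePaths='svg="http://www.w3.org/2000/svg"'>
  <!-- The Mapping tag declares the prefix used by its own xpath. -->
  <Mapping xpath="@xlink:href" cloverField="url"
           namespacePaths='xlink="http://www.w3.org/1999/xlink"'/>
</Context>
```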
The <Context> element should be used only if you intend to send a record corresponding to the subtree to the output. Use a longer xpath instead of additional nested <Context> elements that send no records to an output.
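As a sketch of this advice with hypothetical element names: when only the inner elements produce records, a single Context with a longer xpath can stand in for an outer Context that sends nothing out.

```xml
<!-- Preferred: one Context that sends records to an output port. -->
<Context xpath="/elem1/elem2" outPort="0">
  <Mapping xpath="value" cloverField="value"/>
</Context>

<!-- Avoided: an outer Context without outPort that only wraps the inner one.
<Context xpath="/elem1">
  <Context xpath="elem2" outPort="0">
    <Mapping xpath="value" cloverField="value"/>
  </Context>
</Context>
-->
```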
We recommend specifying Charset explicitly.
For port reading. See Reading from Input Port in .
Input metadata has one field with the data type byte, cbyte or string.
The metadata on each of the output ports does not need to be the same. Each of these metadata can use .
If you intend to use the last output port for error logging, its metadata must have a fixed format. Field names can be arbitrary, but field types must be the same as in the template.
Specifies which data source(s) will be read (XML file, input port, dictionary). See .
Determines what should be done when an error occurs. For more information, see .
A sequence of individual true/false expressions related to XML features which should be validated. The expressions are separated from each other by a semicolon. For more information, see .
namespacePaths
Optional
Default namespaces that should be used for the xpath attribute specified in the <Context> tag.
Pattern: namespacePaths='prefix1="URI1";…;prefixN="URIN"'
Example: namespacePaths='n1="http://
.
namespacePaths
Optional
Default namespaces that should be used for the xpath attribute specified in the <Mapping> tag.
Pattern: namespacePaths='prefix1="URI1";…;prefixN="URIN"'
Example: namespacePaths='n1="http://
You can read only lists, however (see ).
The output contains URL: http://.
To avoid typing lines like:
<Mapping xpath="salary" cloverField="salary"/>
switch on implicit mapping (see ) and use explicit mapping only to populate fields with data from elements whose names differ from the field names.