JSON Input
JSON Input
Description
The JSON input transform reads data from JSON structures, files, or incoming fields using a JSONPath expression to extract data and output rows.
Tips
JSONPath expressions can use either dot notation or square bracket notation.
You can pass a REST request result field onwards without selecting any fields on the Fields tab if needed.
Examples
For example, connect a REST client to a JSON input transform to read JSON into data rows.
To read nested JSON elements, use a sequence of JSON Input transforms, where the first JSON Input transform reads the nested element as a String field. In the second JSON Input transform, use the
Source is from a previous transformoption to select the String field you used in the first JSON Input transform to read the information from the nested element.
Check the json-input-nested-elements.hpl sample in the Samples project for an example.
Samples (Samples project):
json-input-basic.hpl
json-input-nested-elements.hpl
Hop Engine
✓
Spark
✓
Flink
✓
Dataflow
✓
Options
File Tab
Source is from a previous transform
Select to retrieve the source from a previously defined field. When selected, the following fields are available:
Select field
Use field as file names
Read source as URL
Do not pass field downstream
When this option is cleared, the following fields are available:
File or directory
Regular Expression
Exclude Regular Expression
Selected files
Select field
Specify the field name to use as a source from a previous transform.
Use field as file names
Select to indicate the source is a filename.
Read source as URL
Select to indicate if the source should be accessed as a URL.
Do not pass field downstream
Select to remove the source field from the output stream. This action improves performance and memory utilization with large JSON fields.
File or directory
Specify the source location if the source is not defined in a field.
Click Browse to navigate to your source file or directory.
Click Add to include the source in the Selected files table.
Regular expression
Specify a regular expression to match filenames within a specified directory.
Exclude regular expression
Specify a regular expression to exclude filenames within a specified directory.
File/Directory
The source location indicated by clicking Add after specifying it in File or directory.
Wildcard (RegExp)
Wildcards as specified in Regular expression.
Exclude wildcard
Excluded wildcards as specified in Exclude regular expression.
Required
Required source location for input.
Include subfolders
Whether subfolders are included within the source location.
Delete
Remove a source from the table
Edit
Remove a source from the table and return it back to the File or directory option.
Show filename(s)
Display the file names of the sources successfully connected to the JSON Input transform.
Content Tab
The Content tab contains the following options for configuring which data to retrieve:
Ignore empty file
Select to skip empty files. When cleared, empty files will cause the process to fail and stop.
Do not raise an error if no files
Select to continue when no files are available to process.
Ignore missing path
Select to continue processing files when an error occurs that (1) no fields match the JSON path or (2) that all the values are null. When cleared, no further rows are processed when an error occurs.
Default path leaf to null
Select to return a null value for missing paths.
Limit
Specify a limit on the number of records generated from the tra. Results are not limited when set to zero.
Include filename in output
Select to add a string field with the filename in the result.
Rownum in output
Select to add an integer field with the row number in the result.
Add filenames to result
Select to add processed files to the result file list.
Fields Tab
The Fields tab displays field definitions to extract values from the JSON structure. The table in this tab contain the following columns:
Name
Name of field that maps to the corresponding field in the JSON input stream.
Path
Complete path of the field name in the JSON input stream. Hop uses the library JayWay for JSON Path expressions and documentation is at https://github.com/json-path/JsonPath. All records can be retrieved by adding the asterisk * in the path. For example, $.mydata.*
Type
Data type of the input field.
Format
An optional mask for converting the format of the original field. See Common Formats for information on common valid date and numeric formats you can use in this transform.
Length
Length of the field.
Precision
Number of floating point digits for number-type fields.
Currency
Currency symbol ($ or €, for example).
Decimal
A decimal point can be a . (5,000.00 for example) or , (5.000,00 for example).
Group
A grouping can be a , (10,000.00 for example) or . (5.000,00 for example).
Trim type
The trim method to apply to a string.
Repeat
The corresponding value from the last row repeated if a row is empty.
Select fields
Click Select Fields button in the Fields tab to open the Select Fields window. Select the checkbox next to each field in your source file that you want include in your output. All the fields selected in this transform are added to the table. You can search for a field name by entering the field name in the Search box.
Select fields from snippet
Click Select fields from snippet button and paste the entire JSON text and it will populate the Name, Path, Type columns in the Fields tab. Example: This is useful when copying JSON from a previous REST client’s output result.
Dates and Timestamps
For Hop types of Date and Timestamp, use the Format column to specify the incoming format. E.g. for Date: yyyy-MM-dd. You can either select a format from the dropdown or enter format text directly. Hop uses SimpleDateFormat (Java Platform SE 8) at https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html
Examples of Timestamp formats:
Format: yyyy-MM-dd’T’HH:mm:ss.SSSZ and Result: 2021-10-26T20:51:43.795+0000
Format: 2024-04-22T00:00:00.000Z and Result: 2024-04-22T00:00:00.000Z
Regex
You can use Regex in the Path expression.
Example Path to filter on those that start with the letter 'a': $.data[?(@=~/a.*/i)]
Additional output fields tab
The Additional output fields tab contains the following options to specify additional information about the file to process:
Short filename field
Specify the field that contains the filename without path information but with an extension.
Extension field
Specify the field that contains the extension of the filename.
Path field
Specify the field that contains the path in operating system format.
Size field
Specify the field that contains the size of the data.
Is hidden field
Specify the field indicating if the file is hidden or not (Boolean).
Last modification field
Specify the field indicating the date of the last time the file was modified.
Uri field
Specify the field that contains the URI.
Root uri field
Specify the field that contains only the root part of the URI.
Considerations
While processing input JSON files, if a JSON record has one or more field containing null values, by default the null values will be present in the transform output.
For example if we have a JSON file like this
{
"persons" : [
{
"id": "1",
"name": "name 1"
},
{
"id": "2",
"name": "name 2"
},
{
"id": "3",
"name": null
},
{
"id": "4",
"name": "name 4"
}
]
}When extracting the fields id and Name using the following field definition:
id
$.persons.*.id
Name
$.persons.*.name
given the default behavior, the output will be
id;Name
1;Name 1
2;Name 2
3;null
4;Name 4Now let’s only select the name field and see what happens
Name
Name 1
Name 2
Name 4You will notice that you only have 3 rowsets returned in this case ( the null line is omited from the result)
To change Hop’s behavior regarding null values in JSON files, so that null values will not be considered in JSON output, you change the HOP_JSON_INCLUDE_NULLS configuration variable and set it’s value to N
HOP_JSON_INPUT_INCLUDE_NULLS = NAfter restarting Hop, when we run the pipeline once again you will have 3 rows resulting because the the null values will be omitted.
Last updated