Metadata Types
Metadata is one of the cornerstones of Hop. It covers workflows, pipelines, and every other type of metadata object.
Hop Gui has a Metadata Perspective to manage all types of metadata: run configurations, database (relational and NoSQL) connections, logging, and pipeline probes, to name just a few.
Metadata is typically stored as a set of JSON files in a project's metadata folder, with one subfolder per metadata type. The only exceptions are workflows and pipelines, which are defined as XML (for now, for historical reasons). Since workflows and pipelines are what Hop is all about, they are typically stored in your project folder, not in your project's metadata folder.
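The folder layout described above (one subfolder per metadata type, one JSON file per object) can be explored with a few lines of Python. This is a minimal sketch and not part of Hop itself: the function name `list_metadata_objects` and the example path are hypothetical, and it assumes only that each metadata object is a `.json` file directly inside its type's subfolder.

```python
from pathlib import Path


def list_metadata_objects(metadata_dir):
    """Map each metadata type (subfolder name) to the names of its JSON objects.

    Assumption: the folder contains one subfolder per metadata type,
    and each metadata object is a single .json file in that subfolder.
    """
    root = Path(metadata_dir)
    if not root.is_dir():
        # No metadata folder at this location: nothing to report.
        return {}

    result = {}
    for type_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # The file name without its .json extension is used as the object name.
        names = sorted(f.stem for f in type_dir.glob("*.json"))
        if names:
            result[type_dir.name] = names
    return result


if __name__ == "__main__":
    # Hypothetical location; point this at your own project's metadata folder.
    for metadata_type, names in list_metadata_objects("my-project/metadata").items():
        print(f"{metadata_type}: {', '.join(names)}")
```

Workflows and pipelines do not show up in this listing, because they are stored as XML files in the project folder rather than in the metadata folder.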
By default, Hop contains the following metadata types:
Asynchronous Web Service: Execute and query a workflow asynchronously through a web service.
Azure Blob Storage Authentication: An Azure Blob Storage connection type.
Beam File Definition: Describes a file layout in a Beam Pipeline
Cassandra Connection: Describes a connection to a Cassandra cluster
Data Set: This defines a data set, a static pre-defined collection of rows
Execution Data Profile: Collects and profiles data as it flows through a pipeline using configurable samplers for insight into value ranges, nulls, and row samples.
Execution Information Location: Defines where and how Apache Hop stores execution metadata, supporting local files, remote servers, Neo4j, or Elastic for later inspection and analysis.
Google Storage Authentication: A Google Cloud Storage connection type.
Hop Server: Defines a Hop Server
MongoDB Connection: Describes a MongoDB connection
Mail Server Connection: Describes a mail server connection
Neo4j Connection: A shared connection to a Neo4j server
Neo4j Graph Model: Description of the nodes, relationships, indexes, … of a Neo4j graph
Partition Schema: Describes a partition schema
Pipeline Log: Logs the activity of a pipeline with another pipeline
Pipeline Probe: Streams output rows of a pipeline to another pipeline
Pipeline Run Configuration: Describes how and with which engine a pipeline is to be executed
Pipeline Unit Test: Describes a test for a pipeline with alternative data sets as input from a certain transform and testing output against golden data
Relational Database Connection: Describes all the metadata needed to connect to a relational database
REST Connection: Describes all the metadata needed to connect to a REST API.
Splunk Connection: Describes a Splunk connection
Static Schema Definition: Defines a reusable data stream layout to ensure consistency across multiple pipelines and simplify schema management.
Variable Resolver: Use plugins to resolve variable values with a pipeline, a key store, a vault, or a secret manager.
Web Service: Runs a pipeline to generate output for a servlet on Hop Server
Workflow Log: Logs the activity of a workflow with a pipeline
Workflow Run Configuration: Describes how to run a workflow