Data Shaper and Apache Hop

Data Shaper is built on Apache Hop, the open-source data orchestration and transformation framework. Within the Data One suite, this foundation enables a consistent and unified environment for designing, executing, and monitoring data transformations. Data Shaper is fully integrated with other Data One modules: transformations can be invoked in Data Mover mediation contracts and monitored as part of Data Watcher dataflows.

Hop Visual Design and Metadata

Apache Hop, short for Hop Orchestration Platform, is a data orchestration and data engineering platform that aims to facilitate all aspects of data and metadata orchestration. Hop lets you focus on the problem you’re trying to solve without technology getting in the way. Simple tasks should be easy; complex tasks should be possible.

Hop allows data professionals to work visually, using metadata to describe how data should be processed. Visual design enables data developers to focus on what they want to do instead of how that task needs to be done. This focus on the task at hand lets Hop developers be more productive than they would be when writing code.

Hop Flexible Runtimes

Hop developers create workflows and pipelines in a visual development environment called Hop Gui. Both can be executed on the native Hop engine, either locally or remotely. Pipelines can also run on Apache Spark, Apache Flink, and Google Cloud Dataflow through Apache Beam run configurations.
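Because the engine is chosen by a run configuration rather than by the pipeline itself, the same pipeline file can be pointed at different engines. The sketch below builds, without executing, a command line for Hop's `hop-run` tool. It assumes the `--file`, `--runconfig`, `--project`, and `--environment` options described in the Hop documentation. The helper `hop_run_command`, the file path, and the run configuration names `local` and `spark` are hypothetical and would need to match metadata defined in your own project.

```python
import shlex


def hop_run_command(file, run_config, project=None, environment=None,
                    script="./hop-run.sh"):
    """Build (but do not execute) a hop-run command as an argument list.

    Hypothetical helper: option names are assumed from the Hop documentation.
    """
    cmd = [script, "--file", file, "--runconfig", run_config]
    if project:
        cmd += ["--project", project]
    if environment:
        cmd += ["--environment", environment]
    return cmd


# Same pipeline file, two hypothetical run configurations.
local_cmd = hop_run_command("pipelines/load-customers.hpl", "local")
spark_cmd = hop_run_command("pipelines/load-customers.hpl", "spark")

# Render the argument list as a shell-safe string for inspection.
print(shlex.join(local_cmd))
```

Only the run configuration name differs between the two commands; the pipeline definition stays untouched.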

Workflows and pipelines can apply hundreds of operations to data: reading from and writing to a variety of source and target platforms, as well as combining, enriching, cleaning, and otherwise manipulating data. Depending on the engine and the selected functionality, data can be processed in batch, in streaming, or in a batch/streaming hybrid.

Hop Core Concepts

Before we dive deeper, let’s take a minute to familiarize ourselves with the Hop lingo.

Metadata is by far the most important concept in Hop. Every item covered below is defined as metadata, and all interactions between Hop and other components in your data architecture are done through metadata.

  • Pipelines are collections of transforms, connected by hops. All transforms in a pipeline run in parallel.

  • Workflows are collections of actions, connected by hops. All actions in a workflow run sequentially by default.

  • Projects are logical collections of Hop code and configuration. Environments contain the environment-specific (e.g. dev, uat, prd) metadata.
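The difference between the two execution models above can be illustrated with a small, self-contained Python sketch. It is a conceptual model only and does not use or represent Hop's own API: `run_pipeline` and `run_workflow` are hypothetical names, queues stand in for hops, and stopping a workflow at the first failed action is a simplifying assumption.

```python
import queue
import threading

_END = object()  # sentinel marking the end of a row stream


def run_pipeline(rows, transforms):
    """Run every transform in its own thread, linked by queues ("hops")."""
    hops = [queue.Queue() for _ in range(len(transforms) + 1)]

    def worker(fn, inbox, outbox):
        # Each transform starts immediately and handles rows as they arrive.
        while (row := inbox.get()) is not _END:
            outbox.put(fn(row))
        outbox.put(_END)

    threads = [
        threading.Thread(target=worker, args=(fn, hops[i], hops[i + 1]))
        for i, fn in enumerate(transforms)
    ]
    for t in threads:
        t.start()
    for row in rows:
        hops[0].put(row)
    hops[0].put(_END)
    for t in threads:
        t.join()

    # Drain the last hop to collect the pipeline output.
    result = []
    while (row := hops[-1].get()) is not _END:
        result.append(row)
    return result


def run_workflow(actions):
    """Run actions one after another, stopping at the first failure."""
    completed = []
    for name, action in actions:
        if not action():
            break
        completed.append(name)
    return completed


rows = run_pipeline([1, 2, 3], [lambda r: r + 1, lambda r: r * 10])
done = run_workflow([("check", lambda: True), ("load", lambda: True)])
```

In the pipeline model all transforms are active at once and rows stream between them, so the second transform can start working before the first has finished. In the workflow model each action starts only after the previous one has completed.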
