Running the Apache Beam samples with Apache Flink
Get Flink
Download your selected Flink version and extract it to a convenient location.
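For example, on Linux or macOS you could download and extract a release like this. The version below is only an example; pick your download link from https://flink.apache.org/downloads/:

# Example version only; substitute the release you selected.
wget https://archive.apache.org/dist/flink/flink-1.18.1/flink-1.18.1-bin-scala_2.12.tgz
tar xzf flink-1.18.1-bin-scala_2.12.tgz
cd flink-1.18.1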
Start your local Flink single-node cluster
To keep things as simple as possible, we’ll run a local single-node Flink cluster with a single command.
In the folder where you extracted Flink, run:
bin/start-cluster.sh
Your output should look similar to the one below:
Starting cluster.
Starting standalonesession daemon on host <HOSTNAME>.
Starting taskexecutor daemon on host <HOSTNAME>.

The cluster shouldn’t take more than a couple of seconds to start. Once Flink is available, you’ll be able to access your Flink Dashboard at http://localhost:8081/
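As an optional sanity check from the command line, Flink’s monitoring REST API listens on the same port as the dashboard; its /overview endpoint returns a short JSON summary of the cluster:

curl -s http://localhost:8081/overview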
Flink Run Configuration setup
In Hop GUI’s metadata perspective for the Samples project, edit the Flink pipeline run configuration and make sure the Fat jar file location (the very last option) points to the Hop fat jar you created earlier in the prerequisites.
From Hop GUI
Set the Flink master option to your cluster’s master address. For embedded Flink, [local] will do.
Go back to the data orchestration perspective and run one of the Beam pipelines in the samples project. In this example, we used samples/beam/pipelines/generate-synthetic-data.hpl
When you start your pipeline from Hop GUI, it will appear in your Flink Dashboard.
From Flink Run
In a real-world setup, you’ll run your Flink pipelines from the Flink master through flink run.
Set your Flink master to [auto] and export your Hop metadata again (see prerequisites).
Unlike Spark, you cannot pass Java options to the TaskManager at runtime. So we also want to set the PROJECT_HOME variable in the run configuration; this variable is used during execution to locate the source files (Metadata perspective → Pipeline Run Configuration → Flink → Variables). Alternatively, you can provide a fourth argument after the run configuration name: the name of the environment configuration file to use.
Use a command like the one below to pass all the information required by flink run.
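Below is a sketch of what such a command could look like. It assumes the pipeline is launched through Hop’s org.apache.hop.beam.run.MainBeam main class and uses placeholder paths you’ll need to adapt; the three arguments after the fat jar are the pipeline to run, the exported Hop metadata file, and the name of the pipeline run configuration (the optional environment configuration file would follow as a fourth argument):

# Placeholder paths: adapt to your fat jar, project and metadata export.
flink run \
  --class org.apache.hop.beam.run.MainBeam \
  /path/to/hop-fat.jar \
  /path/to/samples/beam/pipelines/generate-synthetic-data.hpl \
  /path/to/hop-metadata.json \
  Flink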
With Hop and Flink set up correctly, the flink run command will print the job’s progress while the pipeline executes.
After your pipeline finishes and the flink run command returns, your Flink Dashboard will show a new entry in the 'Completed Job List'. You can follow any running applications in the 'Running Job List' and drill down into their execution details while they run.
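When you’re done, stop the local cluster with the companion script in the folder where you started it:

bin/stop-cluster.sh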