# Running the Apache Beam samples

### Prerequisites

The steps on this page, together with the detail pages for the Direct Runner, Spark, Flink and Google Cloud Dataflow, set up everything you need to run the pipelines in the Hop samples project.

#### Java

You’ll already have Java installed to run Apache Hop. Both Apache Hop and Beam require a Java 17 environment.

Double-check your Java version with the `java -version` command. Your output should look similar to the example below.

```
openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode)
```
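If you want to check this from a script rather than by eye, a minimal sketch like the following (assuming a POSIX shell and `awk`) extracts the major version from the `java -version` output:

```shell
# 'java -version' prints to stderr, so redirect it before parsing.
# The awk program pulls the quoted version string and keeps the major part.
major=$(java -version 2>&1 | awk -F'"' '/version/ {split($2, v, "."); print v[1]; exit}')

if [ "$major" = "17" ]; then
  echo "Java 17 found"
else
  echo "Expected Java 17, found: ${major:-nothing}" >&2
fi
```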

#### The samples project

The Hop samples project comes with a number of sample pipelines for Apache Beam. A default Hop installation includes the samples project. If your Hop installation doesn’t come with this project, create a new project and point its Home folder to `<HOP>/config/projects/samples`.

The samples project contains the following pipeline run configurations:

* local: the native local run configuration. We’ll ignore this run configuration in the context of this guide.
* Dataflow: the Apache Beam run configuration for Google Cloud Dataflow.
* Direct: the Apache Beam run configuration for the direct runner. The [Direct Runner](https://beam.apache.org/documentation/runners/direct/) executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model.
* Flink: the Apache Beam run configuration for Apache Flink.
* Spark: the Apache Beam run configuration for Apache Spark.
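Run configurations can also be selected from the command line with Hop's `hop-run` tool. The sketch below only composes and prints the invocation; the installation directory and pipeline path are assumptions, so adjust them to your setup before running:

```shell
HOP_DIR=/opt/hop                                   # assumption: your Hop installation directory
PIPELINE=beam/pipelines/input-process-output.hpl   # hypothetical sample pipeline path

# Compose the invocation; drop the 'echo' below to actually execute it.
CMD="$HOP_DIR/hop-run.sh --project samples --file $PIPELINE --runconfig Direct"
echo "$CMD"
```

Swapping `Direct` for `Flink`, `Spark` or `Dataflow` selects the corresponding run configuration from the list above.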

#### Build your Hop Fat Jar

Apache Beam requires a so-called `fat jar` that bundles all required Java classes and their dependencies into a single jar file.

Build this jar for your Hop installation through `Tools → Generate a Hop fat jar`.

Save this file to a convenient location under a recognizable file name. Either store it outside of your project folder or add it to your `.gitignore`: you don’t want to accidentally add hundreds of MB to your git repository.
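If you prefer to script this step, Apache Hop also exposes fat-jar generation through its `hop-conf` tool. The flag below is based on Apache Hop's `hop-conf --generate-fat-jar` option; verify it against your installation's `hop-conf.sh -h` output. The sketch only composes and prints the command:

```shell
HOP_DIR=/opt/hop             # assumption: your Hop installation directory
FAT_JAR=$HOME/hop-fat.jar    # keep this outside your project folder

# Compose the invocation; drop the 'echo' to actually generate the jar.
CMD="$HOP_DIR/hop-conf.sh --generate-fat-jar $FAT_JAR"
echo "$CMD"
```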

#### Flink and Spark: export your project metadata

You’ll need to export your project’s metadata to JSON so it can be passed to either `spark-submit` or `flink run`.

Export your project metadata through `Tools → Export metadata to JSON`.

Save this file to a convenient location under a recognizable file name. Either store it outside of your project folder or add it to your `.gitignore`. Your project’s metadata folder should already be in version control; you don’t want to commit this point-in-time export of the same metadata again.
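This step, too, can be scripted through `hop-conf`. The flag name below is an assumption based on Apache Hop's metadata-export option; check `hop-conf.sh -h` on your installation before relying on it. As before, the sketch only composes and prints the command:

```shell
HOP_DIR=/opt/hop                       # assumption: your Hop installation directory
METADATA_JSON=$HOME/hop-metadata.json  # keep this outside your project folder

# Compose the invocation; drop the 'echo' to actually export the metadata.
CMD="$HOP_DIR/hop-conf.sh --export-metadata $METADATA_JSON"
echo "$CMD"
```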

### Running the samples on the Direct Runner, Flink, Spark and Dataflow

* [Direct Runner](/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/getting-started-with-apache-beam/running-the-beam-samples/beam-samples-direct-runner.md)
* [Apache Flink](/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/getting-started-with-apache-beam/running-the-beam-samples/beam-samples-flink.md)
* [Apache Spark](/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/getting-started-with-apache-beam/running-the-beam-samples/beam-samples-spark.md)
* [Google Cloud Dataflow](/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/getting-started-with-apache-beam/running-the-beam-samples/beam-samples-dataflow.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.primeur.com/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/getting-started-with-apache-beam/running-the-beam-samples.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
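As an example, such a query could be composed and sent from the command line. This is a minimal sketch: only spaces are percent-encoded here, and the sample question is hypothetical; use a proper URL encoder for arbitrary input. The `curl` call is commented out so the sketch has no network dependency:

```shell
BASE="https://docs.primeur.com/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/getting-started-with-apache-beam/running-the-beam-samples.md"
QUESTION="How do I run the Beam samples on Apache Flink?"

# Minimal percent-encoding: replace every space with %20.
ENCODED=$(printf '%s' "$QUESTION" | sed 's/ /%20/g')
URL="$BASE?ask=$ENCODED"

echo "$URL"
# Perform the request:
# curl -s "$URL"
```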
