# Google Dataflow Pipeline (Template)

Apache Hop pipelines can be scheduled and triggered in various ways. In this section we will walk through the steps needed to schedule a pipeline on Google Dataflow using [Dataflow Templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates). Apache Hop uses a [flex template](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates) to launch a job on Google Dataflow.

### Preparing your environment

Before we can add a new pipeline in the Google Cloud Platform [console](https://console.cloud.google.com/dataflow/pipelines) we need to create a Google Storage bucket that contains 3 types of files.
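If you don't have a bucket yet, one way to create it is with the `gsutil` CLI. The bucket name and region below are placeholders, not values from this guide:

```shell
# Create a bucket to hold the pipelines, the Hop metadata file and the
# flex template metadata file (name and region are placeholders).
gsutil mb -l europe-west1 gs://my-hop-dataflow-bucket

# Verify the bucket exists.
gsutil ls -b gs://my-hop-dataflow-bucket
```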

#### Hop pipelines

The pipelines you created in the Hop GUI and wish to schedule on Google Dataflow.

{% hint style="info" %}
Tip: You can also create a Hop project directly on a Google Storage bucket; that way you can create and edit Hop pipelines in GS without a separate upload step.
{% endhint %}
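If your project lives on a local disk instead, the pipelines can simply be copied into the bucket. The local path and bucket name are placeholders:

```shell
# Upload the pipeline(s) you want to schedule on Dataflow
# (local path and bucket name are placeholders).
gsutil cp ./pipelines/my-pipeline.hpl gs://my-hop-dataflow-bucket/pipelines/
```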

#### Hop Metadata

For the pipeline to be able to use Hop metadata objects and other run configurations, we need to generate a Hop `metadata.json` file. This file can be generated from the GUI under Tools → Export metadata to JSON, or with the export-metadata function of the [Hop conf](/data-shaper-1.21/index-2/snippets/hop-tools/hop-conf.md) tool.
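From the command line, the export plus upload might look like the sketch below. The exact `hop-conf` flag name can vary between Hop versions, so verify it against `hop-conf.sh --help`; the file paths and bucket name are placeholders:

```shell
# Export the project metadata to a JSON file (flag name per the Hop conf
# docs -- verify with ./hop-conf.sh --help on your Hop version).
./hop-conf.sh --export-metadata /tmp/hop-metadata.json

# Upload the exported metadata next to your pipelines
# (bucket name is a placeholder).
gsutil cp /tmp/hop-metadata.json gs://my-hop-dataflow-bucket/metadata/hop-metadata.json
```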

#### Beam Flex template metadata file

The final part to get everything working is a metadata file used by Dataflow to stitch all the parts together.

```json
{
    "defaultEnvironment": {},
    "image": "apache/hop-dataflow-template:latest",
    "metadata": {
        "description": "This template allows you to start Hop pipelines on dataflow",
        "name": "Template to start a hop pipeline",
        "parameters": [
            {
                "helpText": "Google storage location pointing to the Hop metadata file",
                "label": "Hop Metadata Location",
                "name": "hopMetadataLocation",
                "regexes": [
                    ".*"
                ]
            },
            {
                "helpText": "Google storage location pointing to the pipeline you wish to start",
                "label": "Hop Pipeline Location",
                "name": "hopPipelineLocation",
                "regexes": [
                    ".*"
                ]
            }
        ]
    },
    "sdkInfo": {
        "language": "JAVA"
    }
}
```

{% hint style="warning" %}
Important: you can change the Docker image used in the metadata file.
{% endhint %}
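Dataflow reads this file from Cloud Storage, so save the JSON above (for example as `template.json`) and upload it to the bucket. The file name, object path, and bucket name below are placeholders:

```shell
# Sanity-check that the template file is valid JSON before uploading.
python3 -m json.tool template.json > /dev/null && echo "template.json is valid"

# Upload the flex template metadata file (paths are placeholders).
gsutil cp template.json gs://my-hop-dataflow-bucket/template/template.json
```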

### Creating a Dataflow pipeline

Now we can return to the [console](https://console.cloud.google.com/dataflow/pipelines) and click "Create data pipeline".

After you select the Beam Flex template metadata file, its required parameters appear in the form. Fill in the Cloud Storage paths to the Hop metadata file and the Hop pipeline you uploaded earlier.
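As an alternative to the console, a job can be launched from the same template with the `gcloud` CLI. The parameter names (`hopMetadataLocation`, `hopPipelineLocation`) come from the template metadata file above; the job name, region, bucket, and object paths are placeholders:

```shell
# Launch a Dataflow job from the flex template (region, bucket and
# object paths are placeholders -- substitute your own).
gcloud dataflow flex-template run "hop-pipeline-$(date +%Y%m%d-%H%M%S)" \
  --region=europe-west1 \
  --template-file-gcs-location=gs://my-hop-dataflow-bucket/template/template.json \
  --parameters=hopMetadataLocation=gs://my-hop-dataflow-bucket/metadata/hop-metadata.json \
  --parameters=hopPipelineLocation=gs://my-hop-dataflow-bucket/pipelines/my-pipeline.hpl
```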

