# Apache Tika

### Description <a href="#description" id="description"></a>

The Apache Tika transform parses files in a wide range of formats and extracts their text content along with any available metadata. It uses the [Apache Tika](http://tika.apache.org) libraries to do the parsing.

The extracted metadata is returned in JSON format. To pull specific values out of this metadata, use a [JSON Input](/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/transforms/jsoninput.md) transform.
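To illustrate what picking specific values out of the metadata involves, here is a minimal Python sketch. It assumes the metadata arrives as a flat JSON object of string values. The sample document and the helper name `pick_metadata` are hypothetical, and the keys the transform actually emits depend on the file type and on what Tika detects.

```python
import json

# Hypothetical sample of the metadata JSON. The real keys and values depend
# on the parsed file and on what Tika is able to detect.
sample_metadata = (
    '{"Content-Type": "application/pdf", '
    '"dc:title": "Quarterly report", '
    '"xmpTPg:NPages": "12"}'
)


def pick_metadata(metadata_json, keys):
    """Return the requested keys from a flat JSON object (None when absent)."""
    metadata = json.loads(metadata_json)
    return {key: metadata.get(key) for key in keys}


# Ask for two keys that exist in the sample and one that does not.
selected = pick_metadata(sample_metadata, ["Content-Type", "dc:title", "dc:creator"])
print(selected)
```

Inside a pipeline, the same selection would be configured in the JSON Input transform rather than written as code. Keys containing `:` or `-` would likely need bracket notation in a JSONPath expression, for example `$['Content-Type']`.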

### Supported Engines <a href="#supported_engines" id="supported_engines"></a>

| Hop Engine | <sup>✓</sup> |
| ---------- | ------------ |
| Spark      | ?            |
| Flink      | ?            |
| Dataflow   | ?            |

### Options

| Option            | Description                                                                   |
| ----------------- | ----------------------------------------------------------------------------- |
| Transform name    | Name of the transform. Note: This name has to be unique in a single pipeline. |
| File tab          | Specifies which files are read and examined.                                  |
| Content tab       | Contains content settings, such as the file encoding and the output format.   |
| Output fields tab | Lets you enter the names of the fields you want in the output.                |

