# Reservoir Sampling

## <img src="/files/IJKc2uLo7cU0ydnVJ6O3" alt="" data-size="line"> Reservoir Sampling

### Description <a href="#description" id="description"></a>

The Reservoir Sampling transform allows you to sample a fixed number of rows from an incoming data stream when the total number of incoming rows is not known in advance.

The transform uses uniform sampling; all incoming rows have an equal chance of being selected.

This transform is particularly useful in conjunction with the ARFF output transform, to generate a suitably sized data set for use with WEKA.

The Reservoir Sampling transform uses [Algorithm R](https://en.wikipedia.org/wiki/Reservoir_sampling) by Jeffrey Vitter.
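To make the idea concrete, here is a minimal Python sketch of Algorithm R. It is an illustration only, not the transform's actual source code; the function name `reservoir_sample` and its parameters are hypothetical, and it assumes a positive sample size `k`. The first `k` rows fill the reservoir, and each later row replaces a randomly chosen slot with a probability that shrinks as more rows arrive, so every row ends up with the same chance of being in the final sample.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # The first k rows fill the reservoir directly.
            reservoir.append(row)
        else:
            # Row i (0-based) is kept with probability k / (i + 1):
            # draw a slot in [0, i] and replace only if it falls inside the reservoir.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Usage: sample 5 rows from a stream without knowing its length up front.
sample = reservoir_sample((n * n for n in range(10_000)), k=5, seed=42)
print(sample)
```

The stream is read once and only `k` rows are held in memory at any time, which is why this approach suits inputs whose size is not known in advance.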

| Engine     | Supported    |
| ---------- | ------------ |
| Hop Engine | <sup>✓</sup> |
| Spark      | ?            |
| Flink      | ?            |
| Dataflow   | ?            |

### Options

| Option         | Description                                                                                                                                                    |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Transform name | Name of the transform. The name must be unique within a single pipeline. |
| Sample size    | Select how many rows to sample from an incoming stream. Setting a value of 0 will cause all rows to be sampled; setting a negative value will block all rows.  |
| Random seed    | Choose a seed for the random number generator. Repeating a pipeline with a different value for the seed will result in a different random sample being chosen. |
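The way these options interact can be sketched in Python as follows. This is a hedged illustration of the behaviour described in the table, not the transform's implementation; the helper `sample_rows` and its parameter names are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_rows(rows: Iterable[T], sample_size: int, seed: int) -> List[T]:
    """Hypothetical helper mirroring the documented option semantics."""
    if sample_size < 0:
        # Negative sample size: all rows are blocked.
        return []
    if sample_size == 0:
        # Zero: every incoming row is sampled (passed through).
        return list(rows)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < sample_size:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = row
    return reservoir


rows = list(range(100))
first = sample_rows(rows, sample_size=5, seed=1)
again = sample_rows(rows, sample_size=5, seed=1)
other = sample_rows(rows, sample_size=5, seed=2)
print(first == again)  # same seed and same input order: identical sample
print(first, other)    # a different seed will usually give a different sample
```

In this sketch, reproducibility depends on both the seed and the order in which rows arrive; changing either can change the sample.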

