# Group By

## ![](/files/zKSuZNpuD6tlR9AZCA62) Group By

### Description <a href="#description" id="description"></a>

The Group By transform groups rows from a source, based on a specified field or collection of fields. A new row is generated for each group.

It can also generate one or more aggregate values for the groups.

Common uses are calculating the average sales per product and counting the number of an item you have in stock.

The Group By transform is designed for sorted inputs.

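If your input is not sorted, only rows with identical group values that happen to be consecutive are grouped together; the same group value appearing later in the stream starts a new group.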
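The consecutive-row behavior can be pictured with Python's `itertools.groupby`, which also starts a new group whenever the key changes. This is only an analogy, assuming the transform compares each row with the previous one; the `product` and `sales` fields and their values are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows: (product, sales)
rows = [("apple", 10), ("pear", 4), ("apple", 20), ("pear", 6)]


def average_per_group(rows):
    """Average sales per run of consecutive rows that share a product."""
    result = []
    for product, run in groupby(rows, key=itemgetter(0)):
        sales = [amount for _, amount in run]
        result.append((product, sum(sales) / len(sales)))
    return result


# Unsorted input: every change of product starts a new group.
unsorted_result = average_per_group(rows)

# Sorted on the group field first: one group per product.
sorted_result = average_per_group(sorted(rows, key=itemgetter(0)))

print(unsorted_result)
print(sorted_result)
```

With the unsorted rows, `apple` and `pear` each appear twice in the result; after sorting on the group field, each product yields a single row.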

If you sort the data outside of Hop, the case sensitivity of the data in the fields may produce unexpected grouping results.
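As an illustration of this pitfall, the sketch below sorts values case-insensitively, as an external tool might, and then groups consecutive rows using an exact, case-sensitive comparison. Both the external collation and the exact-match comparison are assumptions made for the example, not a description of Hop internals.

```python
from itertools import groupby

values = ["apple", "Banana", "Apple", "apple"]

# Assumption: the external tool sorts without regard to case.
externally_sorted = sorted(values, key=str.lower)

# Consecutive rows are grouped with a case-sensitive comparison.
mismatched = [(key, len(list(run))) for key, run in groupby(externally_sorted)]

# Sorting with the same case-sensitive ordering keeps equal values together.
consistent = [(key, len(list(run))) for key, run in groupby(sorted(values))]

print(mismatched)
print(consistent)
```

In the mismatched case, `apple` is split into two groups because `Apple` sits between the two occurrences; sorting and grouping with the same comparison avoids this.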

You can use the [Memory Group By](/data-shaper-1.21/knowing-the-data-shaper-designer/pipelines/transforms/memgroupby.md) transform to handle non-sorted input.

| Hop Engine | <sup>✓</sup> |
| ---------- | ------------ |
| Spark      | <sup>✓</sup> |
| Flink      | <sup>✓</sup> |
| Dataflow   | <sup>✓</sup> |

### Options

| Option                                 | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Transform name                         | Name of the transform.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| Include all rows?                      | Enable if you want all rows in the output, not just the aggregation; to differentiate between the two types of rows in the output, a flag is required in the output. You must specify the name of the flag field in that case (the type is boolean).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| Temporary files directory              | The directory in which temporary files are stored. It is needed when the Include all rows option is enabled and the number of grouped rows exceeds 5000. The default is the standard temporary directory for the system. |
| TMP-file prefix                        | Specify the file prefix used when naming temporary files. |
| Add line number, restart in each group | Enable to add a line number that restarts at 1 in each group. |
| Line number field name                 | The name of the field added to contain the line numbers.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| Always give back a row                 | If you enable this option, the Group By transform will always give back a result row, even if there is no input row. This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| Group fields table                     | Specify the fields over which you want to group. Click Get Fields to add all fields from the input stream(s).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| Aggregates table                       | <p>Specify the fields that must be aggregated, the aggregation method, and the name of the resulting new field. The available aggregation methods are:</p><ul><li>Sum</li><li>Average (Mean)</li><li>Median</li><li>Percentile</li><li>Minimum</li><li>Maximum</li><li>Number of values (N)</li><li>Concatenate strings separated by , (comma)</li><li>First non-null value</li><li>Last non-null value</li><li>First value (including null)</li><li>Last value (including null)</li><li>Cumulative sum (all rows option only!)</li><li>Cumulative average (all rows option only!)</li><li>Standard deviation (population)</li><li>Concatenate strings separated by \<Value>: specify the separator in the Value column (This supports <a href="/pages/1kBEGq2wnGgIbriSGex5#hexadecimal-values">hexadecimals</a>)</li><li>Number of distinct values</li><li>Number of rows (without field argument)</li><li>Standard deviation (sample)</li><li>Percentile (nearest-rank method)</li><li>Concatenate strings separated by new line (CRLF)</li><li>Concatenate distinct values separated by \<Value>: specify the separator in the Value column (This supports <a href="/pages/1kBEGq2wnGgIbriSGex5#hexadecimal-values">hexadecimals</a>)</li></ul> |

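To make a few of the options above more concrete, here is a rough Python sketch of what a per-group line number, a cumulative sum over all rows, and the Always give back a row option amount to. The function names and sample data are hypothetical, and the code mirrors only the behavior described in the table, not the transform's actual implementation.

```python
from itertools import groupby
from operator import itemgetter


def number_and_accumulate(sorted_rows):
    """Emit every (key, amount) row with a line number that restarts
    at 1 in each group and a running sum within the group."""
    out = []
    for key, run in groupby(sorted_rows, key=itemgetter(0)):
        running = 0
        for line_nr, (_, amount) in enumerate(run, start=1):
            running += amount
            out.append((key, amount, line_nr, running))
    return out


def count_rows(rows, always_give_back_row=False):
    """Count rows without any group field.

    With no input there is normally no output row at all; the flag
    forces a single result row holding a count of 0.
    """
    rows = list(rows)
    if not rows and not always_give_back_row:
        return []
    return [len(rows)]


detail = number_and_accumulate([("a", 5), ("a", 7), ("b", 1)])
print(detail)
print(count_rows([]))
print(count_rows([], always_give_back_row=True))
```

The last two calls show why a count of zero can only be obtained when a result row is forced for empty input.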
