Edge Memory Allocation
Manipulating large volumes of data in a single record is always an issue. In Data Shaper Designer, sending big data along graph edges means that:
- Whenever many MBs of data need to be carried between two components in a single record, the edge connecting them expands its capacity. This is referred to as dynamic memory allocation.
- In a complex transformation where only some sections transfer huge data, only the edges in those sections use dynamic memory allocation. The other edges keep their low memory requirements.
- An edge that has carried a big record and allocated more memory for itself does not shrink back. It keeps consuming the larger amount of memory until graph execution finishes.
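The grow-but-never-shrink behavior in the list above can be modeled as a high-water mark. The following Python sketch is a simplified illustration, not Data Shaper's implementation: the `EdgeAllocation` class is hypothetical, and it assumes an edge grows to exactly the size of the largest record it has carried.

```python
class EdgeAllocation:
    """Hypothetical model of one edge's buffer: it grows to fit the largest
    record seen so far and never shrinks while the graph runs."""

    def __init__(self, initial_bytes: int) -> None:
        self.allocated = initial_bytes

    def carry(self, record_bytes: int) -> int:
        # Expand only when a record does not fit; smaller records that
        # arrive later do not release the extra memory.
        if record_bytes > self.allocated:
            self.allocated = record_bytes
        return self.allocated


# An edge in a "heavy" section of the graph.
edge = EdgeAllocation(initial_bytes=65_536)
for size in (1_000, 50_000_000, 2_000):
    edge.carry(size)
print(edge.allocated)  # 50000000

# An edge that only ever carries small records.
quiet = EdgeAllocation(initial_bytes=65_536)
quiet.carry(2_000)
print(quiet.allocated)  # 65536
```

The second edge never carries a large record, so it keeps its initial allocation, matching the point that only the edges in heavy sections grow.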
By default, the maximum size of a record sent along an edge is 268,435,456 bytes (256 MB). In theory, this value can be raised to several GB by setting the Record.RECORD_LIMIT_SIZE property; see Engine Configuration. Record.FIELD_LIMIT_SIZE also defaults to 268,435,456 bytes (256 MB). The total memory used by all fields of a record cannot exceed Record.RECORD_LIMIT_SIZE.
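A minimal sketch of how the two limits interact, assuming a record's size is simply the sum of its field sizes (it ignores any per-record overhead the engine may add). The `check_record` function is hypothetical; its defaults mirror the documented 256 MB values.

```python
FIELD_LIMIT_SIZE = 268_435_456   # default Record.FIELD_LIMIT_SIZE (256 MB)
RECORD_LIMIT_SIZE = 268_435_456  # default Record.RECORD_LIMIT_SIZE (256 MB)


def check_record(field_sizes: dict[str, int],
                 field_limit: int = FIELD_LIMIT_SIZE,
                 record_limit: int = RECORD_LIMIT_SIZE) -> int:
    """Return the total record size in bytes, or raise ValueError when a
    single field or the whole record exceeds its limit."""
    for name, size in field_sizes.items():
        if size > field_limit:
            raise ValueError(f"field {name!r} is {size} B, limit is {field_limit} B")
    total = sum(field_sizes.values())
    if total > record_limit:
        raise ValueError(f"record is {total} B, limit is {record_limit} B")
    return total


print(check_record({"id": 8, "payload": 1_000_000}))  # 1000008

# Each field fits under the field limit, but together they exceed the record limit.
try:
    check_record({"a": 200_000_000, "b": 200_000_000})
except ValueError as err:
    print(err)  # record is 400000000 B, limit is 268435456 B
```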
There is no harm in increasing Record.RECORD_LIMIT_SIZE to whatever size you need. The only reason to keep it smaller is early error detection. For instance, if you keep appending to a string field and forget to reset it after each record, the field can grow until it exceeds the limit.
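The forgotten-reset mistake can be sketched in a few lines. This Python example is a hypothetical illustration: `process` and `LIMIT` are invented names, and the limit is deliberately scaled down to 1,000 bytes (instead of 256 MB) so the failure shows up quickly.

```python
LIMIT = 1_000  # scaled-down stand-in for the field/record limit


def process(records: list[str], reset_each_record: bool) -> int:
    """Append each input string to a single output field, as a transformation
    might; raise ValueError as soon as the field outgrows LIMIT, otherwise
    return the number of records processed."""
    field = ""
    for count, text in enumerate(records, start=1):
        if reset_each_record:
            field = ""  # correct: every record starts with an empty field
        field += text
        if len(field.encode("utf-8")) > LIMIT:
            raise ValueError(f"field exceeded {LIMIT} B at record {count}")
    return len(records)


rows = ["x" * 100] * 50
print(process(rows, reset_each_record=True))  # 50
try:
    process(rows, reset_each_record=False)
except ValueError as err:
    print(err)  # field exceeded 1000 B at record 11
```

With a very large limit, the same bug would silently consume memory for much longer before failing, which is why a smaller limit helps catch it early.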
Let us look a little deeper into what happens in memory. Initially, a record has Record.RECORD_INITIAL_SIZE of memory allocated to it, which is 65,536 bytes (64 kB) by default. If huge data needs to be transferred, its size can grow dynamically up to the value of Record.RECORD_LIMIT_SIZE. Therefore, the amount of memory a record can consume is between 65,536 bytes (64 kB) and Record.RECORD_LIMIT_SIZE.
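The bounds above can be expressed as a clamp: a record never holds less than the initial size and can never grow past the limit. In this Python sketch, `record_memory` is a hypothetical helper, and it assumes a record grows to exactly the size it needs; the engine's real growth steps may differ.

```python
RECORD_INITIAL_SIZE = 65_536     # 64 kB default
RECORD_LIMIT_SIZE = 268_435_456  # 256 MB default


def record_memory(payload_bytes: int,
                  initial: int = RECORD_INITIAL_SIZE,
                  limit: int = RECORD_LIMIT_SIZE) -> int:
    """Approximate bytes a record occupies for a given payload."""
    if payload_bytes > limit:
        raise ValueError(f"{payload_bytes} B exceeds the record limit of {limit} B")
    # Small records still get the full initial allocation.
    return max(initial, payload_bytes)


print(record_memory(500))         # 65536
print(record_memory(10_000_000))  # 10000000
```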
In the graph, the more memory-greedy edges look exactly like regular edges; there is no visual distinction.
Measuring and Estimating Edge Memory Demands
To estimate how memory-greedy your graph is before executing it, consult the table below (note that the computations are simplified). In general, a graph's memory demands depend on the input data, the components used and the edge types; this section covers only the last factor. The table shows approximately how much memory the edges take when execution starts and how far their demands can rise.
The following table lists the memory demands of individual edge types in bytes and as multiples of the record initial size (RIS) and the record limit size (RLS). The limits can be raised if necessary.
EDGE TYPE | INITIAL SIZE | MULTIPLE OF RIS [1] | MAXIMUM SIZE | MULTIPLE OF RLS [2] |
---|---|---|---|---|
Direct | 589,824 B (576 kB) | 9 RIS | 805,306,368 B (768 MB)[3] | 3 RLS |
Buffered | 1,376,256 B (1344 kB) | 21 RIS | 805,306,368 B (768 MB)[3] | 3 RLS |
Phase | 131,072 B (128 kB) | 2 RIS | 536,870,912 B (512 MB)[3] | 2 RLS |
Direct Fast Propagate | 262,144 B (256 kB) | 4 RIS[4] | 1,073,741,824 B (1024 MB)[3] | 4 RLS |
[1] RIS = Record.RECORD_INITIAL_SIZE = 65,536 bytes (64 kB) by default
[2] RLS = Record.RECORD_LIMIT_SIZE = 268,435,456 bytes (256 MB) by default
[3] The size depends on Record.RECORD_LIMIT_SIZE, which can be changed; see Engine Configuration.
[4] 4 is the number of buffers, which can be changed. In general, the buffers' memory can rise up to RLS * (number of buffers).
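Every figure in the table is a multiple of RIS or RLS, so the estimates can be recomputed for non-default settings. The following Python sketch encodes those multiples; the `MULTIPLES` dictionary, the `edge_memory` and `graph_memory` functions and the example edge counts are hypothetical. It assumes, per footnote [4], that both the initial and maximum sizes of a Direct Fast Propagate edge scale with the number of buffers, and like the table it ignores component memory and input data.

```python
RIS = 65_536       # Record.RECORD_INITIAL_SIZE default
RLS = 268_435_456  # Record.RECORD_LIMIT_SIZE default

# (initial multiple of RIS, maximum multiple of RLS) per edge type, from the table.
MULTIPLES = {
    "Direct": (9, 3),
    "Buffered": (21, 3),
    "Phase": (2, 2),
}


def edge_memory(edge_type: str, ris: int = RIS, rls: int = RLS,
                buffers: int = 4) -> tuple[int, int]:
    """Return (initial_bytes, maximum_bytes) for one edge."""
    if edge_type == "Direct Fast Propagate":
        # Both multiples equal the number of buffers (4 by default).
        return buffers * ris, buffers * rls
    initial, maximum = MULTIPLES[edge_type]
    return initial * ris, maximum * rls


def graph_memory(edge_counts: dict[str, int], **limits: int) -> tuple[int, int]:
    """Sum the (initial, maximum) bytes over all edges of a graph."""
    low = high = 0
    for edge_type, count in edge_counts.items():
        initial, maximum = edge_memory(edge_type, **limits)
        low += count * initial
        high += count * maximum
    return low, high


print(edge_memory("Buffered"))  # (1376256, 805306368)
low, high = graph_memory({"Direct": 5, "Phase": 1})
print(low, high)  # 3080192 4563402752
```

Passing a different limit, for example `edge_memory("Direct", rls=2 * RLS)`, doubles only the maximum, giving `(589824, 1610612736)`.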