Graph tracking

The Data Shaper engine provides various tracking information about running graphs. The most important information populates the Tracking view, located at the bottom of the Data Shaper perspective (see the designer’s tabs).

The same data source is used to display decorations on graph elements. The number of transferred records appears along the edges of a running graph. Phase edges show two numbers: the left end of the edge shows the number of data records sent to the edge, and the right end shows the number of data records already read from it.
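The two numbers on a phase edge can be read together: their difference is the number of records that have been written to the edge but not yet consumed. The following is a minimal sketch of that arithmetic only. The class `PhaseEdgeCounters` and its field names are hypothetical and are not part of any Data Shaper API.

```python
from dataclasses import dataclass


@dataclass
class PhaseEdgeCounters:
    """Hypothetical holder for the two numbers shown on a phase edge."""

    sent: int  # left end of the edge: records sent to the edge
    read: int  # right end of the edge: records already read from the edge

    def pending(self) -> int:
        """Records sent to the edge that the reader has not consumed yet."""
        return self.sent - self.read


edge = PhaseEdgeCounters(sent=1200, read=450)
print(edge.pending())  # 750
```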

If the graph is running in the Data Shaper Cluster environment with a multi-worker allocation, the in-graph tracking information can go into even more detail. Each component displays the number of its instances, i.e. parallel executions. Tracking information on edges is available in three levels of detail: low, medium and high. The level can be changed on the Window > Preferences > Data Shaper > Tracking page, or by pressing 'D' directly in the graph editor to cycle through the levels.

  • The low level of tracking detail shows the total number of transferred records over all workers/partitions.
  • The medium level shows the total number of transferred records as well as additional drill-down information: the number of passed records and the skew for each processing partition.
[Figure: a simple Clustered graph with a medium level of tracking information]

The example above shows a simple Clustered graph with a medium level of tracking information. The DataGenerator component is executed on a single worker, so the interconnecting edge is decorated only by the total number of transferred records. The label above this edge shows that 30% of the data records are sent to one instance of SimpleCopy and 34% to the other two instances.
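The per-partition percentages in such a label can be derived from the record counts of the individual partitions. The sketch below illustrates this with made-up counts. The function names are hypothetical, and because this page does not define how the engine computes skew, `deviation_from_even` is only an assumed measure (relative deviation from an even split), not the engine's formula.

```python
def partition_shares(records_per_partition):
    """Return each partition's share of all transferred records, in percent."""
    total = sum(records_per_partition)
    if total == 0:
        # No records transferred yet: every share is zero.
        return [0.0 for _ in records_per_partition]
    return [100.0 * count / total for count in records_per_partition]


def deviation_from_even(records_per_partition):
    """Assumed skew measure: relative deviation from a perfectly even split.

    This is an illustration only, not the formula used by the engine.
    """
    total = sum(records_per_partition)
    if total == 0:
        return [0.0 for _ in records_per_partition]
    even = total / len(records_per_partition)
    return [(count - even) / even for count in records_per_partition]


# Made-up record counts for three instances of a downstream component.
counts = [320, 340, 340]
print(partition_shares(counts))  # [32.0, 34.0, 34.0]
print([round(d, 3) for d in deviation_from_even(counts)])
```

A positive deviation means the partition received more than an even share of the records, a negative one means it received less.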

  • The high level shows the most detailed information: the number of transferred records and the name of the Cluster node where each partition is running (for example 'node1: 250 123'). For partitions where the edge is remote, both the source and the target Cluster node are shown (for example 'node1~node2: 250 123').
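If the high-level labels need to be processed outside the designer, for example when they are copied from a screenshot or a report, they can be split into node names and a record count. The sketch below is based only on the two example labels above. It assumes that the space inside '250 123' is a digit-group separator, so the count is 250123. The names `EdgeLabel` and `parse_edge_label` are hypothetical and are not part of Data Shaper.

```python
import re
from typing import NamedTuple, Optional


class EdgeLabel(NamedTuple):
    source_node: str
    target_node: Optional[str]  # None when the edge is not remote
    records: int


# "<source>[~<target>]: <digits, optionally grouped by spaces>"
_LABEL = re.compile(
    r"^(?P<src>[^~:\s]+)(?:~(?P<dst>[^~:\s]+))?:\s*(?P<count>\d[\d ]*)$"
)


def parse_edge_label(label: str) -> EdgeLabel:
    """Parse a label such as 'node1: 250 123' or 'node1~node2: 250 123'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized edge label: {label!r}")
    # Assumption: spaces inside the number are digit-group separators.
    records = int(match.group("count").replace(" ", ""))
    return EdgeLabel(match.group("src"), match.group("dst"), records)


print(parse_edge_label("node1: 250 123"))
print(parse_edge_label("node1~node2: 250 123"))
```

A missing target node in the result indicates a local partition, while a present target node indicates a remote edge between two Cluster nodes.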