High Availability

Data Shaper Server does not recognize any differences between Cluster nodes; there are no "master" or "slave" nodes, so all nodes are effectively equal. There is no single point of failure (SPOF) in the Data Shaper Cluster itself; however, SPOFs may exist in the input data or in other external elements.

Clustering offers high availability (HA) for all features accessible over HTTP. These features include sandbox browsing and, above all, job executions. Any Cluster node may accept an incoming HTTP request and either process it itself or delegate it to another node.
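
As an illustration of the idea that any node can accept a request, the Java sketch below submits an HTTP request to one arbitrarily chosen node. The host name, port and URL path are hypothetical placeholders, not the documented Data Shaper Server API.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AnyNodeClient {
        public static void main(String[] args) throws Exception {
            // Hypothetical node address and endpoint; any Cluster node may accept
            // the request and either process it itself or delegate it to another node.
            URI uri = URI.create("http://node1.example.com:8080/server/api/jobs/run?sandbox=shared&job=graph/example.grf");

            HttpRequest request = HttpRequest.newBuilder(uri)
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println("Status: " + response.statusCode());
            System.out.println(response.body());
        }
    }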

Requests processed by any Cluster node

  • Job files, metadata files, etc. in shared sandboxes
    All job files, metadata files, etc. located in shared sandboxes are accessible to all nodes. A shared filesystem may be a SPOF, so it is recommended to use a replicated filesystem instead.

  • Database requests
    In a Cluster, the database is shared by all Cluster nodes. Again, a shared database might be a SPOF; however, it may be clustered as well.

However, a node may be unable to process a request itself (see below). In such cases, it transparently delegates the request to a node which can process it.
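
The following Java sketch illustrates this delegation only conceptually; the ClusterNode, Request and Response types are invented for the example and are not actual server classes.

    import java.util.List;
    import java.util.Optional;

    // Hypothetical types illustrating transparent delegation.
    interface ClusterNode {
        boolean isRunning();
        boolean canProcess(Request request);   // e.g. has access to the required sandbox
        Response process(Request request);
        Response forward(Request request, ClusterNode target);
    }

    record Request(String sandbox, String jobPath) {}
    record Response(int status, String body) {}

    class DelegatingHandler {
        private final ClusterNode self;
        private final List<ClusterNode> others;

        DelegatingHandler(ClusterNode self, List<ClusterNode> others) {
            this.self = self;
            this.others = others;
        }

        Response handle(Request request) {
            // Process locally when possible...
            if (self.canProcess(request)) {
                return self.process(request);
            }
            // ...otherwise delegate to a running node that can; the client never notices.
            Optional<ClusterNode> target = others.stream()
                    .filter(ClusterNode::isRunning)
                    .filter(n -> n.canProcess(request))
                    .findFirst();
            return target.map(n -> self.forward(request, n))
                    .orElse(new Response(503, "No running node can process the request"));
        }
    }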

Requests limited to specific node(s)

  • A request for the content of a partitioned or local sandbox
    These sandboxes aren’t shared among all Cluster nodes. Note that such a request may come to any Cluster node, which then delegates it transparently to a target node; however, the target node must be up and running.

  • A job configured to use a partitioned or local sandbox
    These jobs need nodes which have physical access to the required partitioned or local sandbox.

  • A job with an allocation that specifies particular Cluster nodes
    The concept of allocation is described in the following sections.

In the cases above, inaccessible Cluster nodes may cause the request to fail, so it is recommended to avoid relying on specific Cluster nodes or on resources accessible only from a specific Cluster node.

Load Balancer

Data Shaper itself implements a load balancer for executing jobs. A job which isn’t configured for specific node(s) may be executed anywhere in the Cluster, and the Data Shaper load balancer decides, according to the request and the current load, which node will process the job. All of this is transparent to the client.
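
A minimal sketch of such a load-balancing decision is shown below; the NodeStatus fields and the scoring formula are assumptions made for illustration, and the real decision logic is internal to the server.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    // Hypothetical snapshot of a node's state, as a load balancer might see it.
    record NodeStatus(String nodeId, boolean running, int runningJobs, double cpuLoad) {}

    class JobLoadBalancer {
        // Pick the running node with the lowest combined load; a job without a node
        // restriction may run anywhere in the Cluster.
        Optional<NodeStatus> chooseNode(List<NodeStatus> nodes) {
            return nodes.stream()
                    .filter(NodeStatus::running)
                    .min(Comparator.comparingDouble(n -> n.runningJobs() + n.cpuLoad()));
        }
    }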

To achieve HA, it is recommended to use an independent HTTP load balancer in front of the Cluster. An independent HTTP load balancer allows transparent failover of HTTP requests: it sends requests only to nodes which are up and running.
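
The sketch below shows the failover idea on the client side, assuming a hypothetical FailoverClient and a plain list of node base URLs; in practice this logic would live in a dedicated HTTP load balancer (for example a reverse proxy) placed in front of the Cluster nodes.

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.List;

    // Hypothetical client-side failover over a list of Cluster node URLs.
    class FailoverClient {
        private final HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        private final List<String> nodeUrls;  // base URLs of all Cluster nodes

        FailoverClient(List<String> nodeUrls) {
            this.nodeUrls = nodeUrls;
        }

        // Try nodes in order and return the first successful response;
        // nodes that are down are skipped transparently.
        HttpResponse<String> send(String path) throws IOException, InterruptedException {
            for (String base : nodeUrls) {
                try {
                    HttpRequest request = HttpRequest.newBuilder(URI.create(base + path)).GET().build();
                    return client.send(request, HttpResponse.BodyHandlers.ofString());
                } catch (IOException e) {
                    // Node unreachable: fail over to the next one.
                }
            }
            throw new IOException("No running Cluster node responded");
        }
    }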