Data One runtime services, nodes, and components
DOIM is the only part of the Data One product that does not operate in clustered mode: a single DOIM node is enough to perform the required actions through the DOIM CLI.
The picture below describes the two deployment options for control node designation and, consequently, for DOIM deployment.
Either option is acceptable. Even when Option 1 is adopted, the control node does not need a dedicated physical or virtual server: a dedicated OS user and enough disk space are sufficient.
When Option 2 is adopted, DOIM and the actual Data One product must be run by two different users. This keeps their operational contexts distinct even though the physical or virtual server is shared between the two users.
The main non-functional traits of DOIM and the associated control node are:
the node must be regularly backed up as it stores vital domain state information including:
installable product base image
installable product updates and fixes images
master copy and version history of DMCFG
enough free disk space to store additional product updates and fixes as they are released over time
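The free disk space trait above can be monitored with a small script. The following is a minimal sketch and not part of DOIM: the function name `has_enough_free_space`, the checked path, and the 20 GiB threshold are assumptions chosen only for illustration.

```python
import shutil

GIB = 1024 ** 3


def has_enough_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the file system holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical values: replace the path and the threshold with the ones
    # that apply to the actual control node.
    print(has_enough_free_space(".", 20 * GIB))
```

A check of this kind could run periodically on the control node, so that a shortage is noticed before the next product update or fix image has to be stored.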
CEMAN is a mandatory platform component that oversees all data integration command and control tasks, including the choreography and orchestration of actions on one or more STENG nodes. Moreover, CEMAN hosts the platform-wide Data One administrative WUI.
A CEMAN node embeds three main components that are typically not accessed directly by users and administrators:
CEMAN-Core: the core component serving core APIs, administrative WUI, and supporting core data integration orchestrations and choreographies
AMQ Brokers: Apache ActiveMQ Artemis brokers that underpin all the core communications between distributed Data One components
IAM Server: core Keycloak-based server providing Identity and Access Management (IAM) services to all distributed Data One components
CEMAN operates in clustered mode for both scalability and availability purposes. One of the key deployment decisions is how many instances of CEMAN to deploy; a typical number covering most production scenarios is 3 CEMAN nodes (3 being the smallest odd number greater than 1).
On each CEMAN cluster node, a single instance of all embedded components (namely, CEMAN-Core, AMQ Brokers, and IAM Server) will be running.
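The rule above (one instance of each embedded component per CEMAN node) can be expressed as a simple check. This is a minimal sketch under stated assumptions: the function `check_ceman_node` and its input, a plain list of component names observed on a node, are hypothetical and do not correspond to any Data One API.

```python
from collections import Counter
from typing import List

# Component names as listed in this section.
CEMAN_COMPONENTS = ("CEMAN-Core", "AMQ Brokers", "IAM Server")


def check_ceman_node(running: List[str]) -> List[str]:
    """Return one message per component that is not running exactly once."""
    counts = Counter(running)
    problems = []
    for component in CEMAN_COMPONENTS:
        if counts[component] != 1:
            problems.append(
                f"{component}: expected 1 instance, found {counts[component]}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical observation for one cluster node.
    print(check_ceman_node(["CEMAN-Core", "AMQ Brokers", "IAM Server"]))
```

An empty result means the node matches the expected layout. How the list of running components is collected is left open here.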
The main non-functional traits of CEMAN are:
need for a high-speed connection to the Data One database
need for a fast shared file system across all cluster nodes, to hold the AMQ Brokers message store and other critical state information
need for sufficient RAM and CPU resources to cope with workload peaks
Data Watcher is an optional platform module in charge of storing and managing all events emitted by the platform, as well as by third-party monitored systems, to perform end-to-end dataflow monitoring.
A Data Watcher node running Data Watcher Engine embeds three main components that are typically not accessed directly by users and administrators:
MongoDB: an embedded document-oriented database, mainly used as an event store.
Apache Storm and its prerequisite Apache ZooKeeper: used to process streams of events emitted by the platform or by third-party monitored systems.
Data Watcher operates in clustered mode for both scalability and availability purposes. One of the key deployment decisions is how many instances of Data Watcher to deploy and where to deploy them.
A typical number covering most production scenarios is 3 Data Watcher nodes (3 being the smallest odd number greater than 1) deployed on the same nodes hosting CEMAN clustered instances.
On each Data Watcher cluster node, a single instance of all embedded components (namely, MongoDB, Apache Storm, and Apache ZooKeeper) will be running.
The main non-functional traits of Data Watcher are:
need for a fast local file system, to hold the MongoDB event store, with enough free space to cope with the expected event volumes and the required event retention
need for sufficient RAM and CPU resources to cope with event peaks
STENG is a mandatory platform component in charge of performing executive data integration tasks choreographed or orchestrated by CEMAN.
STENG operates in clustered mode for both scalability and availability purposes.
Managed file transfer (MFT) tasks are an example of the tasks performed by STENG.
A member of a STENG cluster is also referred to as STENG Peer or simply Peer.
One of the key deployment decisions is how many STENG Peers to deploy and where to deploy them. A typical number covering most production scenarios is 2 to 5 STENG Peers deployed on dedicated nodes. The nodes should be selected to optimize access to the user data that must be processed locally (e.g., the user file system where files are read and written) or remotely (e.g., the remote servers, accessed via SFTP or other protocols, where files are read and written).
The main non-functional traits of STENG are:
STENG is mainly network-bound and I/O-bound, and just moderately CPU-bound
STENG requires high-performance access to local and remote data-producing and data-consuming systems
STENG needs sufficient RAM and CPU resources to cope with workload peaks
DMZ Gateway is an optional platform component that enables the deployment of multi-tiered MFT architectures where:
business files are streamed through the DMZ without actually being staged on DMZ storage at any time
business data integration configurations are dynamically injected into DMZ at runtime, without being persisted on DMZ storage at any time
the firewall between DMZ and the intranet is only traversed in the intranet-to-DMZ direction for both incoming and outgoing file transfer protocol connections, using secure session tunneling techniques
DMZ Gateway operates in clustered mode for availability purposes. One of the key deployment decisions is how many instances of DMZ Gateway to deploy and where to deploy them. A typical number covering most production scenarios is 2 DMZ Gateway nodes (an active one and a failover one) for each DMZ segment in which a STENG node needs to expose its services through DMZ Gateway.
DMZ Gateway can be seen as a "logical proxy" to STENG for MFT scenarios where secure one-way DMZ traversal is required.
The main non-functional traits of DMZ Gateway are:
DMZ Gateway is totally network-bound
DMZ Gateway footprint on DMZ storage is negligible as no business data or business configurations are ever persisted on it
Data Shaper is the data transformation solution of the Data One platform. It provides quick and flexible any-to-any data transformations.
Data Shaper operates in clustered mode for both scalability and availability purposes. One of the key deployment decisions is how many instances of Data Shaper to deploy.
The deployment location of Data Shaper is determined by the existing deployment of STENG. The Data Shaper Engine can be installed only on nodes that already host a STENG Peer. Depending on the expected load, the Data Shaper Engine can be installed on all STENG Peers or on just a subset of them.
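The placement constraint above amounts to a subset check: the set of nodes running the Data Shaper Engine must be contained in the set of nodes hosting a STENG Peer. The sketch below illustrates this. The function name and the node names are hypothetical, used only for the example.

```python
from typing import Set


def invalid_data_shaper_nodes(
    steng_peers: Set[str], data_shaper_nodes: Set[str]
) -> Set[str]:
    """Return the planned Data Shaper nodes that do not host a STENG Peer."""
    return data_shaper_nodes - steng_peers


if __name__ == "__main__":
    # Hypothetical node names.
    peers = {"steng-1", "steng-2", "steng-3"}
    shapers = {"steng-1", "steng-3"}
    # An empty set means every Data Shaper Engine sits on a STENG Peer node.
    print(invalid_data_shaper_nodes(peers, shapers))
```

Returning the offending nodes, rather than a plain boolean, makes it easier to see which part of a planned layout has to change.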
The main non-functional traits of Data Shaper are:
Data Shaper is typically I/O-bound, because it must read and write the data to be transformed
Depending on the types of transformations that must be performed, Data Shaper can also be CPU-bound
availability of specific versions of Ansible and Python (see for more details), which are used to orchestrate commands across different domain nodes