Failover concept

The failover concept ensures the availability of the Node Controller.

If the connection of any Working Node to the Node Controller is interrupted, a failover is triggered. Note: See also section Graceful Shutdown of the Node Controller below.

If this Working Node can still reach the externalURL (see parameter below), it becomes the new Node Controller. It informs all remaining Working Nodes of its new role, and they then connect to it.

If this Working Node cannot reach the externalURL, then the next Node in the priority hierarchy becomes the new Node Controller.
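
The following Java sketch summarises this decision. It is purely illustrative; the class and method names (canReach, becomeNodeController, etc.) are assumptions and not part of the Lobster_data API.

// Hypothetical sketch of the failover decision described above.
class FailoverDecisionSketch {

    String externalURL = "https://www.google.de";   // see parameter externalURL below

    // Called on a Working Node when its connection to the Node Controller is lost.
    void onNodeControllerConnectionLost() {
        if (canReach(externalURL)) {
            // Still connected to the network: this node promotes itself and
            // informs the remaining Working Nodes of its new role.
            becomeNodeController();
            notifyOtherNodes();
        } else {
            // This node is cut off itself: the next node in the priority
            // hierarchy becomes the new Node Controller instead.
            deferToNextNodeInPriority();
        }
    }

    boolean canReach(String url)      { return true; }   // illustrative placeholder
    void becomeNodeController()       { }
    void notifyOtherNodes()           { }
    void deferToNextNodeInPriority()  { }
}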

Configuration


Each Lobster_data instance knows both the Internet address and the failover priority of all other Lobster_data instances. This ensures that each instance can autonomously reorganise itself in the event of a failover. This information is read from the configuration files ./etc/startup.xml and ./etc/admin/datawizard/lb_nodes.properties when a Lobster_data instance is started. The relevant content of these files is replicated by the Node Controller to the Working Nodes when they log into the load balancing network (i.e. you only have to maintain the files on the Node Controller).

File "startup.xml"

To activate the failover mechanism, the following entry must exist in the configuration file ./etc/startup.xml.


<Call name="enableFailOver">
    <Arg>
        <New class="com.ebd.hub.datawizard.app.loadbalance.failover.Configuration">
            <Set name="port">2320</Set>
            <Set name="heartbeat">500</Set>
            <Set name="externalURL">https://www.google.de</Set>
            <Set name="maxPingRetry">5</Set>
            <Set name="timeout">1200</Set>
        </New>
    </Arg>
</Call>


The meaning of the parameters is as follows. The Working Nodes send a regular ping (heartbeat) to the Node Controller. If a ping fails, a certain number of further pings (maxPingRetry) is executed, with a certain waiting time (timeout) between them. If these pings also fail, the failover is triggered.


  • port: The port on which the 'ping' messages are exchanged.

  • heartbeat: The frequency of the pings in milliseconds.

  • externalURL: Must be a reachable HTTP(S) URL that allows the Working Nodes to verify whether they are cut off from the network or only from the Node Controller. A Working Node can only become the new Node Controller if it can reach the externalURL.

  • maxPingRetry: The number of additional ping attempts after a failed ping. If all of these attempts also fail, the failover is triggered. Default: 3.

  • timeout: The waiting time between the additional ping attempts (maxPingRetry) after a ping error, used instead of the normal ping frequency (heartbeat).
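
Taken together, the parameters control a loop of roughly the following shape. This is a simplified, hypothetical Java sketch using the values from the example above, not the actual Lobster_data implementation.

// Simplified sketch of the heartbeat/retry behaviour controlled by the parameters above.
class HeartbeatSketch {

    static final int PORT           = 2320;   // port
    static final int HEARTBEAT      = 500;    // heartbeat, normal ping interval in ms
    static final int MAX_PING_RETRY = 5;      // maxPingRetry, additional attempts after a failed ping
    static final int TIMEOUT        = 1200;   // timeout, waiting time between the retry attempts

    void run() throws InterruptedException {
        while (true) {
            if (ping(PORT)) {                  // regular heartbeat ping succeeded
                Thread.sleep(HEARTBEAT);
                continue;
            }
            boolean recovered = false;
            for (int i = 0; i < MAX_PING_RETRY && !recovered; i++) {
                Thread.sleep(TIMEOUT);         // retry with the longer waiting time
                recovered = ping(PORT);
            }
            if (!recovered) {
                triggerFailover();             // see section Failover concept above
                return;
            }
        }
    }

    boolean ping(int port) { return true; }    // illustrative placeholder
    void triggerFailover() { }                 // illustrative placeholder
}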

File "lb_nodes.properties"

The configuration file ./etc/admin/datawizard/lb_nodes.properties contains the host address and the MessageService port of each Lobster_data instance (Working Nodes and Node Controllers) involved in the load balancing. The key is the name of the instance as specified in element id of the respective configuration file ./etc/factory.xml. Note: See also section Structure of a Properties File (note the backslash before the colon in the following example file).


# define all working nodes by IP:Port that are licensed - must be the factory name as key
#
# e.g.
WorkNode2=192.168.132.56\:8020
WorkNode1=192.168.132.55\:8020
NodeContr2=192.168.132.54\:8020
NodeContr1=192.168.132.53\:8020
WorkNode3=192.168.132.57\:8020
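
The file is a standard Java properties file. The escaped colon (\:) is read back as a plain colon; ':' is otherwise one of the key/value separator characters in this format, which is presumably why it is escaped here. A minimal sketch of reading the file (path and output are only illustrative):

// Minimal sketch: reading lb_nodes.properties with java.util.Properties.
import java.io.FileInputStream;
import java.util.Properties;

public class LbNodesSketch {
    public static void main(String[] args) throws Exception {
        Properties nodes = new Properties();
        try (FileInputStream in = new FileInputStream("./etc/admin/datawizard/lb_nodes.properties")) {
            nodes.load(in);
        }
        // Prints e.g. "WorkNode1 -> 192.168.132.55:8020"
        nodes.forEach((name, address) -> System.out.println(name + " -> " + address));
    }
}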

Functional principle


  • There is always exactly one active Node Controller.

  • The number of Working Nodes is unlimited and can be changed at any time.

  • In principle, each Working Node (by license and configuration) can be in working mode as well as in controller mode. The modes can change during operation.

  • Node Controllers (by license and configuration) can only be in controller mode.

  • The Node Controller that went online last is always the active Node Controller. Previous Node Controllers shut down unless they were Working Nodes by license and configuration before; in that case, they change back to Working Nodes and do not shut down (see the sketch after this list).

  • The operation mode of a node can change automatically due to a detected failover, or the change can be initiated explicitly by the user (in the Control Center or via HTTP).

  • A valid operating state with a Node Controller and Working Node(s) must be reached at least once (at startup), because stand-alone Working Nodes receive their setup from the Node Controller at startup.
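
The following Java sketch illustrates the mode rules from the list above. The enum, class, and method names are assumptions for illustration only and not part of the Lobster_data API.

// Hypothetical sketch of the role rules: the Node Controller that went online
// last is active; a previous Node Controller falls back to working mode if it
// is a Working Node by license and configuration, otherwise it shuts down.
class FailoverRulesSketch {

    enum Mode { WORKING, CONTROLLER }

    static class Node {
        boolean workingNodeByLicense;   // such nodes can run in both modes
        Mode mode = Mode.WORKING;
        void shutDown() { }             // illustrative placeholder
    }

    static void onNewNodeControllerOnline(Node previousController) {
        if (previousController.workingNodeByLicense) {
            previousController.mode = Mode.WORKING;   // changes back to a Working Node
        } else {
            previousController.shutDown();            // pure Node Controllers shut down
        }
    }
}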

Deactivating SAP RequestListener


See section SAP RequestListener in Load Balance Failover.

Failure of the primary DMZ server


See section DMZ Cluster.

Graceful shutdown of the Node Controller


If you shut down the Node Controller yourself in a regular, orderly way, a signal is normally sent to the Working Nodes so that they do not interpret the shutdown as a failure.

However, if you want this signal to be suppressed so that a failover is triggered, you can create the file "nc_suppress_shutdown" in the installation directory of the Node Controller.

This file is not deleted automatically. It only takes effect on the configured Node Controller and not on a Working Node that has assumed the Node Controller role.
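
Conceptually, the marker file acts like the following check during a regular shutdown. This is a hypothetical Java sketch of the documented behaviour, not the actual Lobster_data implementation; the path is assumed to be relative to the installation directory.

// Illustrative sketch of the effect of the "nc_suppress_shutdown" marker file.
import java.nio.file.Files;
import java.nio.file.Paths;

class GracefulShutdownSketch {

    void shutDownNodeController() {
        boolean suppressSignal = Files.exists(Paths.get("nc_suppress_shutdown"));
        if (!suppressSignal) {
            // Normal case: tell the Working Nodes that this is a regular shutdown
            // so that they do not interpret it as a failure.
            notifyWorkingNodesOfRegularShutdown();
        }
        // If the file exists, no signal is sent and the Working Nodes trigger a
        // failover. The file itself is not deleted.
    }

    void notifyWorkingNodesOfRegularShutdown() { }   // illustrative placeholder
}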