Failover Concept
The failover concept is about ensuring the availability of the Node Controller.
If the connection of any Working Node to the Node Controller is interrupted, the failover is triggered. Note: See also section Graceful Shutdown of the Node Controller below.
If this Working Node can still reach the externalURL (see parameter below), then it becomes the new Node Controller and all remaining Working Nodes connect to it after the new Node Controller informs them of its new role.
If this Working Node cannot reach the externalURL, then the next Node in the priority hierarchy becomes the new Node Controller.
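The election rule above can be sketched in a few lines. This is an illustrative model only, not Lobster_data code; the function name `elect_controller` and the data shapes are assumptions for the sketch.

```python
def elect_controller(disconnected_node, priority_list, can_reach_external_url):
    """Illustrative sketch of the failover decision (not Lobster_data code).

    disconnected_node: the Working Node whose connection to the Node
        Controller was interrupted.
    priority_list: node names ordered by failover priority.
    can_reach_external_url: dict mapping node name -> True if that node
        can reach the configured externalURL (i.e. it is not isolated).
    """
    # The node that detected the interruption is tried first, then the
    # remaining nodes in priority order.
    candidates = [disconnected_node] + [
        n for n in priority_list if n != disconnected_node
    ]
    for node in candidates:
        if can_reach_external_url.get(node, False):
            return node  # this node becomes the new Node Controller
    return None  # no node can verify its own network connectivity
```

The key point the sketch captures: reaching the externalURL is what distinguishes "the Node Controller failed" from "I myself am cut off from the network".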
Configuration
Each Lobster_data instance knows both the Internet address and the failover priority of all other Lobster_data instances. This ensures that each instance can autonomously reorganise itself in a failover case. This information is read from the configuration files ./etc/startup.xml and ./etc/admin/datawizard/lb_nodes.properties when a Lobster_data instance is started. The relevant content of these files is replicated by the Node Controller to the Working Nodes when they log into the load balancing network (i.e. you only have to maintain the files on the Node Controller).
File startup.xml
To activate the failover mechanism, the following entry must exist in the configuration file ./etc/startup.xml.
<Call name="enableFailOver">
    <Arg>
        <New class="com.ebd.hub.datawizard.app.loadbalance.failover.Configuration">
            <Set name="port">2320</Set>
            <Set name="heartbeat">500</Set>
            <Set name="externalURL">https://www.google.de</Set>
            <Set name="maxPingRetry">5</Set>
            <Set name="timeout">1200</Set>
        </New>
    </Arg>
</Call>
The meaning of the parameters is as follows. The Working Nodes send a regular ping (heartbeat) to the Node Controller. If an error occurs, a certain number of further pings (maxPingRetry) are executed, with a certain waiting time (timeout) between them. If these pings also fail, the failover is triggered.
Parameter | Description
port | The port on which the 'ping' messages are exchanged.
heartbeat | The frequency of the pings in milliseconds.
externalURL | Must be a reachable HTTP(S) URL that allows the Working Nodes to verify whether they themselves are disconnected from the network or whether the Node Controller has failed. A Working Node can only become the new Node Controller if it can reach the externalURL.
maxPingRetry | The number of further ping attempts after a ping failure; if all of them fail as well, the failover is triggered. Default: 3.
timeout | The waiting time between the ping attempts after a ping error (maxPingRetry), used instead of the normal ping frequency (heartbeat).
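From these parameters, a rough upper bound on the failure-detection time can be estimated. The following sketch assumes that the retries are spaced by `timeout` and that detection starts at most one `heartbeat` interval after the last successful ping; the exact timing inside Lobster_data may differ.

```python
def max_detection_time_ms(heartbeat, max_ping_retry, timeout):
    """Rough upper bound (in ms) from the last successful ping until
    failover is triggered: one heartbeat interval until the first
    failing ping, then max_ping_retry retries spaced by `timeout`.
    Assumption for illustration, not Lobster_data's exact timing."""
    return heartbeat + max_ping_retry * timeout

# With the example values above (heartbeat=500, maxPingRetry=5, timeout=1200):
print(max_detection_time_ms(500, 5, 1200))  # 6500
```

Under these assumptions, a failed Node Controller would be detected within roughly 6.5 seconds using the example configuration.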
File lb_nodes.properties
The configuration file ./etc/admin/datawizard/lb_nodes.properties contains the respective host address and the MessageService port of the Lobster_data instances (Working Nodes and Node Controllers) involved in the load balancing. The key is the name of the instance as specified in the respective configuration file ./etc/factory.xml in element id. Note: See also section Structure of a Properties File (note the backslash before the colon in the following example file).
# define all working nodes by IP:Port that are licensed - must be the factory name as key
#
# e.g.
WorkNode2=192.168.132.56\:8020
WorkNode1=192.168.132.55\:8020
NodeContr2=192.168.132.54\:8020
NodeContr1=192.168.132.53\:8020
WorkNode3=192.168.132.57\:8020
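To illustrate the file format, the following sketch parses lb_nodes.properties-style content into a name-to-address mapping. The helper `parse_lb_nodes` is hypothetical (Lobster_data reads the file itself); it is shown only to make the escaped-colon convention of Java properties files explicit.

```python
def parse_lb_nodes(text):
    """Parse lb_nodes.properties-style content into {name: (host, port)}.

    In Java properties files a literal ':' in the value is escaped as
    '\\:', so the escape is removed before splitting host and port.
    Hypothetical helper for illustration only.
    """
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        name, _, value = line.partition("=")
        host, _, port = value.replace("\\:", ":").rpartition(":")
        nodes[name.strip()] = (host, int(port))
    return nodes

example = """# e.g.
WorkNode1=192.168.132.55\\:8020
NodeContr1=192.168.132.53\\:8020
"""
print(parse_lb_nodes(example)["WorkNode1"])  # ('192.168.132.55', 8020)
```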
Functional Principle
There is always exactly one active Node Controller.
The number of Working Nodes is unlimited and can be changed at any time.
In principle, each Working Node (by license and configuration) can be in working mode as well as in controller mode. The modes can change during operation.
Node Controllers (by license and configuration) can only be in controller mode.
The Node Controller that went online last is always the active Node Controller. Previous Node Controllers shut down, unless they are also a Working Node by license and configuration; in that case, the node changes back to a Working Node instead of shutting down.
Changing the operation mode of a node can occur automatically due to a detected failover, or it can be explicitly initiated by the user (in the Control Center or via HTTP).
A valid operating state with a Node Controller and Working Node(s) must be reached at least once (at startup), because stand-alone Working Nodes receive their setup from the Node Controller at startup.
Graceful Shutdown of the Node Controller
If you shut down the Node Controller yourself in an orderly manner, a signal is normally sent to the Working Nodes so that they do not interpret the shutdown as a failure.
However, if you want this signal to be suppressed, so that a failover is triggered, you can create the file nc_suppress_shutdown in the installation directory of the Node Controller.
This file is not deleted automatically. It only takes effect on the configured Node Controller and not on a Working Node that has assumed the Node Controller role.
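Creating the marker file can be done with any tool (for example, `touch nc_suppress_shutdown` in the installation directory). A minimal Python sketch; the installation directory path is whatever applies to your setup:

```python
from pathlib import Path

def suppress_shutdown_signal(install_dir):
    """Create the empty nc_suppress_shutdown marker file in the Node
    Controller's installation directory. Its mere presence suppresses
    the shutdown signal, so the Working Nodes treat the shutdown as a
    failure and trigger a failover."""
    marker = Path(install_dir) / "nc_suppress_shutdown"
    marker.touch()  # content is irrelevant; only existence matters
    return marker
```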