Modifying GoshawkDB Clusters
GoshawkDB clusters can be reconfigured: adding or removing
nodes, adding or removing client accounts, and changing the
F configuration parameter. Currently, in GoshawkDB 0.3.1,
not every parameter can be changed. The configuration
guide documents these parameters. The cluster
configuration can be changed on a live, running GoshawkDB
cluster without needing to stop and restart nodes.
To make a change to the cluster configuration, update the
configuration file, making sure you increment the
Version field. Then send
SIGHUP to the node which was started
with that configuration file. The node will reread the
configuration file, verify that it is valid and, if so,
communicate the change to the other nodes in the
cluster and begin reconfiguration of the cluster.
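For concreteness, a configuration file might look roughly like the following. The field names follow the configuration guide, but the cluster name, hosts, port and fingerprint here are made up for illustration; the essential point is that Version has been incremented from the previously applied configuration.

```json
{
    "ClusterId": "myCluster",
    "Version": 2,
    "Hosts": ["alpha:7894", "beta:7894", "gamma:7894"],
    "F": 1,
    "ClientCertificateFingerprints": {
        "aMadeUpSHA256Fingerprint": {
            "myRoot": {"Read": true, "Write": true}
        }
    }
}
```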
When changing the nodes within the cluster, there are three types of nodes:
- new nodes: nodes that were not in the old configuration but are in the new configuration;
- removed nodes: nodes that were in the old configuration but are not in the new configuration;
- surviving nodes: nodes that are in both the old configuration and the new configuration.
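This classification is just a comparison of the old and new host lists; a minimal sketch (a hypothetical helper, not part of GoshawkDB's API):

```go
package main

import "fmt"

// classify splits hosts into new, removed and surviving nodes by
// comparing the host lists of the old and new configurations.
func classify(oldHosts, newHosts []string) (added, removed, surviving []string) {
	inOld := make(map[string]bool)
	for _, h := range oldHosts {
		inOld[h] = true
	}
	inNew := make(map[string]bool)
	for _, h := range newHosts {
		inNew[h] = true
	}
	for _, h := range newHosts {
		if inOld[h] {
			surviving = append(surviving, h)
		} else {
			added = append(added, h)
		}
	}
	for _, h := range oldHosts {
		if !inNew[h] {
			removed = append(removed, h)
		}
	}
	return added, removed, surviving
}

func main() {
	added, removed, surviving := classify(
		[]string{"alpha:7894", "beta:7894"},
		[]string{"beta:7894", "gamma:7894"},
	)
	fmt.Println(added, removed, surviving) // [gamma:7894] [alpha:7894] [beta:7894]
}
```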
- If you send
SIGHUP to a node then that node must have been started with the
-config parameter, otherwise the node will not know which configuration file to (re)load. Alternatively, you can stop a node and restart it, supplying the updated configuration file on the command line.
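As a concrete sketch (the binary name, path and pgrep lookup are assumptions about your deployment, not prescriptions):

```shell
# Start the node with an explicit configuration file so that SIGHUP reloads work.
goshawkdb -config /etc/goshawkdb/cluster.json &

# Later: edit cluster.json, increment its Version field, then:
kill -HUP "$(pgrep goshawkdb)"
```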
- You only need to inform one node of the new configuration and the nodes will automatically communicate the configuration change amongst themselves. If you're adding nodes, it's fine to provide the configuration only to the new nodes. They will then contact surviving nodes and the configuration change will be able to progress.
- Configuration changes can be made when there are
failed nodes. This is in fact essential in order to
remove failed nodes from the cluster or to replace
them. However, configuration changes can only occur if
no more than
F nodes are failed (unreachable).
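That constraint can be stated as a one-line check (again a hypothetical helper, mirroring the rule above rather than GoshawkDB's implementation):

```go
package main

import "fmt"

// canReconfigure reports whether a configuration change can proceed:
// at most f nodes may be failed (unreachable) at the time of the change.
func canReconfigure(failedNodes, f int) bool {
	return failedNodes <= f
}

func main() {
	// With F=1, a single failed node can still be removed or replaced...
	fmt.Println(canReconfigure(1, 1)) // true
	// ...but two simultaneous failures block any configuration change.
	fmt.Println(canReconfigure(2, 1)) // false
}
```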
- It is your responsibility to make sure that there are
never multiple different configurations with the same
Version number. The behaviour of GoshawkDB is undefined if you simultaneously apply different new configurations with the same
Version number to different nodes of the same cluster.
- The configuration
Version field must always increase. This includes the scenario in which a failed node is being replaced without any other change to the configuration. For example, consider a cluster which includes host
foo; the cluster is formed and is working; then host
foo fails in some way. It then gets rebuilt and is assigned the same host name. When joining the new
foo back into the cluster, the
Version field must still be increased, otherwise the rest of the cluster will not detect that
foo is now empty and needs repopulating.
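In other words, a node will only accept a proposed configuration whose Version strictly exceeds the current one. A minimal sketch of that check (a hypothetical helper, assuming JSON configurations with a Version field; the real configuration has many more fields):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config models only the field relevant to this sketch.
type config struct {
	Version uint32 `json:"Version"`
}

// acceptNewConfig reports whether a proposed configuration would be
// accepted: its Version must strictly exceed the current one, even when
// the only change is rebuilding a failed host under the same name.
func acceptNewConfig(current, proposed config) error {
	if proposed.Version <= current.Version {
		return fmt.Errorf("proposed Version %d must be greater than current Version %d",
			proposed.Version, current.Version)
	}
	return nil
}

func main() {
	var current, proposed config
	json.Unmarshal([]byte(`{"Version": 3}`), &current)
	json.Unmarshal([]byte(`{"Version": 3}`), &proposed) // rebuilt foo, Version unchanged
	fmt.Println(acceptNewConfig(current, proposed))     // rejected: Version did not increase

	proposed.Version = 4 // bump Version when reintroducing foo
	fmt.Println(acceptNewConfig(current, proposed)) // <nil>
}
```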
- When replacing multiple failed nodes within a cluster
without stopping the rest of the cluster, expect to
reintroduce the failed nodes one at a time: it's
unlikely you'll be able to start multiple replacements
at the same time such that the surviving nodes realise
multiple failed nodes have been replaced. (There is no
indication in the configuration file of which node(s) have
failed and been rebuilt, so the surviving nodes have to
determine this for themselves. Once the surviving nodes
have identified any single node which needs to resync
into the cluster, it will start doing so (assuming there
are currently no more than
F failures). Hence it's unlikely you'll be able to get the cluster to identify multiple rebuilt nodes at the same time.) Alternatively, stop the entire cluster, start all the replaced nodes first, and then bring back up the surviving nodes; with this approach, the surviving nodes will learn of all the replacements at the same time and so will be able to make the configuration change in one pass.
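The whole-cluster alternative amounts to an ordering of restarts; sketched below with made-up host names and a made-up systemd unit, purely for illustration:

```shell
# 1. Stop every node in the cluster.
for h in alpha beta gamma delta; do ssh "$h" sudo systemctl stop goshawkdb; done

# 2. Start the rebuilt replacement nodes first (here: gamma and delta).
for h in gamma delta; do ssh "$h" sudo systemctl start goshawkdb; done

# 3. Bring the surviving nodes back up; they now see all the replacements
#    at once and can make the configuration change in one pass.
for h in alpha beta; do ssh "$h" sudo systemctl start goshawkdb; done
```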
- You can change the configuration whilst a
configuration change is taking place. So if you realise
a configuration change cannot complete (for example, due
to too many failed nodes), you can provide an even newer
cluster configuration via the normal routes (sending
SIGHUP or restarting a node). Provided the
Version field is increased again, the cluster will abandon any change in progress and will start changing to the new configuration.
- During configuration changes, clients will remain connected to surviving nodes but their transactions will not be able to progress until after the configuration change has succeeded.
- If a configuration change removes a client certificate fingerprint then, once the configuration change is complete, any client that authenticated with that fingerprint has its connection closed.
- If a configuration change modifies which roots a client account can access, or modifies the capabilities granted on those roots for a client account, then any connection using that client account will be disconnected as part of the reconfiguration.