A distributed, transactional,
fault-tolerant object store

Configuration

GoshawkDB has very few configuration options, so configuration should be simple. The configuration file defines the topology of the cluster and the client accounts that can access the cluster. The same configuration file must be provided to every node in the cluster when that node is started for the first time. Once a node has started, it stores a copy of the configuration file, so on subsequent starts the file does not need to be provided on the command line: GoshawkDB will use the stored copy. When two nodes connect to each other, they verify that they are using the same configuration. GoshawkDB thoroughly verifies the configuration when it is loaded and will inform you of any issues it finds.

The configuration file is a JSON object, with the following fields:

  • ClusterId (string). This names the topology; any string is acceptable. The purpose is to name-space clusters: if two nodes from different clusters accidentally connect to each other, they will discover different ClusterIds and disconnect from each other.
  • Version (integer). This versions the configuration. When changing the configuration, this field must be incremented, otherwise GoshawkDB will ignore the changed configuration.
  • F (integer). This defines the number of failures that GoshawkDB is able to withstand. A failure is the inability of one node to successfully talk to another node. Provided that no more than F nodes are unreachable, GoshawkDB will continue to operate normally. When more than F nodes become unreachable, a transaction submitted to a server node may either block or return an error until sufficient nodes recover. To be able to withstand F failures, the minimum number of nodes in the cluster is 2*F + 1.
  • Hosts (list of strings). This defines the hosts that form the cluster. There must be at least 2*F + 1 entries in the list. Each entry has the form "host" or "host:port"; if no port is given, the default port of 7894 is used. The host may be specified as a hostname, an FQDN, or an IP address. Note that each node must be able to identify itself in the host list, and all addresses must be resolvable and reachable from every host in the cluster.
  • MaxRMCount (integer). This defines the maximum number of RMs (or nodes) in the cluster. Setting this value much higher than the size of the cluster will waste disk space and network bandwidth. In the future it will be possible to change this value as necessary, but for GoshawkDB 0.3.1 configuration changes do not allow this field to be modified. It is currently recommended to set this field to the maximum number of nodes you expect your cluster to grow to (and then perhaps double that, to be on the safe side!). When changing this field becomes possible, it will be a reasonably expensive operation (every object stored will need to be modified).
  • ClientCertificateFingerprints (object). This defines the fingerprints (SHA-256) of the client certificates that are allowed to connect to the cluster. It also defines which root objects each client account can access, and what capabilities the account has on each root. The easiest way to create certificates is to use the -gen-client-cert command line parameter. The output from that command includes the fingerprint, which should be copied into this configuration element. Each account must have at least one root and may have as many as you like. If multiple accounts have a root with the same name, it is the same root: those accounts share access to it. The Read and Write booleans determine the capabilities granted to the client account on each root.
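
The constraints in the field list above can be checked before a configuration is handed to a node. The following Python sketch is only an illustration, not GoshawkDB's own verification: the function names split_host and check_config are hypothetical, the fingerprint check assumes the 64-character hexadecimal form used in this document's example, and bare IPv6 literals are not handled.

```python
import re

DEFAULT_PORT = 7894  # default port stated in the documentation


def split_host(entry):
    """Split a Hosts entry into (host, port), applying the default port.

    Assumption: entries look like "host" or "host:port"; bare IPv6
    literals are not handled by this sketch.
    """
    host, sep, port = entry.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return entry, DEFAULT_PORT


def check_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    f = config["F"]
    hosts = config["Hosts"]
    # Withstanding F failures needs at least 2*F + 1 hosts.
    if len(hosts) < 2 * f + 1:
        problems.append(
            f"F={f} needs at least {2 * f + 1} hosts, found {len(hosts)}"
        )
    for fingerprint, roots in config["ClientCertificateFingerprints"].items():
        # A SHA-256 digest is 32 bytes, i.e. 64 hexadecimal characters.
        if not re.fullmatch(r"[0-9a-fA-F]{64}", fingerprint):
            problems.append(f"not a SHA-256 hex fingerprint: {fingerprint}")
        # Every account needs at least one root.
        if not roots:
            problems.append(f"account {fingerprint} has no roots")
    return problems
```

With F set to 2, a host list of three entries would be reported as too short, because 2*2 + 1 = 5 hosts are required.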

And that's it. Small and simple. A complete configuration file could look like:

{
    "ClusterId": "mycluster",
    "Version": 1,
    "F": 2,
    "Hosts": [
        "host1.example.com",
        "host2.example.com",
        "host3.example.com:8888",
        "host3.example.com:8889",
        "host4.example.com"
    ],
    "MaxRMCount": 10,
    "ClientCertificateFingerprints": {
        "34b1c780a4756a5cbc4c69a111188f641eed20aa825cb0b2fecd31cf28d643cb": {
            "sharedRoot1": {
                "Read": true,
                "Write": false
            },
            "theOtherRoot": {
                "Read": true,
                "Write": true
            }
        },
        "6c5b2b2efc0ef77248af64cda16445fdfe936c9f5484711d77c9d67bba5dfe44": {
            "sharedRoot1": {
                "Read": true,
                "Write": true
            }
        }
    }
}
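
To see which accounts end up sharing a root, the ClientCertificateFingerprints element can be inverted so that it is keyed by root name. The sketch below is illustrative only; accounts_by_root is a hypothetical helper, and the embedded JSON repeats just the relevant part of the example above.

```python
import json

# Only the ClientCertificateFingerprints part of the example configuration.
EXAMPLE = """
{
    "ClientCertificateFingerprints": {
        "34b1c780a4756a5cbc4c69a111188f641eed20aa825cb0b2fecd31cf28d643cb": {
            "sharedRoot1": {"Read": true, "Write": false},
            "theOtherRoot": {"Read": true, "Write": true}
        },
        "6c5b2b2efc0ef77248af64cda16445fdfe936c9f5484711d77c9d67bba5dfe44": {
            "sharedRoot1": {"Read": true, "Write": true}
        }
    }
}
"""


def accounts_by_root(config):
    """Map each root name to {fingerprint: (read, write)} for its accounts."""
    by_root = {}
    for fingerprint, roots in config["ClientCertificateFingerprints"].items():
        for root_name, caps in roots.items():
            by_root.setdefault(root_name, {})[fingerprint] = (
                caps["Read"],
                caps["Write"],
            )
    return by_root


for root_name, accounts in accounts_by_root(json.loads(EXAMPLE)).items():
    for fingerprint, (read, write) in accounts.items():
        # Shorten the fingerprint for display only.
        print(root_name, fingerprint[:8], "read" if read else "-", "write" if write else "-")
```

In the example, sharedRoot1 is reachable by both accounts, the first with read-only access and the second with read and write access, while theOtherRoot is reachable only by the first account.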

Command line parameters

Four command line parameters will cause GoshawkDB to start and stop immediately:

  • -version
    Display the current version of GoshawkDB.
  • -h
    Display help for the command line parameters.
  • -gen-cluster-cert
    Generate a new cluster certificate and key pair. The certificate and key pair are written to stdout in PEM format, so the output can easily be redirected to a file. The path to that file can then be used as the value of the -cert parameter (see below).
  • -gen-client-cert
    Generate a new client certificate and key pair. The -cert parameter is required (see below). The generated certificate and key pair are written to stdout in PEM format, whilst the certificate fingerprint is written to stderr, so the certificate and key pair can be easily redirected to a file. The fingerprint can be added to the ClientCertificateFingerprints element of the configuration file to allow clients to authenticate using the generated certificate and key pair.
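
The fingerprint printed by -gen-client-cert is the authoritative value to copy into the configuration. If you later want to check which configuration entry a saved certificate file corresponds to, a fingerprint can be recomputed along the following lines. This Python sketch rests on an assumption that this document does not confirm: that the fingerprint is the hexadecimal SHA-256 digest of the DER-encoded certificate. The function name certificate_fingerprint is hypothetical.

```python
import base64
import hashlib
import re

# Matches the first CERTIFICATE block in a PEM file that may also hold a key.
CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def certificate_fingerprint(pem_text):
    """Return the hex SHA-256 digest of the first certificate in pem_text.

    Assumption: the fingerprint is taken over the DER bytes of the
    certificate, which is the base64-decoded body of the PEM block.
    """
    match = CERT_BLOCK.search(pem_text)
    if match is None:
        raise ValueError("no CERTIFICATE block found")
    # Drop line breaks and other whitespace before decoding.
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha256(der).hexdigest()
```

If the result does not match the value that was printed to stderr when the certificate was generated, the assumption above does not hold and the printed value should be used.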

There are four further command line parameters for running the server:

  • -config /path/to/config.json
    The path to the configuration file documented above. This only needs to be provided the first time the node is started in the cluster, though providing it on every start causes no problems.
  • -cert /path/to/clusterCert.pem
    The path to the cluster certificate and key pair in PEM format as generated by the -gen-cluster-cert parameter. Note that this parameter is also required by the -gen-client-cert parameter (see above). The certificate and key pair are loaded from the file once at startup.
  • -dir /path/to/directory
    The path to the directory in which GoshawkDB should write data. GoshawkDB will create the directory if it doesn't exist. All data is saved within this directory.
  • -port 9999
    Defines the port on which GoshawkDB will listen for connections. GoshawkDB uses the same port for both client-to-server and server-to-server connections. If this parameter is not provided then the default port of 7894 is used.
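
As an illustration of how the four server parameters combine, the following Python sketch assembles an argument list for a first start and for a later restart. The executable name goshawkdb and the helper server_args are assumptions made for this example, as is the choice to always pass -cert and -dir; only the parameter names themselves come from the list above.

```python
def server_args(cert, directory, config=None, port=None, binary="goshawkdb"):
    """Build an argument list for starting a server node.

    Assumptions: the executable is called "goshawkdb", and -cert and -dir
    are passed on every start. -config is only needed on the first start,
    and -port falls back to the default of 7894 when omitted.
    """
    args = [binary, "-cert", cert, "-dir", directory]
    if config is not None:
        args += ["-config", config]
    if port is not None:
        args += ["-port", str(port)]
    return args


# First start: the configuration file is supplied.
first_start = server_args(
    "/path/to/clusterCert.pem",
    "/path/to/directory",
    config="/path/to/config.json",
)
# Later start: the node uses its stored copy of the configuration.
restart = server_args("/path/to/clusterCert.pem", "/path/to/directory")

print(" ".join(first_start))
print(" ".join(restart))
```

The resulting list can be passed to a process launcher such as subprocess.run, provided the executable name matches your installation.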