A distributed, transactional,
fault-tolerant object store

Features in depth

  • GoshawkDB is an Object Store

    This means that it stores any arbitrary object graph you wish, and doesn't have terms like "table", "row", or "column", only "object". As a result, it is much closer to how objects are used in programming languages and so rather than have a step in your programs where you convert rows across various tables to and from objects, you instead run transactions in which you modify your GoshawkDB-managed objects.

    Any object graph is acceptable: there are no limitations on cycles or aliasing so you should not have to change the way you model your data to accommodate GoshawkDB. The only requirement that GoshawkDB makes is that you must explicitly declare pointers (or references) to other objects: if object x contains a pointer to object y then you need to let GoshawkDB know that.

    Navigation of the object graph, and thus resource discovery, is achieved through traversing from root objects which are created by GoshawkDB when it first starts. GoshawkDB can be configured to have multiple distinct roots and different client accounts can be configured to have access to different roots. When a client connects to your GoshawkDB cluster, it is told about which root objects it can access, and navigation proceeds from there. It is perfectly legal to have an object which can be reached from just one or more roots. Through this mechanism, you can control which client accounts can access which objects.

    In addition to this flexibility and security mechanism, GoshawkDB also supports Object Capabilities. With Object Capabilities, a pointer (or reference) to an object not only contains the unique Id of the object, but also carries a capability which indicates which actions can be performed on the object. This can restrict a client, for example, to only being able to read an object: if the client attempts to write to the object then the transaction will be aborted by the server. GoshawkDB also ensures that an object that cannot be read by a client will never have its value sent to that client. I.e. this security mechanism does not rely on trusting clients not to misbehave.

    Object values are plain byte arrays: GoshawkDB imposes no schema on you and never needs to interpret object values. This means you're free to use whatever object serialization format you want: from JSON to ProtoBuf to CapnProto and everything else too. Thus to make your objects multi-language, you only need to pick a serialization format for which there are bindings to the languages you want to use (and there are GoshawkDB clients available). This also means that you are free to encrypt object values as GoshawkDB never needs to inspect them: they are completely opaque to GoshawkDB.

  • GoshawkDB is fully transactional with strong-serializability

    Over the past decade there have been many data stores built that prioritise speed and availability (eventual consistency and similar). These data stores can operate very fast and they can scale extremely well - near linearly. But they're often difficult to use and reason about because their semantics can be unintuitive or unclear, especially once failures start to occur. Where these data stores do offer transactions, they're often very limited (frequently described as light-weight): the transaction is atomic if it only modifies a single row of the data store.

    GoshawkDB supports full transactions which are atomic no matter how many objects they touch. Transactions have many nice properties and using full transactions makes your code simpler and easier to write and maintain. GoshawkDB's transactions are also durable: the client is not informed of the result of the transaction until after the result has been written and flushed to disk.

    Many traditional databases, such as PostgreSQL, support several different isolation levels and there are dozens of different isolation levels possible. Arguably the simplest to understand is strong-serializability which requires that the database behaviour is indistinguishable from a database which performs one transaction at a time, and starts each transaction with the database state at the point achieved by the last committed transaction. This is the simplest isolation level to think about and reason about. Other isolation levels are all weaker: they permit certain interleavings of events that strong-serializability would prohibit.

    Traditional databases tend not to have their default isolation level set to strong-serializability because of the belief that strong-serializability would be too slow: it is the most restrictive from the point of view of the database and requires the most amount of coordination between concurrent transactions. However, GoshawkDB has a radically different architecture to most databases and its object-store design means that contention on objects can be dramatically reduced, allowing the performance impact of strong-serializability to be minimised. Furthermore, reasoning about isolation levels other than strong-serializability is notoriously difficult: for example, it is very hard to analyse a set of SQL transactions and determine the lowest isolation level that is safe for those transactions. Strong-serializability by contrast, is intuitive to think and reason about.

  • GoshawkDB is a distributed, redundant and sharded CP-system

    Like most modern data stores, GoshawkDB is distributed, meaning that a logical GoshawkDB data store comprises a cluster of several nodes running GoshawkDB all talking to each other. A client can connect to any of these nodes and access the entire logical data store. With GoshawkDB, a larger cluster means more capacity, greater performance, and greater resiliance to failure.

    CP-system refers to the CAP theorem (though arguably CAP is misnamed and maybe should be CAC: Consistency, Availability and Convergence). A CP-system is one which will be able to recover from failures (including network partitions) and will not violate consistency. Consistency in this case (as opposed to Consistency in ACID which is completely different), refers to the avoidance of divergence: if any two nodes are asked the same question at the same time (e.g. "what is the value of x?"), and they both give answers, then they must give the same answer. (Note however that the CAP theorem is arguably not a theorem because many of its terms are so vaguely defined. For more discussion on this point, and more detail in general about CAP, please see Mark Burgess's excellent blog post on the subject.)

    For example, suppose two clients connected to the same GoshawkDB cluster run two different transactions: one runs x = 7 whilst the other runs x = 6. After both of these transactions have committed, imagine many clients, connected to every node in the GoshawkDB cluster all retrieve the value of x. A CP-system such as GoshawkDB guarantees that all those queries will retrieve the same value of x, regardless of how many failures have occurred: they will all either see that x is 6, or they will all see that x is 7.

    The trade-off is that in the case of too many failures (node failure or network partitions), operations may block until the failures have been resolved. GoshawkDB allows you to configure the tolerance to failure independently of the size of the GoshawkDB cluster. For example, you could have a cluster of 5 nodes and require that it continues to operate unless more than 2 nodes become unreachable. Or you could have a cluster of 20 nodes and require that it continues to operate unless more than 3 nodes become unreachable. The only requirement is that if F is the number of unreachable nodes you wish to be able to tolerate, then your minimum cluster size is 2*F + 1. You may of course choose to have a cluster size larger than this. Very few, if any, data stores that I'm aware of, offer this flexibility.

    GoshawkDB maintains multiple copies of every object it stores and so is redundant. Given the same definition of F as above, there are 2*F + 1 copies of each object maintained, with each copy stored on a different node. For each object that a transaction touches, GoshawkDB ensures that a majority (F + 1) of the copies of each object vote and agree on the validity of the transaction. Furthermore, all copies can vote in parallel so the number of network delays is kept to a minimum: once a node has received a transaction from a client, just three network delays occur (and one fsync) before the outcome of the transaction is known to that node, regardless of both the F parameter and the size of the cluster.

    When the number of nodes in the cluster is greater than the minimum (2*F + 1), GoshawkDB uses an improved form of consistent hashing to ensure that each node has a balanced number of objects; there are 2*F + 1 copies of each object so if the cluster is larger than 2*F + 1 then each object will not be stored on every node: therefore GoshawkDB does automatic sharding. When a transaction is submitted, GoshawkDB computes a minimum set of nodes that need to be contacted, ensuring that a majority of copies of each object touched by the transaction vote on the transaction. All copies of each object are equal: there is no leader/follower design in GoshawkDB. Consequently, adding further nodes to a GoshawkDB cluster, but keeping the F parameter the same, reduces the likelihood that two transactions that touch different objects need to contact the same nodes, and so performance increases.

  • GoshawkDB supports retry

    I first saw retry appear when Haskell added its Software Transactional Memory library. It is incredibly powerful and creates a substantial step-change in the power of GoshawkDB. See the Howtos for examples of the sorts of things your can use retry for.

    Within a transaction, if you retry, what you're saying is: "Look at all the objects I've read from in this transaction. Put me to sleep, until any of those objects are modified. When they are modified, tell me about them and restart this transaction."

    Thus retry is a powerful and flexible form of triggers: it allows you to get GoshawkDB to push data to you rather than you having to poll for it. You could build a queue and have a client receiving and processing items from the queue. Once the queue is empty, the client can run a transaction reading from the queue, identifying that the queue is empty, and then it issues retry. The client will then block without consuming CPU until some other client runs a transaction appending to the queue.

  • GoshawkDB runs transactions in the client

    In a traditional database, the data is stored on the server and you send your queries to the server. Thus within a traditional SQL transaction, every operation you perform on the database incurs an additional round-trip to the server. The server also has to keep track of all the operations you've performed and once the transaction is complete, it validates in some way that those actions don't violate any requirements of the isolation level. This can lead to long and complex SQL queries, in order to minimise both round-trips to the server and the length of the transaction, and also leads to the use of stored procedures.

    GoshawkDB is radically different. In GoshawkDB, transactions are run only on the client. You can view the client as an partial cache of the objects stored by the server. Transactions are run to completion entirely within the client. The set of all objects read, created, and written to is then sent once to the server for validation. Thus if the client's cache of objects is valid, no matter how big the transaction is, there will only be one round-trip made to the server. In the case that the client's cache has become out of date due to some modification being made by another client, the server rejects the transaction, telling the client to rerun the transaction against an updated cache. The client will apply the updates and then automatically rerun the transaction. Objects are sent to the client on demand as the client navigates through the object graph.

    Because of this design, transactions are normal functions that use the client API to inspect and modify objects. There is no SQL-like language at all. The client takes care of re-running your transaction automatically whenever necessary. It also means that in some senses, GoshawkDB is lock-free: the actions of one transaction will never cause another transaction to be blocked and fail to make progress, and deadlocks are impossible. However, a transaction can be required to run multiple times on the client before the client is able to have the consequences of a transaction successfully validated and applied to the data store.

  • GoshawkDB does not impose global total ordering on transactions

    Despite the guarantees demanded by strong-serializability, it's not necessary to totally order all transactions. It is only necessary to identify dependencies between different transactions and to ensure those dependencies are respected. Some modern data stores enforce a global total order of all transactions, but this always has the potential to be a bottleneck. In contrast, GoshawkDB enforces ordering of transactions only as necessary, on the dependencies between transactions. GoshawkDB uses a unique variation of Vector Clocks to model dependencies between transactions, and avoids the pitfalls of standard Vector Clocks (GoshawkDB is able to ensure that Vector Clocks both grow and shrink as necessary).

    GoshawkDB uses the Paxos Synod consensus protocol to ensure that once the nodes involved in a transaction have reached agreement on the result of the transaction, that result cannot change. This avoids the flaws of 2-phase commit completely (standard 2-phase commit is proven to be equivalent to Paxos with F = 0 - i.e. it offers no ability to tolerate failure). The Paxos Synod protocol is very simple: GoshawkDB's implementation is around 100 lines of code and is much simpler than the full Paxos or Raft consensus protocols, which tend to provide global ordering features in addition to consensus features (which is all that GoshawkDB requires).

    All these techniques and approaches work together to achieve the powerful guarantees offered by GoshawkDB and to prevent as far as possible bottlenecks and unnecessary dependencies: when actions can safely proceed in parallel, they do so. The entire server is heavily multi-threaded, built on the actor paradigm with extensive use of finite state machines throughout the code base.