GoshawkDB: Data Security

A distributed, transactional, fault-tolerant object store

Data Security

For many of us, data security is an afterthought when using a data store. The commands and APIs that restrict access to data are often quite distinct from the APIs used to access and manipulate it. This can be harmful: security concerns get left until the very end of a project, at which point reworking a lot of code to use, for example, different accounts with differing access rights may seem too time consuming and expensive. This contrasts with a number of programming languages, where public / private restrictions on fields and methods are part of the syntax. Of course, this is an apples-to-oranges comparison: those keywords do not control how different client accounts can access different data fields and methods. But it does show that, as programmers, we are used to dealing with access restriction as we write code: it doesn't have to be an afterthought. GoshawkDB supports two mechanisms to restrict access to data, which are the focus of this document.

Multiple roots and references

A GoshawkDB client can only retrieve objects which it has learnt about from reading the references of other objects it already knows about. The first set of objects that the client gets told about is the set of root objects that it is configured to have access to. A root can be shared between multiple client accounts, but equally a client account can have access to a unique root which only it is configured to use. Thus roots can be thought of as a means of name-spacing.

If an object is only reachable from a particular root then only client accounts that are configured to have access to that root will be able to reach that object. However, an object can be reachable from multiple roots: a client which has access to multiple roots could create a new object and add it as a reference from objects on several of the roots it has access to. Note that transactions are not scoped to individual roots: it is perfectly acceptable to have a single transaction that manipulates objects from multiple roots.

This alone gives a mechanism for restricting access to more sensitive data. Consider a system where you're storing customer details, including payment information such as credit card details. From a less restrictive root (perhaps set up for a GoshawkDB account that is used by the public facing web-server), you would only be able to reach objects that contain details the customer can access, such as name and address. But from a more restrictive root (perhaps set up for a GoshawkDB account that is only used internally to process orders), you could have a path to an object that contains the full credit card details, and also contains a reference to the less restricted data. In terms of modelling this in your programming language, you could have a super-type which consists of just the more public fields, and then a sub-type which adds to that the private full credit card details. That way, much of the same code can be used on both your front-end and back-end systems.

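Whilst it might be stating the obvious, don't forget that a reference (or pointer) is a uni-directional link between two objects and cannot be reversed. So in the above example, the object with the less restricted data in it (the super-type) has no knowledge that it is pointed to by the more restricted object (the sub-type). Pictorially, we can represent this scenario like this:

[Figure: the frontendRoot root object references the less restricted customer object; the more restrictive root references the object holding the credit card details, which in turn references the less restricted object.]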

It is impossible for a GoshawkDB account that only has access to the frontendRoot root object to access the object with the credit card details in it. If you were implementing in Java, you could have a class CustomerUserEditable and then a class CustomerComplete extends CustomerUserEditable. If you're working in Go, you could have a type CustomerUserEditable struct, and then embed that within a type CustomerComplete struct. So thinking about which data fields need to be accessed by which accounts may reveal not just how to break up and compose objects in GoshawkDB, but also how to mirror that structure in the language you're working in.
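As a minimal sketch of the Go variant described above, the following uses struct embedding so that code written against the less restricted type also works on the complete type. The field names and the DisplayName method are illustrative assumptions; nothing here uses the GoshawkDB client API.

```go
package main

import "fmt"

// CustomerUserEditable holds the fields a customer may see and edit.
// The field names are assumptions made for this illustration.
type CustomerUserEditable struct {
	Name    string
	Address string
}

// DisplayName only needs the less restricted data, so front-end and
// back-end code can share it.
func (c *CustomerUserEditable) DisplayName() string {
	return c.Name + " (" + c.Address + ")"
}

// CustomerComplete embeds CustomerUserEditable and adds the sensitive
// payment details that only the back-end account should be able to reach.
type CustomerComplete struct {
	CustomerUserEditable
	CardNumber string
}

func main() {
	full := CustomerComplete{
		CustomerUserEditable: CustomerUserEditable{Name: "Ada", Address: "1 Main St"},
		CardNumber:           "0000 0000 0000 0000",
	}
	// Methods of the embedded type are promoted to CustomerComplete.
	fmt.Println(full.DisplayName())
}
```

The front-end would only ever load and populate CustomerUserEditable, whilst the back-end works with CustomerComplete and still reuses every function written for the embedded type.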

Object Capabilities

In some cases, you still wish to allow access to an object, but you want to restrict what actions can be performed on that object. An example of this might be an order system where customers can view their past orders: they should be able to read those orders, but you don't want the customer to be able to modify the order, especially if the order is already complete. For this type of restriction, GoshawkDB supports Object Capabilities.

In an Object Capability system, a reference to an object not only contains the unique Id (or address) of the object being referenced, but also carries a capability which grants the receiver of the reference authority to perform some action. These capabilities have to be enforced through some means and it must not be possible for capabilities to be forged. In GoshawkDB, when a client receives a reference to an object, that reference can grant either:

no authority to interact with the object; or
the authority to read the object and its references; or
the authority to write the object and its references; or
the authority to both read and write the object and its references.

If a client receives multiple references to the same object, those references can contain different capabilities. The client has the authority to perform any action on the object supported by any of those references. Having received one or more references to an object, a client can create a new reference to the object with the following restrictions:

if the client has received read and write capabilities for the object, then it may create a new reference to the object with any capability;
if the client has received only the write capability for the object, then it may create new references to the object with either the write or none capability;
if the client has received only the read capability for the object, then it may create new references to the object with either the read or none capability;
if the client has received only the none capability for the object, then it may create new references to the object with only the none capability.

That is, a client may not create references to an object with capabilities that it itself has not received. This is enforced by the GoshawkDB server.
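The rules above can be sketched as a small bit-set model: the capabilities a client holds on an object are the union of those carried by every reference it has received, and a new reference may only carry capabilities within that union. The type and function names are hypothetical and chosen for this illustration; this is not the server's actual implementation.

```go
package main

import "fmt"

// Capability is modelled as a bit set: one bit for read, one for write.
type Capability uint8

const (
	None      Capability = 0
	Read      Capability = 1 << 0
	Write     Capability = 1 << 1
	ReadWrite            = Read | Write
)

// Effective returns the union of the capabilities carried by all the
// references to a single object that a client has received.
func Effective(received ...Capability) Capability {
	var held Capability
	for _, c := range received {
		held |= c
	}
	return held
}

// CanGrant reports whether a client holding `held` may create a new
// reference carrying `requested`: every requested bit must already be held.
func CanGrant(held, requested Capability) bool {
	return requested&^held == 0
}

func main() {
	// Two references to the same object, one read and one write,
	// together give the client read-write authority.
	held := Effective(Read, Write)
	fmt.Println(CanGrant(held, ReadWrite)) // true
	fmt.Println(CanGrant(Read, Write))     // false
	fmt.Println(CanGrant(Write, None))     // true
}
```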

When a client first creates a new object, it is granted full read-write capabilities on that object. It retains those capabilities for the lifetime of the connection, even if the object is never added into the object graph with read-write capabilities. If a client does not have the read capability on an object then the client is never sent the object's value (or references). So this security mechanism is not reliant on well-behaved client libraries: the server never sends object values (or references) to clients unless the client has received the read capability for an object.

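It's worth clarifying what is meant by being able to read or write an object and its references. Consider the following object graph:

[Figure: an object graph of obj1 to obj8 in which each reference is labelled with the capability it carries; the client starts with a read-write reference to obj1.]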

A client receiving the read-write reference to obj1 can read the value of obj1, and read that it has 3 references to other objects, and can read what those references are. It can also set (write) the value of obj1 and it can set (write) the references of obj1.
Because the client was able to read obj1, it can read the references from obj1. That means the client has now received read capabilities on obj2 and obj3, and a read-write capability on obj4.
Note that the client cannot write to obj2 or obj3 because it has only received read capabilities with those references. Yes, the client can write to obj1 and set obj1's references, but that doesn't include being able to write to the objects referenced by obj1.
From reading obj2, the client learns that it has a write capability on obj5. It cannot read the value of obj5 and the client will never be sent the value of obj5, but it can set (write) the value of obj5 and set (write) obj5's references.
Because the client cannot read obj5, it cannot read obj5's references. Therefore the client cannot discover the existence (or values) of obj7 or obj8. They are unreachable for this client, and the client knows neither the value of obj5, nor how many references it has, nor what those references are.
The client can read both obj3 and obj4 and can therefore read the references from obj3 and obj4. From these, the client can discover that it has received both read and write capabilities on obj6.
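The walkthrough above can be modelled as a traversal: starting from the reference to obj1, the client unions the capabilities it receives per object and follows an object's references only once it can read that object. This is a sketch with hypothetical names, not the server's algorithm. Two details of the graph are assumptions, since the text does not state them: obj3 is taken to carry the read reference to obj6 and obj4 the write reference, and obj5's references to obj7 and obj8 are taken to carry read.

```go
package main

import (
	"fmt"
	"sort"
)

type Capability uint8

const (
	Read  Capability = 1 << 0
	Write Capability = 1 << 1
)

// Ref is a reference: the target object's id plus the capability it carries.
type Ref struct {
	To  string
	Cap Capability
}

// exampleGraph maps each object id to its outgoing references.
func exampleGraph() map[string][]Ref {
	return map[string][]Ref{
		"obj1": {{To: "obj2", Cap: Read}, {To: "obj3", Cap: Read}, {To: "obj4", Cap: Read | Write}},
		"obj2": {{To: "obj5", Cap: Write}},
		"obj3": {{To: "obj6", Cap: Read}},                         // assumed split
		"obj4": {{To: "obj6", Cap: Write}},                        // assumed split
		"obj5": {{To: "obj7", Cap: Read}, {To: "obj8", Cap: Read}}, // assumed capabilities
	}
}

// Discover returns the capabilities a client accumulates on each object
// it learns about, starting from a single received reference.
func Discover(graph map[string][]Ref, start Ref) map[string]Capability {
	held := map[string]Capability{}
	queue := []Ref{start}
	for len(queue) > 0 {
		ref := queue[0]
		queue = queue[1:]
		before := held[ref.To]
		after := before | ref.Cap
		held[ref.To] = after
		// References are only visible once the object can be read;
		// expand each object the first time read is gained.
		if before&Read == 0 && after&Read != 0 {
			queue = append(queue, graph[ref.To]...)
		}
	}
	return held
}

func main() {
	held := Discover(exampleGraph(), Ref{To: "obj1", Cap: Read | Write})
	ids := make([]string, 0, len(held))
	for id := range held {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s read=%t write=%t\n", id, held[id]&Read != 0, held[id]&Write != 0)
	}
}
```

Under these assumptions obj7 and obj8 never appear in the result, because obj5 is write-only for this client and so its references are never followed.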

The GoshawkDB server keeps track of which capabilities a client has discovered and enforces these. Any attempt to perform a transaction that violates the capabilities received by the client will cause the transaction to be aborted.
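To illustrate the kind of check this implies, here is a hedged sketch in which a transaction is a list of actions, each needing some capability on an object, and the whole transaction is rejected if any action exceeds what the client holds. The Action type, Validate function and error value are hypothetical; how the GoshawkDB server actually represents and validates transactions is not described here.

```go
package main

import (
	"errors"
	"fmt"
)

type Capability uint8

const (
	Read  Capability = 1 << 0
	Write Capability = 1 << 1
)

// Action is one step of a transaction: the object it touches and the
// capability that step needs.
type Action struct {
	Object string
	Needs  Capability
}

// ErrAborted marks a transaction rejected for violating capabilities.
var ErrAborted = errors.New("transaction aborted: capability violation")

// Validate returns an error wrapping ErrAborted if any action needs a
// capability the client has not received on that object. Objects the
// client has never learnt about hold no capability at all.
func Validate(held map[string]Capability, txn []Action) error {
	for _, a := range txn {
		if a.Needs&^held[a.Object] != 0 {
			return fmt.Errorf("%w on %s", ErrAborted, a.Object)
		}
	}
	return nil
}

func main() {
	held := map[string]Capability{"obj1": Read | Write, "obj2": Read, "obj5": Write}

	ok := []Action{{Object: "obj1", Needs: Write}, {Object: "obj5", Needs: Write}}
	fmt.Println(Validate(held, ok))

	bad := []Action{{Object: "obj1", Needs: Write}, {Object: "obj2", Needs: Write}}
	fmt.Println(Validate(held, bad))
}
```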

Once a client has received a capability on an object, the client cannot lose that capability for the lifetime of the client's connection. In general in Object Capability systems, capabilities cannot be revoked, and GoshawkDB is no different. There are a couple of reasons why it doesn't make sense to try to revoke capabilities. Firstly, if a client is simply reading objects, it has no observable impact on the object graph. Therefore, another client who first grants some read capability and then later wants to revoke that capability cannot, in general, reason about whether or not that read capability has been received or used by another client. Secondly, once a client has received a reference to an object, it is free to store that reference in some other object it can write to (perhaps a fresh new object). As a result, if you did try to revoke a capability, that would only alter the original reference which was received; it couldn't impact the copy of the reference (and hence capability) that has been created.

As the paper Capability Myths Demolished explains, the usual way that revocation is achieved in Object Capability systems is that rather than granting access to the desired object itself, you instead grant access to some proxy actor. This has to be some actual code running somewhere that itself has the necessary capabilities on the desired object and can receive requests to carry out actions on that object. It can also receive a request to stop work, and once it has received such a request, the effect of revocation is achieved. As the queue howto demonstrates, you can create a client that behaves as such an actor, and so use GoshawkDB to message between different actors, which would give you a path by which to implement such a proxy.
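The proxy pattern itself can be sketched in a few lines. The version below is an in-process simplification with hypothetical names: callers only ever hold the proxy, never the target, and after Revoke every request is refused. In a GoshawkDB deployment the proxy would instead be a separate client holding the real capabilities and receiving requests as messages; that messaging layer is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Order stands in for the protected object.
type Order struct {
	Status string
}

// ErrRevoked is returned for any request made after revocation.
var ErrRevoked = errors.New("access revoked")

// Proxy holds the only direct access to the target and serves requests
// on its behalf until it is told to stop.
type Proxy struct {
	mu      sync.Mutex
	target  *Order
	revoked bool
}

func NewProxy(target *Order) *Proxy {
	return &Proxy{target: target}
}

// ReadStatus carries out a read on behalf of the caller.
func (p *Proxy) ReadStatus() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.revoked {
		return "", ErrRevoked
	}
	return p.target.Status, nil
}

// Revoke makes the proxy refuse all further requests and drops its
// own access to the target.
func (p *Proxy) Revoke() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.revoked = true
	p.target = nil
}

func main() {
	proxy := NewProxy(&Order{Status: "complete"})
	fmt.Println(proxy.ReadStatus())
	proxy.Revoke()
	fmt.Println(proxy.ReadStatus())
}
```

Revocation works here only because the caller was never given the target itself: the capability that cannot be taken back is the one on the proxy, which is harmless once the proxy stops serving.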