Data Security
For many of us, when using a data store, achieving data
security is often an after-thought. The commands and APIs
to restrict access to data are often quite distinct from
the usual set of APIs that you use to access and
manipulate data. This can be harmful: security concerns
get left until the very end of a project, at which point
you may conclude that reworking a lot of code to use, for
example, different accounts with differing access rights
would be too time-consuming and expensive.
It's interesting that this seems to be in contrast
with a number of programming languages where we deal
with public / private field and
method restrictions as part of the syntax of the
language. Now of course, this is an apples-to-oranges
comparison: those keywords in those languages are not
controlling how different client accounts can access
different data fields and methods, but it does show that
as programmers, we are used to dealing with access
restriction concerns as we write code: it doesn't have to
be an after-thought. GoshawkDB supports two mechanisms to
restrict access to data, which are the focus of this
document.
Multiple roots and references
A GoshawkDB client can only retrieve objects which it has learnt about from reading the references of other objects it already knows about. The first set of objects that the client gets told about is the set of root objects that it is configured to have access to. A root can be shared between multiple client accounts, but equally a client account can have access to a unique root which only it is configured to use. Thus roots can be thought of as a means of name-spacing.
If an object is only reachable from a particular root then only client accounts that are configured to have access to that root will be able to reach that object. However, an object can be reachable from multiple roots: a client which has access to multiple roots could create a new object and add it as a reference from objects on several of the roots it has access to. Note that transactions are not scoped to individual roots: it is perfectly acceptable to have a single transaction that manipulates objects from multiple roots.
This alone gives a mechanism for restricting access to more sensitive data. Consider a system where you're storing customer details, including payment information such as credit card details. From a less restrictive root (perhaps set up for a GoshawkDB account that is used by the public facing web-server), you would only be able to reach objects that contain details the customer can access, such as name and address. But from a more restrictive root (perhaps set up for a GoshawkDB account that is only used internally to process orders), you could have a path to an object that contains the full credit card details, and also contains a reference to the less restricted data. In terms of modelling this in your programming language, you could have a super type which consists of just the more public fields, and then a sub-type which adds to that the private full credit card details. So you can still be using much of the same code on both your front-end and back-end systems.
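The reachability argument can be sketched with a toy in-memory graph. This is not the GoshawkDB client API: the `Graph` type, the `Reachable` function and every object name other than `frontendRoot` are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps an object ID to the IDs of the objects it references.
// References are one-way: nothing records who points at an object.
type Graph map[string][]string

// Reachable returns, in sorted order, every object that can be reached by
// following references from the given roots.
func Reachable(g Graph, roots ...string) []string {
	seen := map[string]bool{}
	pending := append([]string(nil), roots...)
	for len(pending) > 0 {
		id := pending[len(pending)-1]
		pending = pending[:len(pending)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		pending = append(pending, g[id]...)
	}
	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Hypothetical layout: the back-end root reaches the complete customer
	// record, which in turn references the less restricted record.
	g := Graph{
		"frontendRoot":     {"customerPublic"},
		"backendRoot":      {"customerComplete"},
		"customerComplete": {"customerPublic"},
	}
	fmt.Println(Reachable(g, "frontendRoot")) // [customerPublic frontendRoot]
	fmt.Println(Reachable(g, "backendRoot"))  // [backendRoot customerComplete customerPublic]
}
```

An account configured with only the front-end root has no path to `customerComplete`, whereas the back-end account reaches both records.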
Whilst it might be stating the obvious, don't forget that a reference (or pointer) is a uni-directional link between two objects and cannot be reversed. So in the above example, the object with the less restricted data in it (the super-type) has no knowledge that it is pointed to by the more restricted object (the sub-type). Pictorially, we can represent this scenario like this:
It is impossible for a GoshawkDB account that only has access to the frontendRoot root object to access the object with the credit card details in it. If you were implementing in Java, you could have a class CustomerUserEditable and then a class CustomerComplete extends CustomerUserEditable. If you're working in Go, you could have a type CustomerUserEditable struct, and then embed that within a type CustomerComplete struct. So thinking about which different data fields need to be accessed by which accounts may reveal to you not just how to break up and compose objects in GoshawkDB, but also then how to mirror that structure in the language you're working in.
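As a minimal sketch of the Go variant, the two types might look like the following. The field names and the `ShippingLabel` helper are assumptions chosen for illustration; only the two type names come from the text above.

```go
package main

import "fmt"

// CustomerUserEditable holds the fields the customer may see and edit.
// The field names are hypothetical.
type CustomerUserEditable struct {
	Name    string
	Address string
}

// CustomerComplete embeds the public fields and adds the sensitive ones.
type CustomerComplete struct {
	CustomerUserEditable
	CardNumber string // hypothetical sensitive field
}

// ShippingLabel only needs the public fields, so it can be shared by
// front-end and back-end code.
func ShippingLabel(c CustomerUserEditable) string {
	return c.Name + ", " + c.Address
}

func main() {
	public := CustomerUserEditable{Name: "A. Customer", Address: "1 Example Street"}
	full := CustomerComplete{CustomerUserEditable: public, CardNumber: "0000 0000 0000 0000"}

	fmt.Println(ShippingLabel(public))
	// The embedded struct is passed explicitly; its fields are also promoted.
	fmt.Println(ShippingLabel(full.CustomerUserEditable))
	fmt.Println(full.Name)
}
```

Code written against `CustomerUserEditable` works unchanged on both systems, while only back-end code ever handles a `CustomerComplete`.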
Object Capabilities
In some cases, you still wish to allow access to an object, but you want to restrict what actions can be performed on that object. An example of this might be an order system where customers can view their past orders: they should be able to read those orders, but you don't want the customer to be able to modify the order, especially if the order is already complete. For this type of restriction, GoshawkDB supports Object Capabilities.
In an Object Capability system, references to an object not only contain the unique Id (or address) of the object being referenced, but also carry a capability which grants the receiver of the reference authority to perform some action. These capabilities have to be enforced through some means and it must not be possible for capabilities to be forged. In GoshawkDB, when a client receives a reference to an object, that reference can grant either:
- no authority to interact with the object; or
- the authority to read the object and its references; or
- the authority to write the object and its references; or
- the authority to both read and write the object and its references.
If a client receives multiple references to the same object, those references can contain different capabilities. The client has the authority to perform any action on the object supported by any of those references. Having received one or more references to an object, a client can create a new reference to the object with the following restrictions:
- if the client has received read and write capabilities for the object, then it may create a new reference to the object with any capability;
- if the client has received only the write capability for the object, then it may create new references to the object with either the write or none capability;
- if the client has received only the read capability for the object, then it may create new references to the object with either the read or none capability;
- if the client has received only the none capability for the object, then it may create new references to the object with only the none capability.
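The rules above can be modelled as a small bit set. This is only a sketch: the `Capability` type and the `Combine` and `CanGrant` functions are hypothetical names rather than part of any GoshawkDB client library, and it assumes that read and write capabilities received on separate references to the same object combine into read-write.

```go
package main

import "fmt"

// Capability is a bit set; the zero value is the "none" capability.
type Capability uint8

const (
	None      Capability = 0
	Read      Capability = 1 << 0
	Write     Capability = 1 << 1
	ReadWrite Capability = Read | Write
)

func (c Capability) String() string {
	switch c {
	case None:
		return "none"
	case Read:
		return "read"
	case Write:
		return "write"
	case ReadWrite:
		return "read-write"
	}
	return "invalid"
}

// Combine returns the authority a client holds on an object after receiving
// several references to it: the union of their capabilities.
func Combine(received ...Capability) Capability {
	var held Capability
	for _, c := range received {
		held |= c
	}
	return held
}

// CanGrant reports whether a client holding `held` on an object may create
// a new reference to it carrying `wanted`: the new capability must not
// exceed what the client already holds.
func CanGrant(held, wanted Capability) bool {
	return wanted&held == wanted
}

func main() {
	held := Combine(Read, Write) // two references to the same object
	fmt.Println(held, CanGrant(held, Read))
	fmt.Println(CanGrant(Read, Write))
	fmt.Println(CanGrant(None, None))
}
```

Expressing the check as a subset test covers all four rules at once: a holder of read-write may grant anything, and a holder of none may only pass on none.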
When a client first creates a new object, it is granted full read-write capabilities on that object. It retains those capabilities for the lifetime of the connection, even if the object is never added into the object graph with read-write capabilities. If a client does not have the read capability on an object then the client is never sent the object's value (or references). So this security mechanism is not reliant on well-behaved client libraries: the server never sends object values (or references) to clients unless the client has received the read capability for an object.
It's worth clarifying what is meant by being able to read or write an object and its references. Consider the following object graph:
- A client receiving the read-write reference to obj1 can read the value of obj1, and read that it has 3 references to other objects, and can read what those references are. It can also set (write) the value of obj1 and it can set (write) the references of obj1.
- Because the client was able to read obj1, it can read the references from obj1. That means the client has now received read capabilities to obj2 and obj3, and a read-write capability to obj4.
- Note that the client cannot write to obj2 or obj3 because it has only received read capabilities with those references. Yes, the client can write to obj1 and set obj1's references, but that doesn't include being able to write to the objects referenced by obj1.
- From reading obj2, the client learns that it has a write capability on obj5. It cannot read the value of obj5 and the client will never be sent the value of obj5, but it can set (write) the value of obj5 and set (write) obj5's references.
- Because the client cannot read obj5, it cannot read obj5's references. Therefore the client cannot discover the existence (or values) of obj7 or obj8. They are unreachable for this client, and the client has no idea either what the value of obj5 is, nor how many or what its references are.
- The client can read both obj3 and obj4 and can therefore read the references from obj3 and obj4. From these, the client can discover that it has received both read and write capabilities on obj6.
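The walk through the graph can be reproduced with a toy traversal. Everything here is an illustrative model rather than GoshawkDB code: the `Ref`, `Object` and `Discover` names are hypothetical, and the capabilities on the edges to obj6, obj7 and obj8 are assumed, since the text only states their combined effect.

```go
package main

import (
	"fmt"
	"sort"
)

type Capability uint8

const (
	Read      Capability = 1 << 0
	Write     Capability = 1 << 1
	ReadWrite Capability = Read | Write
)

func (c Capability) String() string {
	return [...]string{"none", "read", "write", "read-write"}[c&ReadWrite]
}

// Ref is a reference to an object together with the capability it carries.
type Ref struct {
	Target string
	Cap    Capability
}

// Object models only the outgoing references; values are omitted.
type Object struct {
	Refs []Ref
}

// exampleGraph encodes the graph described above. The split of read and
// write between the two references to obj6 is an assumption.
func exampleGraph() map[string]Object {
	return map[string]Object{
		"obj1": {Refs: []Ref{{"obj2", Read}, {"obj3", Read}, {"obj4", ReadWrite}}},
		"obj2": {Refs: []Ref{{"obj5", Write}}},
		"obj3": {Refs: []Ref{{"obj6", Read}}},
		"obj4": {Refs: []Ref{{"obj6", Write}}},
		"obj5": {Refs: []Ref{{"obj7", Read}, {"obj8", Read}}},
	}
}

// Discover returns the capabilities a client accumulates when it starts
// from a single received reference. References are followed only out of
// objects on which the client holds the read capability.
func Discover(graph map[string]Object, start Ref) map[string]Capability {
	held := map[string]Capability{}
	queue := []Ref{start}
	for len(queue) > 0 {
		ref := queue[0]
		queue = queue[1:]
		before := held[ref.Target]
		after := before | ref.Cap
		held[ref.Target] = after
		// Follow the references once, when read authority is first gained.
		if after&Read != 0 && before&Read == 0 {
			queue = append(queue, graph[ref.Target].Refs...)
		}
	}
	return held
}

func main() {
	held := Discover(exampleGraph(), Ref{"obj1", ReadWrite})
	ids := make([]string, 0, len(held))
	for id := range held {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s: %s\n", id, held[id])
	}
}
```

In this model obj5 ends up with only the write capability, so its references are never followed and obj7 and obj8 never appear in the result.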
The GoshawkDB server keeps track of which capabilities a client has discovered and enforces these. Any attempt to perform a transaction that violates the capabilities received by the client will cause the transaction to be aborted.
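The enforcement step might be pictured as a check of each action in a transaction against the capabilities the client has received. This is a simplified sketch, not the server's actual implementation: the `Action` type, the `Validate` function and the `ErrAborted` value are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type Capability uint8

const (
	Read Capability = 1 << iota
	Write
)

// Action is one step of a hypothetical transaction: an object ID and the
// capability that the step requires on that object.
type Action struct {
	Object string
	Needs  Capability
}

// ErrAborted marks a transaction rejected for exceeding its capabilities.
var ErrAborted = errors.New("transaction aborted: capability violation")

// Validate returns an error wrapping ErrAborted if any action needs a
// capability that the client has not received on the object concerned.
func Validate(held map[string]Capability, txn []Action) error {
	for _, a := range txn {
		if held[a.Object]&a.Needs != a.Needs {
			return fmt.Errorf("%w on %s", ErrAborted, a.Object)
		}
	}
	return nil
}

func main() {
	held := map[string]Capability{"obj2": Read, "obj5": Write}

	fmt.Println(Validate(held, []Action{{"obj2", Read}, {"obj5", Write}}))
	fmt.Println(Validate(held, []Action{{"obj2", Write}}))
}
```

An object the client has never received a reference to has no entry in the map, so any action on it fails the check.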
Once a client has received a capability on an object, the client cannot lose that capability for the lifetime of the client's connection. In general in Object Capability systems, capabilities cannot be revoked, and GoshawkDB is no different. There are a couple of reasons why it doesn't make sense to try to revoke capabilities. Firstly, if a client is simply reading objects it has no observable impact on the object graph. Therefore, another client who first grants some read capability and then later wants to revoke that capability cannot reason about whether or not that read capability has been received or used by another client (in general). Secondly, once a client has received a reference to an object, it is free to store that reference from some other object it can write to (perhaps a fresh new object). As a result, if you did try to revoke a capability, that would only alter the original reference which was received; it couldn't impact the copy of the reference (and hence capability) that has been created.
As the paper Capability Myths Demolished explains, the usual way that revocation is achieved in Object Capability systems is that rather than granting access to the desired object itself, you instead grant access to some proxy actor. This has to be some actual code running somewhere that itself has the necessary capabilities on the desired object and can receive requests to carry out actions on that object. It can also receive a request to stop work, and once it has received such a request, the effect of revocation is achieved. As the queue howto demonstrates, you can create a client that behaves as such an actor, and so use GoshawkDB to message between different actors, which would give you a path by which to implement such a proxy.
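The proxy pattern can be sketched in-process with a goroutine standing in for the actor. This is an assumption-laden illustration: the `Proxy` type and its methods are hypothetical, the `load` callback stands in for whatever the actor does with its own capabilities, and a real deployment would exchange requests through GoshawkDB objects rather than Go channels.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrRevoked is returned once the proxy has been told to stop work.
var ErrRevoked = errors.New("proxy: access revoked")

// Proxy is a hypothetical actor that holds the real capability (modelled
// here by the load callback) and serves read requests on behalf of callers.
type Proxy struct {
	reads chan chan string
	stop  chan struct{}
	done  chan struct{}
	once  sync.Once
}

// NewProxy starts the actor goroutine.
func NewProxy(load func() string) *Proxy {
	p := &Proxy{
		reads: make(chan chan string),
		stop:  make(chan struct{}),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(p.done)
		for {
			select {
			case reply := <-p.reads:
				reply <- load() // reply is buffered, so this never blocks
			case <-p.stop:
				return
			}
		}
	}()
	return p
}

// Read asks the actor for the protected value. Callers never hold the
// underlying capability themselves.
func (p *Proxy) Read() (string, error) {
	reply := make(chan string, 1)
	select {
	case p.reads <- reply:
		return <-reply, nil
	case <-p.done:
		return "", ErrRevoked
	}
}

// Revoke tells the actor to stop and waits until it has. It is safe to
// call more than once.
func (p *Proxy) Revoke() {
	p.once.Do(func() { close(p.stop) })
	<-p.done
}

func main() {
	p := NewProxy(func() string { return "order details" })
	fmt.Println(p.Read())
	p.Revoke()
	fmt.Println(p.Read())
}
```

Because callers only ever hold a way to reach the proxy, stopping the proxy cuts off their access without any capability on the underlying object having to be withdrawn.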