Implementations, Simulations, and Abstraction Functions

Programmers often talk about implementing data abstractions, algorithms, or specifications. All three of these cases have something interesting in common. On the one hand, they each involve an abstract entity that is defined by a set of expected behaviors. On the other, there are many different possible realizations of the abstract entity in question.

In this post, we’re going to explore some methods in computer science for characterizing this relation between abstract entities and their possible implementations. We’re going to treat our abstract entities as abstract state machines. And we’re going to view implementations as simulations of the relevant abstract state machine. Finally, we’re going to explore a method for checking whether a supposed implementation is successful, using what are called abstraction functions.

State Machines

A state machine is composed of:

  • [ST1] A set $S$ of possible states $\{s_1, s_2, \ldots\}$.
  • [ST2] A set $I \subseteq S$ of initial states $\{i_1, i_2, \ldots\}$.
  • [ST3] A set of actions $A = \{a_1, a_2, \ldots\}$.
  • [ST4] A set of transitions of the form $(s_i, a_j, s_k)$.

Here’s a simple example, with states $\{s_1, s_2, s_3\}$, initial state $i_1 = s_1$, actions $\{a, b, c\}$, and transitions $(s_1, a, s_2)$, $(s_1, b, s_3)$, $(s_2, c, s_3)$, and $(s_3, a, s_1)$.

[Figure: state machine diagram for this example]

Our machine begins in state $s_1$ (highlighted to indicate it’s an initial state). If our machine is in state $s_1$, then action $a$ will transition it to state $s_2$. From there action $c$ will transition it to $s_3$, and so on. The state machine can only be in one state at a time. And notice that the same action can show up in different transitions. Here we see that from $s_1$, $a$ transitions the machine to $s_2$, but from $s_3$, $a$ transitions the machine to $s_1$.
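To make this concrete, here is a minimal Python sketch of the example machine. The dictionary-of-transitions encoding and the step helper are just one convenient representation chosen for illustration, not anything prescribed by the definition above.

    # Encode the example machine: states s1-s3, actions a, b, c, and the four
    # transitions listed above, keyed by (current state, action).
    TRANSITIONS = {
        ("s1", "a"): "s2",
        ("s1", "b"): "s3",
        ("s2", "c"): "s3",
        ("s3", "a"): "s1",
    }
    INITIAL_STATES = {"s1"}

    def step(state, action):
        """Return the successor state, or fail if the action is not enabled."""
        try:
            return TRANSITIONS[(state, action)]
        except KeyError:
            raise ValueError(f"{action!r} is not enabled in state {state!r}")

    # Starting from the initial state s1, the action sequence a, c, a
    # visits s2, then s3, then returns to s1.
    state = next(iter(INITIAL_STATES))
    for action in ["a", "c", "a"]:
        state = step(state, action)
    print(state)  # s1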

We can view a data structure, algorithm, or specification as an abstract state machine. As an example, consider a stack:

  • [ST1] $S_{st}$ is the set of all ordered $n$-tuples, including the empty tuple.
  • [ST2] For simplicity, we can say that $I$ has one member: the empty tuple. This captures the idea that a stack starts out empty.
  • [ST3] $A = \{push(), pop(), peek()\}$.

To be precise, we’d have to allow for a separate $push()$ action for every possible item that could be pushed onto the stack. Here’s a subgraph of our stack state machine representing a small subset of states and transitions:

[Figure: partial state machine diagram for the stack]

$peek()$ is a transition function from a state to itself, returning the top of the stack to the client. $push()$ and $pop()$ change the state of the stack, as represented by the transitions in the diagram. $pop()$ also returns the top of the stack to the client. These are exactly the behaviors we would expect of a stack.
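Here is a minimal Python sketch of this abstract stack machine, assuming a state is just a tuple with the top element first and each action maps a state to a successor state plus whatever value is returned to the client. The tuple encoding is an assumption of this sketch, not part of the definition.

    # States of the abstract stack machine are tuples, top element first.
    # Each action returns (next_state, value_returned_to_client).

    def push(state, item):
        return (item,) + state, None      # changes state, returns nothing

    def pop(state):
        return state[1:], state[0]        # changes state, returns the old top

    def peek(state):
        return state, state[0]            # self-transition, returns the top

    state = ()                            # the single initial state: the empty tuple
    state, _ = push(state, "x")
    state, _ = push(state, "y")
    state, top = peek(state)              # top == "y", state unchanged
    state, popped = pop(state)            # popped == "y", state == ("x",)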

Distributed systems provide another kind of example. One way to think of a certain type of distributed system is as a simulation of a single machine by a network of separate machines. We can use an abstract state machine to model a single machine and then investigate our distributed implementation in terms of that.

Let’s say we’re interested in building a distributed key/value store. If we were dealing with a run-of-the-mill key/value store on a single machine, we should expect the following to be respected:

  • [KV1] At any time $t$, a key $k_i$ is present or not.
  • [KV2] At any time $t$, there is a unique value $v_i$ for each present key $k_i$.
  • [KV3] Updates happen in a determinate order.
  • [KV4] A read of $k_i$ provides the latest value for $k_i$.

These conditions are embedded in the following state machine:

  • [ST1] $S_{kv}$ is the set of all configurations of keys and values, where a configuration is a subset of $\{k_1, k_2, \ldots\}$ paired with corresponding values. For example, $\{(k_1, 4), (k_4, 10)\}$ and $\{(k_2, 20), (k_3, 9), (k_{10}, 3)\}$ are two possible configurations. When constructing a configuration, we begin with a subset of $\{k_1, k_2, \ldots\}$, thereby ensuring that each $k_i$ is represented at most once.
  • [ST2] For simplicity, we can say that $I$ has only one member: the empty configuration $\{\}$. This captures the idea that a key/value store starts out empty.
  • [ST3] $A = \{get(), put(), delete()\}$. Technically, we have to allow a separate $get()$ and $delete()$ action for each key $k_i$ and a separate $put()$ action for each key/value pair $(k_i, v_j)$.

Here’s a subgraph of our key/value state machine representing a small subset of possible key/value pairs and actions:

[Figure: partial state machine diagram for the key/value store]

This state machine gives us KV1 and KV2 by the definition of $S_{kv}$. Each key $k_i$ is present at most once and, if present, has exactly one value.

KV3 is guaranteed by the fact that a deterministic state machine always transitions in a determinate order, so we can always point to a temporally ordered history of state transitions.

KV4 is guaranteed by KV3 combined with (a) the fact that a state machine is always in one state and (b) the definition of our state space as the set of configurations, where a configuration is composed of pairs of all present (unique) keys and their corresponding values.
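A minimal Python sketch of this single-machine state machine, assuming a configuration is represented as a dict (which gives us KV1 and KV2 for free, since each key appears at most once); the class and method names below simply mirror the actions listed above.

    class KVStore:
        """The abstract key/value state machine: a state is one configuration."""

        def __init__(self):
            self.config = {}                 # the initial state: the empty configuration

        def put(self, key, value):
            self.config[key] = value         # one put() transition per (key, value) pair

        def get(self, key):
            return self.config.get(key)      # the unique, latest value for the key (KV4)

        def delete(self, key):
            self.config.pop(key, None)

    store = KVStore()
    store.put("k1", 4)
    store.put("k4", 10)
    print(store.get("k1"))                   # 4: the latest value written for k1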

A distributed key/value store with replicated data, on the other hand, does not “really” meet the conditions KV1-KV4 described above. Consider the state space for such a system. We have more to consider than a single configuration per state. Instead, we must assign a configuration to each of $N$ nodes in the network. So a state for a network with two nodes $n_1$ and $n_2$ might look like

$n_1: \{(k_1, 4)\}$, $n_2: \{(k_1, 3), (k_2, 10)\}$

With these more complex states in mind, let’s consider each of KV1-KV4 for our distributed system:

  • [KV1'] $k_i$ may be present on node $n_1$ but not $n_2$ (for example, if a replication message has not yet reached $n_2$).
  • [KV2'] While $k_i$ has value $v_i$ on $n_1$, it might have value $v_j$ on $n_2$ (for example, if an update has not yet propagated to $n_2$).
  • [KV3'] Different nodes might receive updates in some order that cannot be globally determined because of the lack of a global clock.
  • [KV4'] Consider the situation in KV2'. A read of $k_i$ might provide the latest value if executed on $n_1$, which already has $v_i$, but not if executed on $n_2$, which still has $v_j$.

The distributed key/value store technically fails to meet any of our four conditions. But what’s most important is that it appears like a single system to the client. If we can design our system in such a way that a client could never distinguish its behavior from that of a single system, then we probably have all that we need.
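To see how the richer state space behaves, here is a rough Python sketch of a two-node replicated store in which replication is modeled as explicitly delayed messages. The node names and the pending queue are assumptions made for illustration; this is not a real replication protocol.

    # Each node holds its own configuration; a state of the whole system assigns
    # a configuration to every node, as in the n1/n2 example above.
    nodes = {"n1": {}, "n2": {}}
    pending = []                              # replication messages not yet delivered

    def put(node, key, value):
        nodes[node][key] = value
        for other in nodes:                   # queue replication instead of applying it now
            if other != node:
                pending.append((other, key, value))

    def deliver_one():
        other, key, value = pending.pop(0)    # deliver the oldest pending message
        nodes[other][key] = value

    put("n1", "k1", 4)
    # Before delivery: k1 is present on n1 but not n2 (KV1'), and a read on n2
    # does not see the latest value (KV4').
    print(nodes["n1"].get("k1"), nodes["n2"].get("k1"))   # 4 None
    deliver_one()
    print(nodes["n1"].get("k1"), nodes["n2"].get("k1"))   # 4 4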

Traces

We can make this intuition more precise. Call anything that a client can see an external behavior of the system. In a real-world single-system key/value store, the client calls $get(k_2)$ and receives a value in return. It’s unaware of whatever steps the system took to look up the key, retrieve the value, and (possibly) transform it into a form the client recognizes. Furthermore, it’s unaware of any transitory states the system might have entered to carry out these steps. These are all internal behaviors of the system.

From the client’s point of view, only the external behavior matters. It is only on the basis of external behavior that it can distinguish one state of the system from another. So when we say we’d like to design a distributed system in such a way that a client can never distinguish its behavior from that of a single system, we should focus on the external behavior.

The history of external behaviors of a system is called a trace. The set of all possible traces defines that system from the client’s point of view. Let’s distinguish a trace of a system from an execution. By execution, we mean to include internal as well as external behaviors. The crucial point to keep in mind is that a single trace can be realized by many different executions.

Consider a contrived example: we have two implementations of a stack. They both use a linked list (cleverly called $list$) to represent their data. The first one implements the stack API in a familiar way:

$Stack_1$:

    push(a) := { list = a::list; }
    peek()  := { return list.head; }
    pop()   := { h = list.head; list = list.tail; return h; }

Our second implementation relies on a more convoluted scheme. It keeps $list$ in reverse order for some inexplicable reason. It then implements the stack API as follows:

$Stack_2$:

    push(a) := { list = (a::(list.reverse)).reverse; }
    peek()  := { return (list.reverse).head; }
    pop()   := { h = (list.reverse).head; list = ((list.reverse).tail).reverse; return h; }

Despite the weird internal steps $Stack_2$ takes involving reverse, its traces are indistinguishable from those of $Stack_1$. As far as the client is concerned, these two versions of a stack are the same (excluding performance considerations). For any series of API calls by the client, we would have two different executions but the same trace. Furthermore, from the client’s perspective, we can describe these traces by the same state machine diagram. Intuitively, they both implement the same abstract state machine.
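Here is a runnable Python rendering of the two implementations, using Python lists in place of linked lists (so head and tail become indexing and slicing). It is only a sketch of the pseudocode above, but it shows the key point: the externally visible return values coincide even though the internal states differ.

    class Stack1:
        def __init__(self):
            self.list = []                    # top of the stack at the front

        def push(self, a):
            self.list = [a] + self.list

        def peek(self):
            return self.list[0]

        def pop(self):
            h = self.list[0]
            self.list = self.list[1:]
            return h

    class Stack2:
        def __init__(self):
            self.list = []                    # kept in reverse order: top at the back

        def push(self, a):
            self.list = list(reversed([a] + list(reversed(self.list))))

        def peek(self):
            return list(reversed(self.list))[0]

        def pop(self):
            rev = list(reversed(self.list))
            h = rev[0]
            self.list = list(reversed(rev[1:]))
            return h

    # Same sequence of calls, different executions, identical trace.
    trace1, trace2 = [], []
    for stack, trace in ((Stack1(), trace1), (Stack2(), trace2)):
        stack.push(1)
        stack.push(2)
        trace.append(stack.peek())
        trace.append(stack.pop())
        trace.append(stack.pop())
    assert trace1 == trace2 == [2, 2, 1]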

Let’s make this intuition more precise by considering what it means for one system to simulate another.

Simulations

Say we have some system $X$ that is designed to simulate a target system $T$. What must be true in order for $X$ to count as a successful simulation?

For one thing, $X$ must never do anything that $T$ couldn’t do. A simulation should be limited to the behaviors of its target. We can express this idea in terms of traces.

We’ll use $X^{trace}$ to denote the set of all possible traces of $X$. From the client’s point of view, $X^{trace}$ defines $X$, since it encompasses all the possible sequences of external behaviors of $X$.

If $X$ successfully simulates $T$, then it should be limited to possible traces of $T$. Formally, we can express this condition as

  • [SIM1] $X^{trace} \subseteq T^{trace}$.

SIM1 says that if $X$ simulates $T$, then any trace of $X$ is a trace of $T$.
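For deterministic machines like our two stacks, SIM1 can at least be probed by testing: replay random command sequences against the candidate and the target and compare the externally visible results. The Python sketch below reuses the Stack1 and Stack2 classes from the earlier example; it only searches for counterexamples and is in no way a proof of trace inclusion.

    import random

    def run_trace(factory, commands):
        """Build a fresh machine and record the result of each command."""
        machine = factory()
        return [getattr(machine, name)(*args) for name, args in commands]

    def random_stack_commands(length=20):
        """Generate a command sequence that never pops or peeks an empty stack."""
        commands, depth = [], 0
        for _ in range(length):
            if depth == 0 or random.random() < 0.6:
                commands.append(("push", (random.randint(0, 9),)))
                depth += 1
            elif random.random() < 0.5:
                commands.append(("peek", ()))
            else:
                commands.append(("pop", ()))
                depth -= 1
        return commands

    # Treat Stack1 as the target T and Stack2 as the candidate X: every observed
    # trace of Stack2 should also be a trace of Stack1.
    for _ in range(100):
        commands = random_stack_commands()
        assert run_trace(Stack2, commands) == run_trace(Stack1, commands)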

We can also express our intuition in terms of safety and liveness, two concepts taken from the distributed systems literature. A safety property of a set of traces is a property that holds for any step in any trace in the set. Safety properties say that something bad will never happen. KV1-KV4 above are examples of safety properties for a single system key/value store.

Take KV2, the requirement that any given key has at most one value at any one time. This property ensures that at any step in any trace of our key/value store, we are guaranteed to have at most one value for any given key.

A liveness property for a set of traces guarantees that any trace in the set contains a step in which that property holds. Liveness properties say that something good will happen. For example, if we say a system must eventually terminate, then we are talking about a liveness property of the system. For $X^{trace}$, this would mean that every trace in $X^{trace}$ contains a termination step.
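The asymmetry between the two kinds of property shows up in how they can be checked against a finite prefix of a trace. Here is a minimal sketch, assuming a property is just a predicate over a single state; the example properties are invented for illustration.

    def safety_violated(prefix, bad):
        """A safety property ("bad never happens") can be refuted by a finite
        prefix: one bad state is enough."""
        return any(bad(state) for state in prefix)

    def liveness_refuted(prefix, good):
        """A liveness property ("good eventually happens") can never be refuted
        by a finite prefix: the good state might still come later."""
        return False

    # Example: for a stack machine whose states are tuples, "the stack never
    # holds more than 3 items" is a safety property...
    prefix = [(), (1,), (2, 1), (3, 2, 1), (4, 3, 2, 1)]
    print(safety_violated(prefix, lambda s: len(s) > 3))        # True
    # ...while "the stack eventually holds 5 items" is a liveness property that
    # this prefix neither establishes nor refutes.
    print(liveness_refuted(prefix, lambda s: len(s) == 5))      # False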

We can use these concepts to further refine our model of simulations. Again, consider a simulation $X$ of some target $T$. If a safety property holds for $T$, then it must also hold for $X$. Think of it this way. If a client is guaranteed that $T$ will never do something bad, then a successful simulation $X$ must respect that guarantee.

On the other hand, if a liveness property holds for $T$, it is not necessarily true that it holds for $X$. Why is that? It’s because traces can be infinite in length. Imagine that a client is guaranteed that $T$ will eventually do something good. If $X$ lacks this liveness property but every trace of $X$ is infinite in length, the client will never be able to determine that $X$ lacks the property. This is because the client could always wonder whether any given infinite trace of $X$ might exhibit the property at some later time.

However, if a liveness property holds for $X$, then it must hold for $T$. This is because we can be certain that every trace of $X$ will eventually exhibit the guaranteed property. And if a trace of $X$ contains some step $x_i$ for which the property holds, then so must some trace of $T$, since by SIM1 every trace of $X$ is also a trace of $T$.

Using the notation $s(X)$ and $l(X)$ for ascribing the safety and liveness properties to $X$, we can now add the following conditions for successful simulation:

  • [SIM2] If $s(T)$, then $s(X)$.
  • [SIM3] If $l(X)$, then $l(T)$.

You may have noticed that SIM1 entails SIM2 and SIM3. But it can be useful to separate these concepts out to get a clearer sense of how $X^{trace}$ and $T^{trace}$ are related.

We have left out one important, and perhaps obvious, point. If a client is the one issuing commands to $X$, then it must be the case that $X$ is able to process the same set of commands as $T$. This ensures that $X^{trace}$ cannot be arbitrarily small. Though it can be a proper subset of $T^{trace}$, it must still capture all the traces the client would expect to encounter through any series of issued commands.

We have now made precise our earlier intuition that a successful simulation should be indistinguishable from its target to a client. If $X$ successfully simulates $T$, then a client will never be able to tell that $X$ and $T$ are distinct.

In thinking about simulations, we have focused entirely on external behaviors, the kind of thing a client can see. But different successful implementations can exhibit divergent internal behaviors, as we saw with our stack implementations above. So we still have to account for the relationship between the idiosyncratic states and transitions of a given implementation, on the one hand, and the states and transitions of the abstract state machine it is implementing, on the other. For that reason, we now turn to abstraction functions.

Abstraction Functions

We have seen that many internal executions can correspond to one external trace. This is directly related to our central question: what is the relationship between an abstract state machine and its implementations? When we move from the idiosyncratic states and transitions of an implementation to the shared target states and transitions of the corresponding abstract state machine, we are moving up the “abstraction ladder”. For example, we say that $Stack_1$ and $Stack_2$ are both examples of a stack. Here the concept of a stack is more abstract than $Stack_1$ or $Stack_2$. These are familiar ideas, but how do we make them precise?

What we need is a way to map states of an implementation to the corresponding states of the target abstract state machine. With such a method, we can describe the states of the implementation in the more abstract language of the specification. To anticipate, we will find that there is a many-to-one relation between the states of the implementation and the states of the target abstract state machine. The implementation encapsulates details (states and transitions) that can be safely ignored by a client. The client need only think in terms of the abstract state machine.

What we need is an abstraction function, which we will call $f$. $f$ is a total function from the states of our implementation $X$ into the states of our target $T$. This means that $f$ maps each state of $X$ to a state of $T$.

Returning to our stack example, $Stack_2$ moves through a number of states related to reversing its internal $list$ whenever it is updated. However, none of these internal states are accessible to the client. What this means is that once $Stack_2$ has received a command, say $pop()$, it seems to the client as though it immediately transitions to the next corresponding state in the abstract state machine. This is because all the client can do to interact with it is to issue another command. But $Stack_2$ will only execute this next command once it has finished processing $pop()$. So we can safely map all the intermediate internal steps of $Stack_2$ to a single state of the abstract state machine for a stack.
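For the stack example, a candidate abstraction function is easy to write down. The sketch below assumes the abstract states are tuples with the top element first (as in the earlier sketch) and that Stack2’s concrete state is its internal, reversed list.

    def f(stack2_list):
        """Map a concrete Stack2 state (a list kept in reverse order) to the
        abstract stack state it represents (a tuple, top element first)."""
        return tuple(reversed(stack2_list))

    # After push(1) then push(2), Stack2's internal list is [1, 2]; the client
    # sees the abstract state f([1, 2]) == (2, 1), with 2 on top.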

Given an abstraction function $f: X \to T$, how do we prove that $X$ successfully implements $T$? We could do this through induction. As with any inductive argument, we would need to establish our base and our inductive step. Taking $tr(s, s')$ to denote the existence of a transition from $s$ to $s'$, we need to show the following:

  • [Base] If $s_i$ is an initial state of $X$, then $f(s_i)$ is an initial state of $T$.
  • [Inductive Step] If $tr(s, s')$, then $tr(f(s), f(s'))$, assuming the identity transition exists for every state in $T$.

Base says that every initial state of $X$ corresponds to an initial state of $T$. Inductive Step says that any transition internal to $X$ corresponds to a transition in $T$. We must assume the existence of the identity transition (that is, the transition from a state to itself) in $T$ because transitions internal to $X$ might all be subsumed under one state in $T$, as we were just discussing in relation to $Stack_2$.

If we can establish both Base and Inductive Step, then by induction we’ve shown that every execution of $X$ corresponds to a trace of $T$. And this also means we’ve shown that $X$ simulates $T$. That’s because we’ve shown that the set of traces of $X$ is a subset of the set of traces of $T$, thereby satisfying SIM1 (and hence SIM2 and SIM3). So the induction above provides a template for constructing a proof that some $X$ implements some target $T$.
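Before attempting the proof itself, the two obligations can be checked mechanically over a bounded set of states as a sanity check. The sketch below reuses Stack2 and the abstraction function f from above; the abstract transition helpers are assumptions of this sketch, and exploring a handful of states is of course not a proof.

    def abstract_push(state, a):
        return (a,) + state

    def abstract_pop(state):
        return state[1:]

    def check_push(xs, a):
        """A concrete push transition must map, via f, to an abstract push."""
        s = Stack2()
        s.list = list(xs)                     # put Stack2 into concrete state xs
        before = f(s.list)
        s.push(a)
        return f(s.list) == abstract_push(before, a)

    def check_pop(xs):
        """A concrete pop transition must map, via f, to an abstract pop."""
        s = Stack2()
        s.list = list(xs)
        before = f(s.list)
        s.pop()
        return f(s.list) == abstract_pop(before)

    # Base: Stack2's initial state (the empty list) maps to the abstract
    # initial state (the empty tuple).
    assert f([]) == ()
    # Inductive Step, checked over a handful of concrete states.
    assert all(check_push([1, 2, 3][:n], 9) for n in range(4))
    assert all(check_pop([1, 2, 3][:n + 1]) for n in range(3))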

As a final example, recall our distributed key/value store from earlier. We noticed that, strictly speaking, it failed to meet any of conditions KV1-KV4. That was because of the possibility of inconsistencies between nodes at any particular point in time. However, if we can find an abstraction function that maps states of the distributed system to states of a single system key/value store, then we can show that it actually does satisfy KV1-KV4 from the client’s point of view. Much of the work on distributed systems amounts to designing systems for which such an abstraction function could be defined.

Conclusion

We’ve explored some methods in computer science for characterizing the implements relation. We’ve viewed implementation targets (whether data abstractions, specifications, or single systems) as abstract state machines. And we’ve interpreted implementations as simulations of those abstract state machines. We’ve looked at some criteria for successful simulations and considered how abstraction functions can help us think about the relationship between the internal states of an implementation and the states of the abstract state machine it’s simulating.

These conceptual tools may not be of any particular use in designing implementations, but they can help us think about what it means to be a successful implementation, and how we might go about proving that a particular implementation is successful.
