Distributed object systems: translate what data?

Project Cambria: Translate your data with lenses proposes using edit lenses to translate data to handle different protocols, but we believe lenses could not scale to large, "fragmented" distributed protocols.

Lenses do not appear composable in this situation; we could have protocols \(B\) and \(C\) which are derived from \(A\), and would have to use a \(B \rightarrow C\) lens, as opposed to composing \(B \rightarrow A\) and \(A \rightarrow C\) if \(B\) and \(C\) both introduce something \(A\) doesn't have. \(B \rightarrow A\) would drop that new feature, then \(A \rightarrow C\) would replace it with a default value, which is clearly not equivalent to a \(B \rightarrow C\) lens.

Sorry, your browser does not support SVG.

One would have to write \(\mathcal{O}(n^2)\) lenses to translate between \(n\) lenses, which would quicky become too time-consuming with a sufficiently branched distributed protocol. In this way, a lens-based approach reeks of Creationism rather than the "evolution" it suggests it would promote.

We want to try a very different approach. A rewriting of the subtitle of that article that is in line with our strategy gives us this subtitle:

Translate what data?

We are going to offer our solution in the context of a distributed object system, which communicates with objects that have associated behaviour, as well as the state mere data structures contain.1 With this object system, we won't be translating any data. Instead, we will be trying to create objects that implement the union of all the protocols they find themselves in.

By establishing a protocol consisting of valid messages to send to an object partaking in a protocol, we can handle some definition of data migrations trivially, should that protocol only ever get larger. A client that expects an object that responds to some messages would not care if that object can respond to more messages.

But what if the object responds to fewer messages than the client expects? One could get out of this situation easily (but not politely or in a correct manner) by providing default values when reading data, and drop written data, to provide an interface that mostly works. Ideally though, we'd convince the object to start working with our new protocol, but if our old protocol also has some features we want, we'd prefer to have both. Unifying two implementations (the one we were given in that object, and one that we may have that implements the protocol we want) may be very possible, and we will describe some techniques to unify implementations.

To simplify the presentation of this idea, we will make a few assumptions:

  • Different protocols don't implement the same stuff in different ways. The inference we will present is already pretty messy, and deciding on whether two different protocols should be the same is very hard.
  • However, there may be different implementations of one protocol. We will assume that later implementations of a protocol are extensions of an earlier implementation, so that attempting to merge implementations isn't completely hopeless.
  • There are some metrics we can use to quantify the value of implementations, possibly relating to execution space and time.

Taking an analogy somewhere

The first solution we can attempt uses multiple inheritance. We change a protocol by dynamically adding classes called mixins to its instances, allowing us to extend behaviour. When we see an object that's missing the extensions we want, we'll ask it to add a mixin to itself.

For example, in Common Lisp2, we may write:

;;; A protocol.
(defgeneric name (human))
(defgeneric date-of-birth (human))
(defclass human ()
  ((name          ... :reader name)
   (date-of-birth ... :reader date-of-birth))
  (:documentation "The protocol class for a human."))

;;; An extension to this protocol.
(defgeneric pets (owner)
  (:documentation "Returns a list of the pets an owner owns."))
(defgeneric add-pet (owner pet)
  (:documentation "Add a pet to the pets of an owner."))
(defgeneric remove-pet (owner pet)
  (:documentation "Remove a pet from the pets of an owner."))
(defclass pet-owner-mixin ()
  ;; We provide an initial form for this slot, which ADD-MIXIN will
  ;; eventually use as the initial value for a newly mixed instrance.
  ;; We initialize with an empty list, as we don't know of any pets.
  ;; The accessor %PETS will allow us to modify the list of pets, but it 
  ;; won't be accessible by the outside world.
  ((pets :initform '() :reader pets :accessor %pets))
  (:documentation "A mixin that gives a human pets."))

(defmethod add-pet ((human pet-owner-mixin) pet)
  (push pet (%pets human)))
(defmethod remove-pet ((human pet-owner-mixin) pet)
  (setf (%pets human) (remove pet (%pets human))))

;;; A method for a human that does not have the PET-OWNER-MIXIN 
;;; that adds it.
(defmethod pets ((human human))
  (add-mixin human 'pet-owner-mixin)
  ;; Now that we have a list of pets, try again.
  (pets human))
;; Ditto for adding to and removing from the list of pets.
(defmethod add-pet ((human human) pet)
  (add-mixin human 'pet-owner-mixin)
  (add-pet human pet))
(defmethod remove-pet ((human human) pet)
  (add-mixin human 'pet-owner-mixin)
  (remove-pet human pet))  

If we receive an "old" human, then accessing the pets of the human will cause us to add the pets-mixin to the human, and we will be given the default value. We can also add pets to the human, and as adding mixins preserves the identity of objects, we have added features to the human transparently.

Supposing we can transport methods across the network, we could also use this technique to transport new features to clients. A client could query the object for what it should present to the user, and our base protocol and extension could implement such a query like:

;;; This protocol function would be present somewhere in the client.
(defgeneric present (object)
  (:documentation "Produce a presentation object that represents the
                   object to be presented."))

(defmethod present ((human human))
  "A human is presented as a text box containing their name, and their
   date of birth."
  (vertically (text-box "Name" human 'name)
              (text-box "Date of birth" human 'date-of-birth)))

;;; Our pet mixin implements:
(defmethod present ((human pet-owner-mixin))
  "A pet owner also presents a list of pets."
  (vertically (call-next-method) ; get everything else to present
              (item-box "Pets" human #'pets #'add-pet #'remove-pet)))

We assume the presence of some functions to produce a "presentation" that the user will see and interact with:

  • A vertically function that stacks its arguments from top to bottom,
  • A text-box function that takes a label, object, and accessor name3 and creates a labelled text box, and
  • An item-box that presents a list of objects, like a text-box does with text. This would recursively call present on each object, and also permit adding and removal of objects somehow.

Furthermore, we have to call methods for mixins first, for call-next-method based composition to make sense. In real CLOS, we could use method combination to ensure that the value of calling every method is combined into one vertically presentation without having to explicitly call every method, but method combination may or may not make sense with message passing OOP. You decide if it would be simpler or not.

In either case, we could track mixins that we have seen, and allow the user to decide which mixins they would like to use when interacting with objects that belong to a given protocol, and thus update the client for them. This is all well and good when we are extending a protocol with concepts that are disjoint, but when we want to modify the base implementation and/or protocol, as Cambria is targeted towards, we need to do some more work.

Taking an analogy to logical places

With our example of pets, we may want to extend the pet protocol in some other ways that cause us to consider changing the implementation. Suppose we want to find a pet by their name, and we have profiled and \(\mathcal{O}(n)\) list search is a bottleneck for our program; and we now want to use a data structure like a map to associate names with pets and provide sub-linear complexity for searching.

Earlier, we wrote down the definition of a generic function that is used, and an implementation using a list:

(defgeneric find-pet (owner name)
  (:documentation "Find a pet that an owner owns by name, or return NIL."))

(defmethod find-pet ((owner pet-owner-mixin) name)
  ;; We use the Common Lisp function FIND to find a pet by name.
  ;; The name of a pet is retrieved using the function PET-NAME, and we can
  ;; compare names with STRING= (string equals). FIND will return NIL if there
  ;; isn't a pet in the list of pets with the given name. 
  (find name (pets owner) :key #'pet-name :test #'string=))

We just need to modify our implementation of the pet protocol to use a map instead of a list, yielding this implementation:

(defclass pet-owner-map-mixin ()
  ;; Common Lisp does not provide a "persistent" map implementation, but
  ;; we will assume we have already implemented one. This code could be 
  ;; ported to use the fset library fairly easily.
  ;; Note that we no longer can just implement the reader PETS as a reader
  ;; of this slot; it expects a list of pets, but we have a map of pets.
  ((pets :initform (empty-map) :accessor %pets))
  (:documentation "A mixin that gives a human pets."))

(defmethod add-pet ((human pet-owner-map-mixin) pet)
  (setf (%pets human) (add-to-map (%pets human) (pet-name pet) pet)))
(defmethod remove-pet ((human pet-owner-map-mixin) pet)
  (setf (%pets human) (remove-from-map (%pets human) (pet-name pet))))
(defmethod find-pet ((human pet-owner-map-mixin) name)
  (get-map (%pets human) name))
;; As mentioned before, we will implement PETS by producing a list of all the 
;; values (pets) in the pet map.
(defmethod pets ((human pet-owner-map-mixin) pet)
  (map-values (%pets human)))

This new implementation is perfectly compatible with users of the old; they implement the same protocol, and so they are close to indistinguishable. We now have two ways to implement the pet protocol, and three ways to implement the human protocol:

Sorry, your browser does not support SVG.

We may now want to convince objects that use the old implementation to use the new implementation. To do this, we may produce a translator to translate from our old implemenation to the new implementation (a bit like the Cambria approach), and translate instances when we see them; but if you are reimplementing someone else's protocol, it may not be possible for them to produce a translator and publish it.

If we have tracked the actions (transactions) taken on an object, and replay that list of actions (a transaction log) on a new instance of the new implementation, we end up with an object that is equivalent to the old, except that it is using the new implementation. But how can we decide on which implementation would be optimal in a decentralised manner?

Taking an analogy way too far

For this, we're going to need a powerful tool – the best tool: science.4 We can test either implementation, and decide which is optimal based on some rules, such as ensuring they implement the protocol correctly, as well as some measurements, such as the execution time on some test cases for this protocol.

A programmer has hopefully written some test cases that implementations should follow, such as:

;; OWNER will be bound to a function that produces pet owners.
;; PET will be bound to a function that produces pets.
(define-tests pets-tests (owner pet)
  (assert-equal '() (pets (owner))
                "An owner does not have any pets by default.")
  (let ((an-owner (owner)))
    (add-pet an-owner (pet :name "Demeter"))
    (let ((the-pet (find-pet an-owner "Demeter")))
      (assert-false (null the-pet) 
                    "An owner should have a pet named Demeter now.")
      (assert-equal "Demeter" (pet-name the-pet)
                    "The pet should be named Demeter.")
      (remove-pet an-owner the-pet))
    (assert-true (null (find-pet the-pet "Demeter"))
                 "An owner should not have a pet now.")))

If any of these test fail for some implementation, we should not migrate to using it, as it is clearly broken. This test suite can then be used to filter out broken implementations, but we may still want to pick a faster implementation.

We could then use some benchmark programs to decide on which implementation will be faster, but in some cases, an implementation with a larger complexity may still be faster than an implementation with a smaller complexity, if it has faster "steps". For example, a hash table has (amortized) \(\mathcal{O}(1)\) lookup, but hashing the key may cause lookup to be slower than with a list for sufficiently small \(n\). Instead of benchmarking objects with given complexities and sizes, we can try every protocol method (or some subset thereof if the protocol is large) in ways that exercise most execution paths on a given object, to decide which implementation would be most appropriate at this given moment with this object. The programmer who wrote the pet protocol may have written these benchmarks:

;; THE-OWNER will be bound to the owner object that is currently being 
;; benchmarked.
(define-benchmarks pets-benchmark (the-owner)
  ;; Note that effects done on objects are reset between BENCHMARK forms, so
  ;; the object holds the same state before each benchmark.
  (let ((name (random-name)))
    (benchmark (add-pet the-owner (pet :name name))
               "Add a new pet."))
  (let* ((pet (random-element-of (pets the-owner)))
         (name (pet-name pet)))
    (benchmark (remove-pet the-owner pet)
               "Remove a pet.")
    (benchmark (find-pet the-owner name)
               "Find a pet the owner does own."))
  (let ((name (random-name)))
    (benchmark (find-pet the-owner name)
               "Find a pet the owner does not own.")))

A client can then run these benchmarks several times (to consider, say, pets at the start and end of the lists in the first implementation), and then take the geometric mean of the instructions executed per benchmark to quantitatively measure the performance of an implementation. Then the client will pick the implementation (or combination of implementations, when there are multiple extension protocols to consider) which requires the fewest instructions to be executed.5 Supposing lookup for a map is slower than a list for sufficiently few elements, then the client will choose the list-based implementation with fewer pets, and the map-based implementation with more pets.

Conclusion

Using protocols and composable mixins instead of exposing data structures provides a means of "mutating" objects to augment their behaviour and modify their implementation, and with protocol test suites and benchmarks, we can select the best implementations to use for a given object. The combination allows protocols to be developed and implemented independently, and for clients to independently select the best implementations and grow their knowledge of protocols. As such, there are many more parallels with evolution than designs that require one to design migration strategies.

I wrote this article to celebrate the second birthday of Netfarm, a distributed-replicated object system that we may implement such an evolutionary system on. We are in the process of writing a message-passing system to have objects communicate on, and the immutable nature of stored objects has already nudged us to think in transaction logs, so we may not be too far off being able to implement this idea. We have some other ideas, such as a collaborative filtering system to replace centralised moderation; if you'd like to discuss these and your own ideas, feel free to join #netfarm on Freenode or #netfarm:matrix.org on Matrix.

Footnotes:

1

This is roughly the kind of object-oriented programming Smalltalk, Self and Erlang implement to some extent, with "messaging" achieving polymorphism; which is unlike the style present in C++ and Java, which have early-bound methods, and unlike the style in Common Lisp and Julia, which focus on generic functions.

2

The reader will have to imagine a CLOS which uses message-passing, and could serialize methods for other clients to run. We are in a tight situation with the current choice of message-passing languages: Smalltalk does not feature multiple inheritance, and Self and Erlang do not use classes. Well, we could use multiple inheritance of prototypes in Self, but we aren't as familiar with prototypes.

3

The text-box can check for a function named (setf <accessor name>), and allow modification using that function if it exists.

4

An embarrasing reference to Automatic 3Dification of Nintendo games: The glEnd() of Zelda, which somewhat inspired this approach.

5

This would not consider the time taken to migrate implementation, but if the implementation isn't changed ever again, the overhead approaches zero over time.