Processing what you don't know, or Cā‚ā‚„ dating schema-based design

In our previous article, we wrote why we believe schema "evolution" to be a dead end for decentralised system development. In this article, we will follow up by demonstrating why protocol-based reasoning and inference is superior to data and schema translation. This is, in some way, a rebuttal to the premise of No, dynamic type systems are not inherently more open1, that one must know that they can operate on some data to do so; and to API design is stuck in the past2, as we describe how to work around interface changes without having to organise an office meetup and tell our clients to update.

As an example, we will try to translate methods for one collection protocol to another. One protocol could have been updated to the other, or they could have been developed simultaneously by two developers who don't know each others' protocols.

Here are two representations of the protocols; a state machine describing the behaviour of each collection with regards to its keys, and the type signatures of their methods:

Sorry, your browser does not support SVG.

Note that add, remove, insert and and delete operate using side effects; they modify each collection in place.3

\begin{align*} \text{add} &: \text{collection1} \times k \times v \rightarrow () \\ \text{remove} &: \text{collection1} \times k \rightarrow () \\ \text{find} &: \text{collection1} \times k \rightarrow \text{Maybe}\ v \\ \hline \text{insert} &: \text{collection2} \times k \times v \rightarrow () \\ \text{delete} &: \text{collection2} \times k \rightarrow () \\ \text{member} &: \text{collection2} \times k \rightarrow \text{boolean} \end{align*}

These protocols look quite similar, and the reader may already have an idea of how to translate some of the methods. It is possible to construct an "acts-like" relationship from member to find. Suppose \(c_1\) and \(c_2\) are the "same" collection. Then we can rewrite calls to member with calls to find using the rule:

\begin{align*} \text{member}\ c_2\ k &\Leftrightarrow \text{is-present?}\ (\text{find}\ c_1\ k) \end{align*}

We cannot, however, reconstruct the value of find from member. Thus, "acts-like" is not symmetrical; find gives us "more" information than member, so we can emulate the latter with the former, but not the former with the latter.

Mechanical reasoning about protocols

Our first concern will be how to represent this information for consumption by a computer program. We may describe how to tell if an object is in a given state, and how to transition between the states.

(define-protocol collection1 ((the-collection collection))
  ;; Each key has its own state.
  (for-each (key)
    ;; We need to include the initial state to "orient" each graph. Without
    ;; knowing the initial state, we could decide that MEMBER acts like the 
    ;; opposite of FIND (deciding that MEMBER would return false if FIND
    ;; would find a value), and it would be equally as plausible as the 
    ;; correct mapping.
    (states (:start cannot-find)
      (cannot-find :when (is-empty?   (find key the-collection)))
      (can-find    :when (is-present? (find key the-collection))))
    (transitions
      ((cannot-find :to can-find)
       (for-some (value) (add key value the-collection)))
      ((can-find :to cannot-find)
       (remove key the-collection)))
    ;; We will also include a contract, following the example
    ;; with DEFINE-TESTS from "Translate what data?"
    ;; This could be encoded into states somehow, but it doesn't read
    ;; as clearly in our opinion.
    (tests
      (for-some (value)
        (after (insert key value the-collection)
          (let ((maybe-value (find key the-collection)))
            (assert-true  (is-present? maybe-value))
            (assert-equal value (value-of maybe-value))))))))

(define-protocol collection2 ((the-collection collection))
  (for-each (key)
     (states (:start is-not-member)
       (is-not-member :when (not (member key the-collection)))
       (is-member     :when      (member key the-collection)))
     (transitions
       ((is-not-member :to is-member)
        (for-some (value) (insert key value the-collection)))
       ((is-member :to is-not-member)
        (delete key the-collection)))))

With such definitions in place, we can begin to reason about these protocols. Suppose, again, we are tasked to call member, but the receiver does not understand it, and only understands the collection1 protocol.

Our first step is to decide what member is going to do. We see that it tells if we are in the is-member state, so we can replace it with the test for an equivalent state in the collection1 protocol.

We must find an isomorphism for the (sub)graph of states and transitions for the part of the state machine that is-member is contained in4. We find that there is an isomorphism, which maps is-member to can-find, so we substitute our test for is-member for the test can-find. This requires a bit of interpretation, or for the protocol object to contain a procedure that will make the appropriate calls, but in either case, we can substitute like for like5.

Our technique is not much more difficult if we need to find a method that encodes a transition; compute the isomorphism again, find the transformed states, then find the transition (which must exist by means of having an isomorphism), then call that with appropriate arguments.

The drive to 2021

Combined with our idea of "protocol evolution" tracking and implementation evaluation, we believe that finding isomorphisms in protocols can facilitate forwards and backwards compatibility and interoperability systems with relatively little programmer intervention. Much like the tests and benchmarks used in Translate what data?, the algebras and state diagrams used to reason about protocols usually exist in some form, as a byproduct of the reasoning processes of a programmer.

These techniques may be necessary for developing truly distributed systems with no hierarchical, or even "objective", decision-making strategies; if done right, by mixing in and inferring protocols and intentions, we can branch out protocols in many ways and keep them working. They also hold implementations accountable, in that we can swap out and migrate to another implementation if one has a bug, accidental or otherwise.

It may soon be possible to implement some of these techniques in Netfarm, a distributed-replicated object system that allows for users to describe many protocols with their classes and behaviour explicated. The drive to 2021 refers to a break in my schedule that occurs at the end of a school year; there was a drive to 2020 last year spanning most of December. University years end a month earlier, so I now have two months to hack on these ideas.

Whether it is implemented in Netfarm or otherwise, I am hopeful that there will be a meta-protocol that will allow a hundred protocols to bloom, and for one to reap all of them with some clever inference and mixing strategies; in broad opposition to the mechanical and over-specialised design strategies of today that focus around the schema, the data structure, and the external representation, with no hope for handling differences in semantics and implementation techniques, which force a user who wishes to use a dozen services to have a dozen identities, a dozen clients, and a host of those dozen services to implement a dozen upgrade strategies and databases.

Footnotes:

1

A much simpler counter-argument is how we handle the situation where we can't do what we want with an object. The lazy argument is that a program that does something it doesn't know it can do cannot exist in a statically typed language; but it sort of can, with default methods that tell the client to get lost, by means of exceptions or error values. We can, however, encode alternate strategies at many levels with a nice condition system with restarts, such as the condition systems in Common Lisp and Dylan; though there are exception systems in many statically typed languages, including Standard ML, they do not feature restarts, and many programmers favour value-based error propagation (such as Maybe and Either types). In some "modern" languages such as Rust and Go, there is only value-based error propagation.

2

I am somewhat offended that this article states "features like auto-complete and jump-to-definition … are mostly only possible in statically typed languages" when I use both frequently with Common Lisp code, and I have witnessed them both while experimenting with Smalltalk-80 images. Yes, that is Smalltalk from 1980. With jump-to-definition and auto-complete (well, it's more like "auto-correct", but the infrastructure is there). Oh, and I've observed people use auto-complete while writing Python code. It's almost like they are writing bullshit to sell the reader something…

3

Also note that our model of state transitions and state tests is very similar to the commands and queries of "command-query separation" pioneered in the Eiffel programming language. However, "transitions" can still return values in our model.

4

The subgraph isomorphism problem is NP-complete, but due to the sparsity of our state machine graphs, and the many constraints that we can use to further narrow our search, such as argument counts and types, and variables bound at the call site, we will usually have a space that is very feasible to search.

5

This is like knowing how to do one thing with one tool, and then watching someone do the same thing with another tool, in order to learn how to use the latter tool in place of the former.