Thoughts on Software Integration, 1

The Problem

People have been writing adapter code since the first piece of software, and have kept writing it for every piece since. There are many ways to phrase the problem.

Software, which is built “from the ground up”, is usually not integrable, yet natural language always finds a way to work. How can we design a software system that exploits the human power of natural language deeply enough that integration becomes natural?
Everything a computer processes is formal. We only need humans to intervene at the right places so that it works as we want. Where are those places?
If we are willing to give up optimization, can a good framework for software integration exist?

Most Valid Arguments Yet

Different natural languages can convey the same meaning; there is no point in deciding on a “standard” natural language.
If everything is to be integrated as effectively as natural language integrates, the integrating system must be extremely inclusive, even forcibly so. At the very least, any well-formed idea must be expressible. At the same time, this system should do nearly nothing.
Considering that well-formed ideas are possible without resorting to a “formal system”, a “formal system” is not what we are directly looking for. Here I use “formal system” in what I take to be a narrow sense, viz. logic, type systems, etc. Designing a formal discourse does not require that level of generalization; this is not, however, a declaration that such systems are unfit.
Applied computation happens on a theory which has the world as a model. Such computation is never the world as it is in itself.
Human intelligence, not any static theory of language, is what has always accomplished integration tasks.
Claim c99b5b66. There exists no tool available to humanity that is not polymorphic.
Claim 18ef4a0e. An API is a coherent cross-section of a program. An API conceptually obtains at a certain place in software, if there correspondingly exists an acceptable coherence criterion. This acceptance is completely up to a model understood by a human.
The true nature of integration is obscured by the complications of imperative communication, error handling, lack of imagination, and building too hastily from existing blocks.

Formal Language

Computation

How can a layperson program easily? Would this require them to learn a programming language? Such a perspective pays too much attention to, metaphorically, how software is built up from logic gates. Computation has no such nature. Any formal and sufficiently expressive language is capable of generating correct structures pertinent to the subject matter. “Formalization” in the usual sense of the word is exactly what is needed to derive such a language.

Ideally, a programming language should be able to express this language of business logic as closely as possible; it should never require workarounds. But of course at the same time, programmers must not feel obligated to use as many language features as possible.

Generalization

A first route to integration is generalization. Given several kinds of data that are not immediately interoperable, it is sometimes possible to generalize them into a single concept with more moving parts; the original kinds then become degenerate cases.
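
As a minimal sketch of this idea, assume a hypothetical scheduling setting in which one source records single instants and another records time spans. The `Interval` class and its methods are illustrative names, not taken from any real system. Generalizing both kinds of data into an interval makes an instant the degenerate case whose two ends coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical generalized concept: a closed interval [start, end]."""
    start: float
    end: float

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")

    @classmethod
    def instant(cls, t: float) -> "Interval":
        # An instant is the degenerate interval whose two ends coincide.
        return cls(t, t)

    @property
    def is_degenerate(self) -> bool:
        return self.start == self.end

    def overlaps(self, other: "Interval") -> bool:
        # One rule now covers instant/instant, instant/span and span/span.
        return self.start <= other.end and other.start <= self.end


meeting = Interval(9.0, 10.5)     # data from the "span" source
alarm = Interval.instant(10.0)    # data from the "instant" source
```

With both sources expressed as `Interval`, a single `overlaps` rule replaces the three pairwise comparisons that would otherwise be written as adapter code.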

Frame of Reference

One very concrete, approachable yet fake class of integration problems.

3D modeling packages use different conventions for the X, Y, Z axes. Earth’s longitude has an arbitrary zero. Different measurement scales are established for the same (classical) universe, and the one we are using has little metaphysical significance. But these systems are not always designed with coordinate conversion in mind; each has its own very “native” convention. To integrate them is to tell which coordinates are equal, but without forcing everyone to use a canonical convention.

The above is not a real integration problem; actual integration problems rarely appear in one environment. If I work with world-scale data, I have no problem deciding on using latitude and longitude all the time. If I have a smaller setting that is, say, a building on the ground using Cartesian coordinates, and want positions converted back and forth, there is a mathematically straightforward way. But it is not conceptually straightforward in the first place. For example, the use case possibly makes it conceptually invalid to operate on something 1,000 km away from that building’s coordinate system. A conceptual difference, although possibly small, should exist in an integration problem of the nontrivial category: the components to be integrated originally had different purposes.
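
The building example can be sketched as follows. This is only an illustration under stated assumptions: a spherical Earth, a flat (equirectangular) approximation near the origin, and a hypothetical `LocalFrame` class whose `max_range_m` parameter stands in for the conceptual limit of the use case. The mathematics is simple; the range check is where the conceptual difference shows up in code.

```python
import math
from dataclasses import dataclass
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean radius; a spherical Earth is an assumption


@dataclass(frozen=True)
class LocalFrame:
    """Hypothetical building-local frame: x points east, y north, in metres."""
    origin_lat: float               # degrees
    origin_lon: float               # degrees
    max_range_m: float = 1_000.0    # beyond this the frame is not meaningful

    def _check(self, x: float, y: float) -> None:
        if math.hypot(x, y) > self.max_range_m:
            raise ValueError("point lies outside the frame's intended use")

    def to_geodetic(self, x: float, y: float) -> Tuple[float, float]:
        self._check(x, y)
        cos_lat = math.cos(math.radians(self.origin_lat))
        lat = self.origin_lat + math.degrees(y / EARTH_RADIUS_M)
        lon = self.origin_lon + math.degrees(x / (EARTH_RADIUS_M * cos_lat))
        return lat, lon

    def from_geodetic(self, lat: float, lon: float) -> Tuple[float, float]:
        cos_lat = math.cos(math.radians(self.origin_lat))
        y = math.radians(lat - self.origin_lat) * EARTH_RADIUS_M
        x = math.radians(lon - self.origin_lon) * EARTH_RADIUS_M * cos_lat
        self._check(x, y)
        return x, y


frame = LocalFrame(origin_lat=48.0, origin_lon=11.0)
lat, lon = frame.to_geodetic(30.0, 40.0)   # 30 m east, 40 m north of the origin
x, y = frame.from_geodetic(lat, lon)       # back to approximately (30.0, 40.0)
```

Neither convention is declared canonical here; the frame only states which coordinates are equal, and refuses to answer where the question stops making sense.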

Permeating Language

Language shared across theories; the start of nontrivial integration.

Integration is always applied—not pure science or mathematics.

Protobuf, etc.

When I used Protobuf for the first time I realized it was some kind of secret weapon in development. It allows programmers to define a data interface once, then use it across languages and between endpoints. It is powerful not only because it lets you skip writing the same definition in different languages, but also because it lets you use the exact same definition across languages, eliminating inconsistencies due to language features. While it is less expressive for static data than actual programming languages, semantic stability is where it wins the game.

There is a further point in this semantic stability. Because Protobuf can deserialize directly to an object, there is no need to use a factory design pattern or any processing specific to the protocol. The factory design pattern indicates a semantic discrepancy between endpoints. By plugging directly into the semantics of the business logic, the project contains less crap of both code and semantics.

Naming

~~Naming things is too difficult in programming, so I am going to give up naming them.~~ Naming things with what we ordinarily consider a name has unintended consequences. Each object we want to name usually needs a longer description when it travels between theories, but short names are still necessary for us to speak within one theory. In the end we cannot avoid “levitating” these names if we want the system to be fully open to integration. Doing so does more than just eliminate naming conflicts.
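
One possible reading of “levitating” names, sketched under my own assumptions rather than as the scheme of any particular article or library: identity is carried by an opaque UID, and each theory keeps its own short names for the UIDs it cares about. The `Theory` class, the `translate` function, and the example vocabularies are all hypothetical.

```python
import uuid


class Theory:
    """Hypothetical 'theory': a vocabulary of short local names for shared UIDs."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._uid_by_name = {}
        self._name_by_uid = {}

    def name(self, short_name: str, uid: uuid.UUID) -> None:
        # Short names only need to be unique inside this one theory.
        self._uid_by_name[short_name] = uid
        self._name_by_uid[uid] = short_name

    def uid_of(self, short_name: str) -> uuid.UUID:
        return self._uid_by_name[short_name]

    def local_name(self, uid: uuid.UUID) -> str:
        return self._name_by_uid[uid]


def translate(short_name: str, source: Theory, target: Theory) -> str:
    # Identity travels between theories as the UID, never as the name.
    return target.local_name(source.uid_of(short_name))


beam_length = uuid.uuid4()    # the concept itself carries no built-in name
column_gap = uuid.uuid4()
storage_slot = uuid.uuid4()

structural = Theory("structural")
inventory = Theory("inventory")
structural.name("span", beam_length)
inventory.name("length_m", beam_length)   # same concept, different local name
structural.name("bay", column_gap)
inventory.name("bay", storage_slot)       # same local name, different concept
```

Here `translate("span", structural, inventory)` yields `"length_m"`, while the two meanings of `"bay"` never collide because neither theory's short names are visible to the other.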

See the other article, UID Everything!

Application Programming Interfaces (APIs)

The concept of an application programming interface is typically defined as “a set of defined rules that enable different applications to communicate with each other” (IBM). Usually we imagine a set of library functions, a REST server, or system calls.

Arriving at a good characterization of the concept of an API has been extremely challenging. A few arguments have helped or clarified the goal:

APIs are more than data interfaces. They can also pass functions (such as signal handlers and higher-order functions) and metadata (such as a type to be used in a template).
All APIs use verbs explicitly or implicitly, the most common of which are put and get. The verb may be encoded as a request type or even a port number.
A successful analysis must work regardless of what endpoints (server, client, or more exotic) an API has.
A successful analysis must be compatible with both real-world and abstract data.
The common notion of an API requires that it be exported by the programmer. Presently, software is by default not integrable; it executes uninterrupted on its own without breakpoints. Any integration requires additional, explicit mechanisms. But even if the programmer never considers exposing an API, it can still conceptually obtain. To continue this discussion we shall include all potential APIs, whether or not they have been implemented by the developer. This has the additional benefit of stripping away the directionality and intended use of the program.
Exporting an API is usually trivial once the programmer has it in mind.
Human intuition about the API plays a huge role. In fact, APIs exist for humans to use.
Protocols are also APIs. They should be properly characterized.
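
The first two arguments above can be made concrete with a small sketch. Everything here is hypothetical and illustrative: the `Store` class, its method names, and the request dictionary format are assumptions, not any real library's interface. The sketch shows an API that carries data, a function (a handler), and metadata (a type), and a dispatcher in which the verb is encoded as a request type.

```python
from typing import Any, Callable, Dict, List, Type, TypeVar

T = TypeVar("T")


class Store:
    """Hypothetical API surface; all names are illustrative only."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._handlers: List[Callable[[str, Any], None]] = []

    def put(self, key: str, value: Any) -> None:
        # Explicit verb "put": data crosses the interface.
        self._data[key] = value
        for handler in self._handlers:
            handler(key, value)

    def get(self, key: str, as_type: Type[T]) -> T:
        # Explicit verb "get": a type is passed in as metadata.
        return as_type(self._data[key])

    def on_put(self, handler: Callable[[str, Any], None]) -> None:
        # A function crosses the interface, not data.
        self._handlers.append(handler)


def handle(store: Store, request: Dict[str, Any]) -> Any:
    # Here the verb is implicit in the call and encoded in the request type.
    if request["type"] == "PUT":
        return store.put(request["key"], request["value"])
    if request["type"] == "GET":
        return store.get(request["key"], str)
    raise ValueError("unknown request type")


store = Store()
seen: List[str] = []
store.on_put(lambda key, value: seen.append(key))
handle(store, {"type": "PUT", "key": "n", "value": 42})
```

The same two verbs appear twice, once as method names and once as values of a request field, which is the sense in which a verb can be explicit or implicit.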

Claim 18ef4a0e. An API is a coherent cross-section of a program. An API conceptually obtains at a certain place in software, if there correspondingly exists an acceptable coherence criterion. This acceptance is completely up to a model understood by a human.

Incidentally, Robert Kowalski: Algorithm = Logic + Control.

Within an unintegrated program, the language is consistent because its author made it so. The process used to arrive at coherence is abstracted away from its API, but is still part of the reality described by the program. Its integration with other programs describes reality as conforming to both components. The API transfers this coherence between components, using external information such as human attestation to form a bigger picture of reality that is more than the sum of its parts: the components can now “expect” (although they are not made to) that the data passed in conforms to new criteria resulting from this integration. This information exists only in the mind of the developer.