August 16, 2022

Paper Summary: End-to-End Arguments in System Design

Dominik Tornow

J. H. Saltzer, D. P. Reed, and D. D. Clark. 1984. End-to-end arguments in system design. ACM Transactions on Computer Systems 2, 4 (Nov. 1984), 277–288.

Key Words Function, Completeness, Correctness, Application Layer, Platform Layer, Failure, Failure Detection, Failure Mitigation.

In their 1984 paper End-to-End Arguments in System Design, Saltzer, Reed, and Clark present a design principle that helps guide the placement of functions among the modules of a distributed system. In the paper, the term functions refers to functionality, not a particular function definition in a programming language. Similarly, the term modules refers to layers, not a particular organizational construct in a programming language.

Saltzer, Reed, and Clark assume a layered architectural style. The basic idea of a layered architecture is simple: components are arranged in a layered fashion where components at layer Lₙ can make a downcall to components at layer Lₘ (n < m), generally expecting a response. Exceptionally, components at layer Lₘ can make an upcall to components at layer Lₙ, generally via a previously registered callback.
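
The downcall and upcall relationship can be sketched in a few lines of Python. This is only an illustration, not something taken from the paper: the class names, `register_callback`, `send`, and `deliver` are all hypothetical.

```python
from typing import Callable, Optional


class PlatformLayer:
    """Lower layer: serves downcalls and may make upcalls via a callback."""

    def __init__(self) -> None:
        self._on_receive: Optional[Callable[[bytes], None]] = None

    def register_callback(self, callback: Callable[[bytes], None]) -> None:
        # The upper layer registers its callback ahead of time.
        self._on_receive = callback

    def send(self, data: bytes) -> int:
        # Target of a downcall: the caller expects a response
        # (here, simply the number of bytes accepted).
        return len(data)

    def deliver(self, data: bytes) -> None:
        # Upcall: invoke the previously registered callback, if any.
        if self._on_receive is not None:
            self._on_receive(data)


class ApplicationLayer:
    """Upper layer: makes downcalls and receives upcalls."""

    def __init__(self, platform: PlatformLayer) -> None:
        self.platform = platform
        self.inbox = []
        platform.register_callback(self.inbox.append)

    def transmit(self, data: bytes) -> int:
        return self.platform.send(data)  # downcall


platform = PlatformLayer()
app = ApplicationLayer(platform)
accepted = app.transmit(b"hello")  # downcall with a response
platform.deliver(b"world")         # simulated incoming data triggers an upcall
```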

For this blog post, we limit our discussion to two layers: we refer to the upper layer as the application layer and to the lower layer as the platform layer.

Figure: Application Layer vs Platform Layer

The End-to-End Argument

The End-to-End Argument states that some functions may “completely and correctly be implemented only” on the application level; implementing these functions completely and correctly on the platform level is not possible. This impossibility is rooted in the fact that the application layer has total information, whereas the platform layer may have only partial information. Informally, the platform layer lacks context.

However, the End-to-End Argument does not preclude providing a partial, incomplete implementation of a function, or duplicating a function, on the platform level: not for completeness and correctness, but strictly as an optimization.

In addition, the paper stresses that the End-to-End Argument is a guideline that helps in application and platform design analysis; however, identifying the endpoints to which the argument should be applied requires subtlety of analysis of application requirements.

Example

The significance of the End-to-End Argument is most apparent when reasoning about layers and failures: Is a layer able to detect a failure? If a layer is able to detect a failure, what should that layer do? Should the layer mitigate the failure? Should the layer present the failure to the next higher layer?

Reliable File Transfer

The paper discusses several examples; this blog post limits itself to one of them: Reliable File Transfer.

The object is to move a file from computer A’s storage to computer B’s storage without damage. A common way to transfer a file is to transfer it in chunks:

  • On the sender side, the application layer splits the data into chunks before handing each chunk downwards to the platform layer for transfer.

  • On the receiver side, the platform layer receives a chunk before handing the chunk upwards to the application layer for assembly.
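
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `CHUNK_SIZE`, `split_into_chunks`, `platform_transfer`, and `assemble` are hypothetical names, and the platform layer is modeled as a perfect channel that returns each chunk unchanged.

```python
from typing import Iterator, List

CHUNK_SIZE = 4  # hypothetical value, kept tiny for illustration


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Sender side, application layer: split the file into chunks."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def platform_transfer(chunk: bytes) -> bytes:
    """Stand-in for the platform layer; assumed to be a perfect channel."""
    return chunk


def assemble(chunks: List[bytes]) -> bytes:
    """Receiver side, application layer: reassemble the file."""
    return b"".join(chunks)


original = b"end-to-end arguments"
received = [platform_transfer(chunk) for chunk in split_into_chunks(original)]
rebuilt = assemble(received)
```

In this idealized setting `rebuilt` equals `original`; the rest of the example is about what happens when that assumption breaks.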

Figure: Reliable File Transfer

So now the question arises: can you implement file transfer completely and correctly by limiting failure detection and failure mitigation to the platform layer, or do you (also) need failure detection and failure mitigation on the application layer?

Failure Detection and Mitigation

While the platform layer may indeed detect transmission failures of chunks via checksums on chunks and mitigate them via retransmission of chunks, only the application layer may detect assembly failures of files via checksums on files and mitigate them via retransmission of files.
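
A small sketch makes the gap concrete: every chunk can pass its own checksum while the assembled file is still wrong. The choice of CRC-32 for chunks and SHA-256 for the file is an assumption made for illustration, as is the simulated assembly bug; the paper does not prescribe either.

```python
import hashlib
import zlib
from typing import List, Tuple


def chunk_ok(chunk: bytes, crc: int) -> bool:
    """Platform-level check: does the chunk match its checksum?"""
    return zlib.crc32(chunk) == crc


def file_digest(data: bytes) -> str:
    """Application-level check: digest over the whole file."""
    return hashlib.sha256(data).hexdigest()


original = b"end-to-end arguments"
chunks = [original[i:i + 4] for i in range(0, len(original), 4)]
expected_digest = file_digest(original)

# Platform layer: each chunk arrives together with a matching checksum.
wire: List[Tuple[bytes, int]] = [(c, zlib.crc32(c)) for c in chunks]
all_chunks_ok = all(chunk_ok(c, crc) for c, crc in wire)

# Hypothetical assembly bug on the receiver: the last chunk is never written.
assembled = b"".join(c for c, _ in wire[:-1])
file_ok = file_digest(assembled) == expected_digest
```

Here `all_chunks_ok` is true while `file_ok` is false: the failure is invisible to the per-chunk checks and only shows up in the end-to-end comparison.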

Failure Presentation

While the platform layer may indeed try to mitigate transmission failures via retransmissions, eventually, in order to avoid an infinite loop, the platform level has to present repeated transmission failures to the application level.
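
One way to picture this is a bounded retry loop that gives up and raises. This is a sketch under assumptions: `TransmissionError`, `MAX_ATTEMPTS`, `send_chunk_with_retries`, and the simulated `make_flaky_send` link are hypothetical names, not part of any real library.

```python
from typing import Callable


class TransmissionError(Exception):
    """Raised by the platform layer once it gives up on a chunk."""


MAX_ATTEMPTS = 3  # hypothetical retry budget


def send_chunk_with_retries(
    chunk: bytes,
    send: Callable[[bytes], bool],
    max_attempts: int = MAX_ATTEMPTS,
) -> int:
    """Platform layer: retry a bounded number of times, then report failure."""
    for attempt in range(1, max_attempts + 1):
        if send(chunk):
            return attempt  # number of attempts that were needed
    # Mitigation is exhausted: present the failure to the layer above.
    raise TransmissionError(f"chunk not delivered after {max_attempts} attempts")


def make_flaky_send(failures: int) -> Callable[[bytes], bool]:
    """Simulated link that fails a fixed number of times, then succeeds."""
    remaining = [failures]

    def send(chunk: bytes) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return send


# Two failures fit within the retry budget.
attempts_used = send_chunk_with_retries(b"data", make_flaky_send(2))

# A persistently failing link surfaces at the application layer,
# which then decides how to handle the failed transfer.
try:
    send_chunk_with_retries(b"data", make_flaky_send(10))
    outcome = "delivered"
except TransmissionError:
    outcome = "transfer failed"
```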

In summary, although the platform layer implements partial failure detection and mitigation, ultimately only the application layer is able to implement total failure detection and mitigation: only the application layer may determine whether a file transfer succeeded or failed, and how to handle a failure.

Conclusion

The End-to-End Argument states that some functions may "completely and correctly be implemented only" on an application level, even though the End-to-End Argument does not preclude partially implementing functions on a platform level as an optimization.

For example, failure detection and mitigation of a file transfer can (and should) happen on both the application level and the platform level, but only the application layer can ensure completeness and correctness of the transfer.

Types of Optimization

  • Performance. If the application layer detects a failure in the transmission of the file, the application layer may mitigate that failure by retransmitting the file. However, if the platform layer detects a failure in the transmission of a chunk, the platform layer may mitigate that failure by retransmitting only the chunk. We may be able to avoid retransmitting the file if retransmission of the chunk is successful.
  • Maturity. Even though some functions may completely and correctly be implemented only on the application layer, duplicating functionality on the platform layer may aid correctness. Some functions are complex and therefore error prone; encapsulating them in the platform layer enables us to take advantage of the platform's maturity and “fill in the gap” of these functions in the application layer.
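
The performance point can be illustrated with some rough arithmetic. The model below is a deliberate simplification and an assumption of this sketch: each corrupted chunk is repaired by exactly one retransmission, a whole-file resend always arrives intact, and the file and chunk sizes are made up.

```python
def bytes_sent_with_chunk_retry(
    file_size: int, chunk_size: int, corrupted_chunks: int
) -> int:
    """Platform-level mitigation: resend only the corrupted chunks."""
    return file_size + corrupted_chunks * chunk_size


def bytes_sent_with_file_retry(file_size: int, corrupted_chunks: int) -> int:
    """Application-level mitigation only: any corruption resends the file."""
    return file_size * (2 if corrupted_chunks > 0 else 1)


FILE_SIZE = 1_000_000  # hypothetical file size in bytes
CHUNK_SIZE = 1_000     # hypothetical chunk size in bytes

with_chunk_retry = bytes_sent_with_chunk_retry(FILE_SIZE, CHUNK_SIZE, 1)
with_file_retry = bytes_sent_with_file_retry(FILE_SIZE, 1)
```

With a single corrupted chunk, the chunk-level retry sends 1,001,000 bytes in total, while the file-level retry sends 2,000,000. The end-to-end check is still required for correctness; the platform-level retry only makes the common case cheaper.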