CommunicationSafe usage of mutable and global variables
The chapter discusses safe usage of non-local variables, that is variables accessible by different components or threads, with global variables as a particular case.
Syntax
Annotations:
- annotation (1) marks SEND communication event.
- annotation (2) marks RECEIVE communication event.
Specializations:
Communication reasoning is able to assign the following specializations:
- commDirect — specialization is expected to provide direct access to raw variable's content.
- commGuarded — specialization is expected to do internal consistency checks at run time.
Background
One of the major concepts that support writing of safe programs is a notion of immutability. Immutability tremendously simplifies many kinds of analyses; using immutable structures is a practical way to write multithreaded applications and has many other benefits beyond that. However, in its most basic form it comes with the price of a disastrous, in many cases, memory overhead, since the property of immutability implies that with each change of a variable an independent copy of it shall be made, occupying a different memory region. Unwise use of immutable structures would lead to a situation where CPU is mostly occupied with unnecessary variables copying to and fro as well as with extensive garbage collection, irrelevant of actual algorithm's complexity at hand. Thus one of the central points of a proper programming language design is enabling the techniques that would help to overcome the shortcomings by relaxing the immutability requirements while keeping the safety benefits. There are many ways to approach the problem, and one such technique, namely communication model, is discussed next.
Communication Model
Communication model is a way to capture and express what is going on with variables in a program as well as to define rules that describe valid operations over variables. Within this framework, writing a value to a variable is viewed as sending, while reading a variable's value is viewed as receiving.
Variables that are accessed from different components or threads are referred to as non-local variables. This chapter is focused on non-local variables, the global ones particularly, since it is the very type of variables for which it is difficult to perform a manual exhaustive check and find out where and how they are used in order to catch any errors. It is natural to view them as the means of an interaction between different parts of a program, in other words, an interaction between a sender and receiver, where such sender and receiver are different components. The same terms comprise rules that express valid ways of interacting. The abstraction is named communication model due to similarity with the network communication.
Reasoning based on working with a communication path, i.e a chain of communication events (e.g. sending/receiving) that occur during program execution.
Let's consider a small example:
a = init():: int; commop(send). //(1) b = a + 1 :: int; commop(receive). //(2)
It shows computing of the variable b. Variable b depends on a so a is calculated first. Variables a, b are annotated with comm(send) and comm(receive), denoting sending and receiving events, respectively. Communication path in this case is an ordered list `{<begin>, SEND, RECEIVE,
<end>}` where `<begin> and <end>` are special events that denote the first and last events in the path, respectively.
The gist of using communication model is to ensure that every sent value is properly received. It relies on the compiler to gather all possible communication paths in the program as an input for processing. There are two supported modes of reasoning:
- Validation. In this mode all communication paths are checked against communication rules to confirm that the program is valid. Otherwise a compilation error is raised.
- Planning. In this mode reasoning assigns proper implementation for variables in efforts to ensure validity.
Validation
To perform validation, every communication path is checked against number of communication rules that define which communication paths are valid. Default behaviour expressed by the "every sent value being properly received" presumption results in the next possible cases:
- Valid. Paths that consist of pairs of events {SEND, RECEIVE} are valid meaning that each sent value was properly received.
- Undefined and expired value. Paths that have the parts {<begin>, RECEIVE} or {RECEIVE, RECEIVE} are invalid meaning that either a possibly undefined value was received in the first case or a duplication, i.e. an expired value, was used in the second case.
- Lost value. Paths that have the parts {SEND, SEND} or {SEND, <end>} indicate possibly lost change since the consequent SEND event replaces the value in the former case while the sent value is not used at all in the latter case.
Traditional immutability validation is based on a concept that a valid value shall remain valid as long as it was not modified. In this regard the communication model can be viewed as an extension and more expressive tool since it also captures value expiration after it was used as well as value loss, if it was not used at all.
Planning
Aside of performing validation, reasoning in the communication model also assigns appropriate specialization to sending and receiving operations, as appropriate. At the moment there are two specializations the operations are expected to support:
- Direct. Direct specialization commDirect is expected to provide direct access to a variable's value. This specialization is assigned in case of a fully statically validated communication path.
- Guarded. In case there are possible communication path inconsistencies that cannot be completely ruled out at compile time, checking logic should be embedded into compiled code. Specialization commGaurded is expected to track a variable's state in order to be able to check usage consistency.
Planning Horizon
Reasoning implements an algorithm that is bounded by the maximal path length it can process. The parameter is called planning horizon. Any variable that it cannot check due to exceedingly large path's length is assigned default implementation commGaurded that performs necessary checks during runtime. Thus the parameter establishes a trade-off between the static analysis extensiveness and the runtime checks overhead.
Example: Direct Implementation
import raw("scripts/dfa/propagation.lp"). import raw("scripts/dfa/polymorphism.lp"). import raw("scripts/effects-communication/communication.lp"). import raw("scripts/effects-communication/config.lp"). CommDirect = type { value:: int }. guard:: commDirect { init = function::CommDirect { {value = 0} } read = function(vault1:: CommDirect):: int { (vault1:: *;commop(receive))["value"] } write = function(vault2:: CommDirect, valueNew:: int)::CommDirect { (vault2:: *; dfa_pseudo(vault2)) + {value = valueNew}:: int; commop(send); dfa_uppy(vault2) } } main = function::int; entry { x1 = init()::*; dfa_polym(ret). x2 = write(x1, 1)::*; dfa_polym(arg). val = read(x2)::int; dfa_polym(arg). val }
In this example, basic workflow is presented in main — the function write(x1, 1) is invoked following by invocation of read(x2). Functions write() and read() are annotated with commop(send) and commop(receive) respectively in order to enable communication reasoning. Analyzer gathers and validates observed communication path, and since there is no ambiguity, it is possible to assign specialization CommDirect allowing direct access to the variables avoiding any additional overhead. Note, there are no other specializations defined, and if reasoning was not able to conclude direct access specialization, then the compilation error would be raised.
Example: Guarded Implementation
import raw ("scripts/effects-communication/communication.lp"). import raw ("scripts/dfa/propagation.lp"). import raw ("scripts/dfa/polymorphism.lp"). import raw ("scripts/effects-communication/config.lp"). CommState = type variant{Invalid, Valid, Outdated}. CommDirect = type { value:: int }. CommGuarded = type { value:: int, state:: CommState }. guard:: commDirect { init=function::CommDirect{ {value = 0} } read= function(vault1:: CommDirect):: int{ (vault1::CommDirect; commop(receive))["value"] } write= function(vault2:: CommDirect, valueNew1:: int)::CommDirect{ (vault2::CommDirect;dfa_pseudo(vault2)) + {value = valueNew1}:: int; commop(send); dfa_uppy(vault2) } } errorRead = function:: int { -1 } errorWrite = function:: CommGuarded{ { value = -1, state = Invalid() } } guard:: commGuarded{ init=function::CommGuarded{ { value = 0, state = Invalid() } } read=function(vault3:: CommGuarded):: int { switch variant (vault3["state"]->whatever::CommState;commop(receive)):: int case (Invalid) { errorRead() } case (Outdated) { errorRead() } case (Valid) { vault3["value"] } } write=function(vault4:: CommGuarded, valueNew2:: int)::CommGuarded{ switch variant (vault4["state"]->whatever::CommState;commop(send); dfa_pseudo(vault4))::int case (Invalid) { {value = valueNew2, state = Valid()}:: CommGuarded; dfa_uppy(vault4) } case (Outdated) { {value = valueNew2, state = Valid()}:: CommGuarded; dfa_uppy(vault4) } case (Valid) { errorWrite():: CommGuarded; dfa_uppy(vault4) } } } main=function(cmd:: num)::int; entry { x1 = init():: *; dfa_polym(ret). x2 = write(x1, 1)::*; dfa_polym(arg). x3 = if (cmd > 0)::int { y = read(x2):: int; dfa_polym(arg). y } else { z = write(x2, 2)::*; dfa_polym(arg). a = read(z):: int; dfa_polym(arg). a }. x3 }
Here is an example of a slightly more complicated workflow. Function main contains branching that depends on an argument known at run time only. Analyzer is presented with two possible communication paths, one of which (the false branch) leads to a possibly lost value for it contains two consequent SEND events. In this situation analyzer is unable to statically validate correctness and assigns specialization commGuarded to embed checking logic into compiled code as an intermediary layer between the variable's content and the client's code. Implementation commGuarded besides accessing to a variable's value also tracks the variable's status and will return error if the value is inconsistent.