Exploitation
Shared usage of external resources: initialization and finalization

This chapter discusses the exploitation of external resources, such as files, as a particular instance of the side effects problem that inevitably stems from interaction with the outside world. Unlike virtualization, another topic that deals with I/O, exploitation approaches the subject from a different angle: it is concerned with the order of operations, that is, the sequence in which different clients jointly use the same resource, and with the corresponding difficulties, e.g. ensuring proper initialization of a resource before its actual usage.

Syntax

Annotations

SYNTAX: use(resource-id) init(resource-id)
  • resource-id — user-defined resource identifier

Annotates a function or code block as one that exploits the resource resource-id: use marks actual usage of the resource, while init marks code that can initialize it.

Guards

SYNTAX: exploitation(init) exploitation(none)

Specializations that are recognized by exploitation reasoning. Each specialization corresponds to an initialization strategy:

  • exploitation(init) is expected to perform actual resource initialization.
  • exploitation(none) is expected to do nothing as initialization is either not necessary or done elsewhere.

Background

In software engineering, the idea of avoiding side effects has gained considerable traction. Indeed, side effects are hard to take into account, and thus programs that have side effects are inherently unsafe, so best coding practices rightfully suggest isolating the code that produces side effects as much as possible. So-called pure functional languages go even further: their philosophy frames side effects as the opposite of "pure" and is built around effectless computations, to the point that some language designs treat constructs that produce side effects, such as I/O, as an afterthought, as something almost unnecessary.

However, in reality the opposite is true, as most applications' sole responsibility is to communicate with the "outside world", reacting to external events and changing the "world's state" accordingly. As a consequence, side effects are usually the only important effects produced by a program. They surely deserve first-class support from a programming language and justify the effort to develop an approach that alleviates the related safety and performance concerns.

Exploitation Plan

One difficulty with taking side effects into account is that the final result depends on the exact order of operations. For many techniques, this severely impacts both performance and safety. For example, caching and parallelization can neither be performed automatically nor validated, since they rely on various degrees of reordering or deal with an order of execution that may not be determined beforehand.

In this chapter, it is assumed that the final effects of execution are fully defined by the exploitation path: for a particular code path that can occur during execution, the exploitation path is the part consisting only of relevant code blocks, i.e. those that deal with an exploited resource. Other (unannotated) code blocks are assumed not to produce exploitation effects, so they are excluded from consideration. Thus reasoning about effects is reduced to considering all possible exploitation paths, checking whether they meet the requirements that define valid exploitation, and making corrections if needed and possible.

The result of the reasoning is called an exploitation plan: a specification that defines the exact order and strategy of using a given resource in order to comply with the imposed requirements.
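
The notion of an exploitation path can be illustrated with a minimal sketch. It is written in Python rather than Xreate, and the data model (a code path as a list of blocks paired with their annotations) and the function name exploitation_path are assumptions made for illustration only, not part of the compiler.

```python
# Hypothetical model: a code path is a list of (block_name, annotations) pairs,
# where each annotation is a (kind, resource) pair such as ("use", "file").

def exploitation_path(code_path, resource):
    """Keep only the events of blocks annotated for `resource`, in execution order."""
    return [
        (block, kind)
        for block, annotations in code_path
        for kind, res in annotations
        if res == resource
    ]

# One possible code path; unannotated blocks are assumed to be effect free.
code_path = [
    ("scope0", []),
    ("scope1", [("init", "file")]),
    ("scope2", [("init", "file")]),
    ("scope3", [("use", "file")]),
    ("scope4", [("use", "file")]),
    ("scope5", []),
]

path = exploitation_path(code_path, "file")
print(path)
```

Unannotated blocks simply drop out of the result, which is what allows the reasoning to ignore most of the program.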

With all above said, the discussed approach can be presented as follows:

  • Annotations are used to express certain aspects of side effects to enable further reasoning. They indicate the code blocks that deal with a resource as well as provide additional information about how exactly it is exploited, e.g. using, initializing or deinitializing the resource.
  • Existing code paths extracted during source code processing and coupled with relevant annotations are enough to construct all possible exploitation paths and analyze them. The analysis distinguishes possible (weak) paths, which may or may not occur during a particular execution, from certain (strong) paths, which occur under any conditions. It also checks whether an exploitation path is valid against certain exploitation rules, e.g. that initialization always occurs before the actual usage, and whether invalid paths can be corrected.
  • The result of the reasoning is an exploitation plan that dictates the order and strategy of exploitation. It is presented in the form of appropriate specializations for the polymorphic functions that deal with resources, ensuring safe exploitation to the extent allowed by the provided annotations.
  • Exploitation-related side effects are viewed as a set of additional restrictions on the order of operations: only a subset of the possible reorderings remains valid w.r.t. side effects. Transcend's task is to derive this refined set of valid orders, so that techniques relying on reordering gain additional information for making safe optimizations.

... and it serves three major goals:

  • Safety. Validates the existing exploitation plan, or determines whether it is possible at all to safely exploit a given resource. The compiler signals an error if a given exploitation plan is invalid, i.e. does not satisfy the requirements w.r.t. side effects as expressed by annotations.
  • Regression Resilience. When it comes to using external resources, spurious dependencies usually occur between otherwise isolated, independent components of a program. Sometimes refactoring or other code changes break those dependencies, which inevitably results in regressions. Exploitation catches this sort of regression and automatically regenerates an exploitation plan suited to the changed conditions.
  • Performance. Generated exploitation plans are optimal in the sense that they cut off superfluous operations, for example, by preventing resource initialization in several places if it can be done safely in just one place, thus reducing overall overhead.

Domination Analysis

When it comes to reasoning about the order of execution and possible code paths, the crucial tool is domination analysis, which produces a dominator tree as its output.

Unlike the usual function-bound domination analysis, where a separate dominator tree is produced for each function defined in a program, Exploitation requires a program-bound analysis, i.e. one that takes into account the control flow across all functions in a program. Performing an analysis over a whole program is computationally intensive, but this is compensated by the fact that Exploitation only takes into account the code blocks that deal with or, in other words, exploit external resources. Thus there is no need to build a full dominator tree; only the relevant parts are constructed, just enough to make sound exploitation plan decisions.
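
As an illustration of what domination means here, the following Python sketch computes dominator sets with the textbook iterative data-flow algorithm. It is not the compiler's implementation: it assumes the control flow graph is already given as a whole-program graph in which every node is reachable from the entry, and, unlike the partial construction described above, it computes the sets for every node. The node names are hypothetical.

```python
def dominators(cfg, entry):
    """Return a dict mapping each node to the set of nodes that dominate it.

    cfg maps a node to the list of its successors. All nodes are assumed
    to be reachable from `entry`.
    """
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from the most pessimistic guess and shrink until a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable node, outside the stated assumption
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# A diamond: "open" runs on every path to "write", "left" and "right" do not.
cfg = {
    "entry": ["open"],
    "open": ["left", "right"],
    "left": ["write"],
    "right": ["write"],
    "write": [],
}
dom = dominators(cfg, "entry")
print(sorted(dom["write"]))
```

In this graph "open" dominates "write", so it is guaranteed to run before it, while "left" and "right" lie on only some of the paths.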

Empty Exploitation Plan. Effect Free Computations

Validation of an exploitation path is done against predefined constraints. Depending on the complexity of the constraints, i.e. the number of different exploitation events that are looked for in each path, reasoning goals are categorized into several groups:

  • Zero Order Exploitation. All paths are checked to determine whether there is any exploitation at all, i.e. whether there is at least a single exploitation event along the path.
  • First Order Exploitation. Deals with situations where it is enough to check only two different exploitation events occurring in a required order. It can be useful, for example, to check whether all uses of a resource occur after it is initialized.
  • Higher Order Exploitation. Expresses constraints involving several (more than two) exploitation events and relations between them.

Empty Exploitation is an important instance of the zero order constraint. It is a useful mechanism that lets the developer annotate a function or part of a program as effect free in terms of exploitation. Thus effectless, clean or pure code can be clearly separated from the effectful part, and the compiler raises a compilation error in case of accidental mixing or of using the "wrong" type of code in an inappropriate environment.
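
The difference between zero order and first order checks can be sketched as follows. This is an illustrative Python model, not Xreate code: an exploitation path is assumed to be a plain list of event kinds, and the function names are hypothetical.

```python
def is_effect_free(path):
    """Zero order check: the path contains no exploitation events at all."""
    return len(path) == 0

def init_precedes_use(path):
    """First order check: every "use" event is preceded by an "init" event."""
    initialized = False
    for kind in path:
        if kind == "init":
            initialized = True
        elif kind == "use" and not initialized:
            return False
    return True

print(is_effect_free([]))                         # no events on the path
print(init_precedes_use(["init", "use", "use"]))  # valid order
print(init_precedes_use(["use", "init"]))         # usage before initialization
```

A zero order check only needs to detect the presence of events, whereas a first order check has to track the relative order of two kinds of events.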

Resource Initialization

One important problem related to exploitation order is to ensure that a given resource is properly initialized before its first usage and, additionally, that it is not initialized more than once during the exploitation session. This is an instance of first order exploitation, since in validation mode it is enough to check the exploitation plan to ensure that every resource usage is preceded by resource initialization at some earlier point in the exploitation path.

For the planning mode, the problem is addressed as follows:

  • The central idea of the algorithm is to consider candidates for initialization only among those code blocks that dominate a given usage site. Obviously, initialization in the dominating block precedes usage for any possible code path.
  • One or more dominator blocks are chosen for actual initialization in such a way that they cover all the usage sites found.
  • For the code blocks chosen for initialization, the specialization exploitation(init) is set, while for the rest of the code blocks the specialization exploitation(none) is used.
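
The planning steps above can be sketched as a greedy covering procedure. This Python model is an assumption made for illustration and is not claimed to be the algorithm used by the exploitation reasoning scripts: dominator sets are supplied by hand, ties are broken by listing order, and the sketch does not verify that two chosen blocks never lie on the same path.

```python
def plan_initialization(dom, init_candidates, use_sites):
    """Assign a specialization to every candidate initialization block.

    dom maps a usage site to the set of blocks dominating it.
    """
    uncovered = set(use_sites)
    chosen = []
    while uncovered:
        # Pick the candidate dominating the most usage sites still uncovered.
        best = max(
            init_candidates,
            key=lambda c: sum(1 for u in uncovered if c in dom[u]),
            default=None,
        )
        covered = {u for u in uncovered if best in dom[u]}
        if not covered:
            raise ValueError("some usage sites cannot be safely initialized")
        chosen.append(best)
        uncovered -= covered
    return {
        c: "exploitation(init)" if c in chosen else "exploitation(none)"
        for c in init_candidates
    }

# Hand-written dominator sets for a strictly sequential program.
dom = {
    "scope3": {"scope0", "scope1", "scope2", "scope3"},
    "scope4": {"scope0", "scope1", "scope2", "scope3", "scope4"},
}
plan = plan_initialization(dom, ["scope1", "scope2"], ["scope3", "scope4"])
print(plan)
```

Both candidates dominate both usage sites here, so a single initialization suffices and the remaining candidate receives exploitation(none).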

Please take a look at the example below:

tests/exploitation.cpp: Doc_ResourceInit_1
import raw("scripts/cfa/payload.lp").
import raw("scripts/exploitation/exploitation.lp").     //exploitation reasoning
import raw("scripts/exploitation/test1.assembly.lp").

guard::                                                 exploitation(init) 
{
  openFile = function(filePrev:: FILE_P)::              FILE_P; init(file) 
  {
    fopen("/tmp/test", "w")::FILE_P
  }
}

guard::                                                 exploitation(none) 
{
  openFile = function(filePrev:: FILE_P)::              FILE_P 
  {
    filePrev::int
  }
}

test = function::                                       int; entry 
{
  seq
    { f0 = undef:: FILE_P. f0 }
    { 
      //Scope #1:
      f1 = openFile(f0)::                               FILE_P. 
      f1
    }
        
    { //Scope #2:
      f2 = openFile(f1)::                               FILE_P. 
      f2
    }
    { 
      //Scope #3:
      sizeWritten = fwrite("Attempt to write..", 12, 1, f2):: int; use(file).
      sizeWritten 
    }
    { 
      //Scope #4:
      fclose(f2)::                                      int; use(file) 
    }
    { sizeWritten :: int}
}

The function test sequentially executes the following commands: it opens a file (scopes #1 and #2), writes some text (scope #3), and finally closes the file (scope #4). It represents a simple workflow with an external resource.

In order to connect the code to exploitation, the functions fwrite and fclose in scopes #3 and #4 respectively are annotated with use(file). The reasoning uses this information to determine whether it is possible to initialize the resource before its actual usage, as well as where and when exactly to initialize it. The function openFile is annotated with init(file), meaning that it can initialize the resource depending on the chosen strategy. It is invoked in both scope #1 and scope #2, and both scopes are executed strictly before scopes #3 and #4, so it is indeed possible to initialize the resource before usage. The next task for exploitation is to choose the correct exploitation plan, i.e. to assign strategies to all possible initialization places so that the resource is initialized only once. Here, this means that only one invocation of openFile is assigned exploitation(init) to actually initialize the file. The other one is automatically marked with exploitation(none) and invokes a different specialization of openFile that does nothing, since the file was already initialized.
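
The effect of such a plan can be mimicked in Python. This sketch only models the dispatch between the two specializations. The plan contents, the call-site names and the string standing in for a file handle are all assumptions; which of the two invocations actually receives exploitation(init) is decided by the reasoning and is not stated here.

```python
opened = []  # records every real initialization

def open_file_init(file_prev):
    # Models the exploitation(init) specialization: performs the real open.
    opened.append("/tmp/test")
    return "handle"

def open_file_none(file_prev):
    # Models the exploitation(none) specialization: passes the handle through.
    return file_prev

SPECIALIZATIONS = {
    "exploitation(init)": open_file_init,
    "exploitation(none)": open_file_none,
}

# Assumed plan: exactly one call site initializes the resource.
plan = {"scope1": "exploitation(init)", "scope2": "exploitation(none)"}

def open_file(call_site, file_prev):
    """Dispatch to the specialization that the plan assigns to this call site."""
    return SPECIALIZATIONS[plan[call_site]](file_prev)

f0 = None
f1 = open_file("scope1", f0)
f2 = open_file("scope2", f1)
print(f2, len(opened))
```

Swapping the two entries of the plan leaves f2 unchanged, because the non-initializing specialization merely forwards whatever it is given.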