Automatic inference of the most apporpriate container implementation


Container operations:

  • access(serial) Denotes operation's type as a sequential access. More...
  • access(rand) Denotes operation's type as a random access. More...

Container implementations:

  • container(onthefly) Lazy data structure that generates elements on the fly when needed. More...
  • container(solid) Array or data structure that occupies contiguous memory region More...


Containers is a general term referring to data structures that contain a group of elements of certain type.

Considering that virtually every program use containers to store, retrieve, search or otherwise process aggregate data, obviously efficiency of containers' implementation is a priority for the Xreate design.

There are many different data structures invented to serve as containers, each of them having different characteristics and peculiarities with no obvious winner; rather, each one suits best for a different type of situation. Usually it is software developer's knowledge and responsibility to be able to select the most appropriate container's implementation for a particular use case. That said, Xreate goes further and gathers information on how containers are used, by analysing the program sources. On this ground it is possible to choose semi-automatically the most appropriate data structure for container implementation to efficiently fulfil particular needs in a particular situation.

In order to do this, the following approach is used. The way a container is defined is associated with one or more possible implementations it supports. On the other side, operations over a container demand certain "good" implementations to efficiently process container data. Viewing it as a supply and demand setting, with the list of several acceptable implementations from either side, the most appropriate tradeoff is chosen as an implementation for a given container to fulfill both sides with regard to defaults, preferences, constraints and other ways to guide the inference process.

In a short example below

tests/containers.cpp: Containers.Doc_Intr_1
//container definition:
a= {1, 2, 3, 4, 5}::    [num]; container(solid).

//container's operation
x= a[0]::               num; access(rand).

container's offer container(solid) and operation's demand access(rand) are explicitly expressed by annotations for clarity purposes.

Annotation container(solid) depicts that container a supports implementation solid, that is a plain contiguous memory region or array. On the other side, annotation access(rand) expresses the nature of retrieving by index operation (variable x), and it requires selected data structure to support random access to be efficiently executed.

Obviously, implementation solid allows efficient random access, and so it is assumed as a container a's implementation by the inference process.

Semi-automatic, guided container's implementation selection has some advantages, such as:

  • Less manual work. Inferring adequate implementations with little to no additional input from developer saves time to concentrate on more important aspects. Other approach to achieve the same, namely to assign the default one-size-fits-all implementation with an average performance, is simpler but cannot compete with a more careful strategy.
  • Rapid development and optimization antagonism. It is important to keep in mind that rapid development and frequent code changes somewhat contradict optimization efforts. Each round of optimization is leveraged by relying on concrete and particular program properties, overt or hidden connections and other observations. Once a program undergoes further development, most of the previously sound optimization techniques become obsolete, irrelevant or plainly wrong. Selecting (as often as needed) the most efficient data structure allows to automatically maintain a reasonable efficiency level and does not impede possibly fast development pace.
  • Regression resistance. Xreate encourages frequent changes, adjusting and recombining software components, libraries and modules by automatically reevaluating and reassigning most appropriate data structures in the new conditions or signaling an error if it is impossible. This somewhat alleviates the problem of fragile software and gives more confidence for refactoring.

Container Implementations

Xreate supports container implementations presented below:

Implementation 'container(onthefly)'

Source: range list operator [from .. to].

Supported operations: access(serial).

This is an elementary implementation that represents lazy data structure — a sequence of elements is generated by recurrence equation applied to a current element to compute next element of the sequence. It does not keep actual data in the memory, but instead computes the necessary elements when accessed. This kind of implementation is rather memory efficient since the occupied memory does not depend on the the container's size.

For example, range list [1..10] supports onthefly implementation by using internally recurrent function x[i+1] = x[i] + 1, 1<= x <= 10, that generates successive element x[i+1] given x[i].

Recurrent elements' generation is suited for sequential access, and cannot serve random access operations.

Implementation 'container(solid)'

Source: list operator.

Supported operations: access(serial), access(rand).

This is implementation from the opposite side of the memory/computation space. It stores all the container's data in memory occupying a contiguous region, known as array. As opposed to the implementation onthefly, it is computationally efficient, for there is no need for any additional computation apart from simple offset calculation to get an element requested by an index.

Due to the fact that all elements are present in the memory, the implementation supports sequential access as well as random access operations.

Container Operations

In order to describe requirements for a container, all the operations are broken down into several categories as presented below.

Operation 'access(serial)'

Operators: loop map, loop fold.

Annotation denotes sequential access operation, such as map loop or map fold.


tests/containers.cpp: Containers.Doc_OpAccessSeq_1
import raw("scripts/containers/containers.lp").

test = function                           :: int;   entry
  range = [1..5]                          :: [int]; container(onthefly).
  sum = loop fold(range->el:: int, 0->acc):: [int]; access(serial)
      acc + el


Operation 'access(rand)'

Operators: index

Annotation denotes random access operation.


tests/containers.cpp: Containers.Doc_OpAccessRand_1
import raw("scripts/containers/containers.lp").

test = function::         num; entry
  a = {1, 2, 3, 4, 5}::   [num]; container(solid).
  a[1]::                  num; access(rand)          

AST Attachments

In order to avoid tedious writing of necessary annotations for each line of code that works with containers, there are appropriate annotations already defined for common operations. All it takes for client's code is to include transcend script scripts/dfa/ast-attachments.lp that allows to assign predefined annotations to syntactic constructs and operators.

An example below includes ast-attachments.lp that feeds compiler with default annotations, relieving developer from the necessity to specify them manually.

tests/containers.cpp: Containers.Doc_ASTAttach_1
import raw("scripts/containers/containers.lp").
import raw("scripts/dfa/ast-attachments.lp").

test = function                           :: int;   entry
  range = [1..5]                          :: [int].
  sum = loop fold(range->el:: int, 0->acc):: [int]
      acc + el