NAME

docs/pdds/pddXX_dynbind.pod - Dynamic Binding in Parrot


ABSTRACT

This document describes the design of dynamic binding in Parrot. A dynamic binding is what is created by the Perl 5 local construct, and the Perl 6 temp construct.

[For the time being, it also contains more of the implementation details than are normal for a PDD. -- rgr, 8-Nov-06.]


VERSION

$Revision: $

This is a draft proposal; the Parrot community has not yet reviewed or considered this material.


DEFINITIONS

By ``execution context'' we mean the state of an executing sub that is only visible within the sub's own bytecode. This includes such things as the values in register and the current continuation, but not the sub's error handlers, which must be visible to called subs. The execution context is implemented by the Parrot_Context struct; each interpreter keeps track of the sub that it is currently executing. This can only be changed by either invoking another sub, or invoking a continuation or coroutine. (This is commonly called an ``activation record'' or ``stack frame'' in the compiler literature.)

The ``dynamic environment'' or ``dynamic context'' refers to context established by a running sub that may be used by the other subs that it calls. This includes dynamic bindings of variables and locations, error handlers, and actions that must be executed even (or only) when a block is terminated abnormally.

By ``nonlocal transfer of control'' we mean exactly invoking a continuation or coroutine, since these are the only things that can send control back to an existing execution context. This is akin to ``stack unwinding'', since Parrot_Context structures function as stack frames, even though Parrot does not have a stack per se. The backtrace is implied by links from callee to caller via return continuation; they look just like a stack except that coroutines and continuations may make it branch when looking in the other direction. (We try not to use ``nonlocal exit'' to mean the same thing, since calling a continuation includes the possibility of ``re-entry'' as well.)

Note that ``nonlocal transfer'' does not include function calling, which changes the current Parrot_Context in a straightforward way but never changes the dynamic context. The term does include normal function return, since that is also an application of continuation calling (and may also change the dynamic context).

Each interpreter keeps track of its current dynamic environment in the dynamic_env slot. This is a linked list that branches in parallel with the context call chain. Each dynamic environment entry belongs to the particular context that pushed it, and may only be popped explicitly by code executing within that context. As a consequence of this intimate association, switching to another execution context by calling a continuation also implies changing the dynamic environment. Different continuations can capture different branches of the linked list, which may share structure, so dynamic environment entries must be garbage-collected.

Unfortunately, there isn't yet a term that means both the execution context (Parrot_Context call chain) and the dynamic environment (control stack linked list). We should invent one.

In the general case, nonlocal transfer of control can involve a lateral transfer where the two contexts are ``cousins,'' i.e. neither is a direct descendant of the other; this is the normal case for coroutines. One must change dynamic context by backing down the chain from the originating context to the common ancestor, and then moving forward to the destination context. This traversal of the dynamic environment, called ``rezipping'', is a generalization of unwinding, and has semantics related to popping or pushing the traversed entries. For dynamic binding entries, which are completely reversible, the semantics of going down the stack are identical to popping (except that the entries are not discarded), and going up the stack is identical to pushing.

Deep binding versus shallow binding

There are two techniques for implementing dynamic binding. These are traditionally called deep binding and shallow binding [1].

Deep binding can be implemented by using a single stack of new values. To find the current value, one must do an O(N) traverse of the stack to find the most-recently-bound value, but context switches are trivial.

Shallow binding works by keeping the old values on the stack and stuffing the new values into the location while the binding is in scope. Finding the current value is now O(1), but each context switch caused by calling a continuation requires an O(N) traverse to swap the old and new values.


DESCRIPTION

Dynamic binding involves changing the value of a location temporarily. The location can be a global (variable or sub) or a specified element of a structure (hash or array). The scope of the change is indefinite, which means that any code that has access to the location being modified can see the change during its lifetime. The lifetime of the binding is usually defined by an HLL construct; the location is unbound when the construct is exited, after which the old binding becomes visible again.

In Parrot, each dynamic binding is represented by a BoundLocation object. The bind_location and bind_global operations add bindings to the current dynamic environment, and the unbind_n operation pops a specified number of them off again.

Dynamic binding scopes can also be entered and exited by calling continuations, so returning from a sub automatically unbinds all dynamic bindings the sub may have made. However, calling a continuation that was made within a dynamic binding scope will bring those bindings back into scope, so unless the compiler can guarantee that there are no extant continuations that were taken in that scope, returning or executing unbind_n does not guarantee that the popped bindings are dead.

Dynamic binding and STM

Dynamic binding has some similarities with software transactional memory (STM, [2]), but there are also important differences. Both make temporary changes to memory locations that are visible initially only in the thread that makes them. The key difference is that STM is for changes that are intended to be permanent, whereas dynamic binding changes are always reverted before exiting.

Dynamic binding and STM are therefore orthogonal, in the sense that binding a transaction variable within a transaction hides the transaction: Operations within the scope of the dynamic binding are invisible to the transaction. However, it makes no sense to nest these operations the other way: If you bind a variable to an STMRef, the STMRef will not be visible to other threads, which makes transactions pointless [3].

[maybe put a table of differences here? -- rgr, 24-Nov-06.]

A Perl 5 example

For example, here is a classic Perl 5 idiom:

    sub slurp_file {
        my $file = shift;
        warn "Reading $file ...\n";
        my $data = do {
            local $/ = undef;
            <$file>;
        };
        warn "Reading $file ...done\n";
        return $data;
    }

If compiled to PIR, the result might look something like this:

    .HLL 'perl5', ...
    .sub slurp_file
            .param pmc file
            printerr "Reading "
            printerr file
            printerr " ...\n"
            ## this paragraph corresponds to the inner block.
            .local pmc data, undef, PerlUndef
            PerlUndef = getclass 'PerlUndef'
            undef = new PerlUndef
            bind_global ['main'], '$/', undef
            data = file.'read_line'()
            unbind_n 1
            printerr "Reading "
            printerr file
            printerr " ...done\n"
            .return (data)
    .end

In this case, we assume that ``magic'' variables are kept in the main:: package.

Dynamic bindings are not preserved by tail-calling

Note that if we didn't need to print a final message in the previous example, we could dispense with the unbind_n instruction:

            . . .
            bind_global ['main'], '$/', undef
            data = file.'read_line'()
            .return (data)
    .end

In this case, the binding is popped automatically by the returncc instruction.

We might be tempted to return the result of the file handle method call directly, combining the .return with the method call:

            .return file.'read_line'()

Unfortunately, the read_line method will not see the binding in this case. This .return syntax expands into a tailcallmethod instruction internally, which calls the method using the current continuation, and then retires the current Parrot_Context, discarding the shiny new dynamic binding before the method gets started. tailcall also has this limitation -- in order to preserve the dynamic environment, you must also preserve the Parrot_Context that established it.

Location Classes

Location instances are used to store binding information. The Location and BoundLocation are abstract; the other four can be instantiated. Of those, StructureLocation and GlobalLocation specify locations, and StructureBinding and GlobalBinding are used on the dynamic environment to record bindings state.

                              Location
                              _/  |  \_
                            _/    |    \_
                          _/      |      \_
                        _/        |        \_    
                      _/          |          \_
                    _/            |            \_
                  _/              |              \_    
                _/                |                \_  
               /                  |                  \
       StructureLocation     BoundLocation      GlobalLocation
               |               /        \             |
               |              /          \            |
               |             /            \           |
               |            /              \          |
               |           /                \         |
               |          /                  \        |
               |         /                    \       |
               |        /                      \      |
               |       /                        \     |
          StructureBinding                     GlobalBinding
                      Location class hierarchy

Conceptually, each location object refers to a place that can be fetched or modified. For a StructureLocation, this is represented as an array or hash PMC and an index into it. For a GlobalLocation, this is a namespace object and the (string) name of a variable within it [4].

The bound classes also include BoundLocation, and also have an external value. A StructureBinding object can hold arbitrary INSP values, and remember what kind of value they are holding; conversion, if required, is done in same way as for sub args/returns, and only at the last minute in order to avoid loss of information. Note that GlobalLocation must be given a PMC, since it can only store a PMC in its location.

Location slots should never be changed while the Location is in use by Parrot as a dynamic binding. Doing so can have unintended side effects outside the normal scope of the binding, especially if there are other bindings of the same location in effect. Since it may be hard to be sure that a Location binding is no longer in use (calling a continuation may bring it back into service), it is best to use each Location object for a single dedicated purpose.

Location

Base class for locations. Conceptually, a location is an indirect reference to a value, much like a Perl 5 ``ref''.

[Need generic ``fetch'' and ``store'' operations. GlobalLocation uses just get_pmc and set_pmc, because that's all it needs. Presumably the INS versions could be used in the general case, and GlobalLocation methods could coerce to/from PMC. -- rgr, 12-Nov-06.]

StructureLocation

PMC class built on Location that represents a place in a structure. If the object is an array, then the key must be a integer; if a hash, then the key is pretty much arbitrary.

The StructureLocation class supports the following component keys:

structure => 0
The structure object that contains the located slot.

key => 1
The key of the slot within the structure. The type of the stored key determines how a hash structure is accessed: If the key is stored as a string, the hash value is fetched and set via the *_keyed_str method, etc.

Note that if the key is out of bounds when using the location to store into an adjustable array, the array length is changed to accommodate the new index; if the array is not adjustable, an error is thrown normally.

GlobalLocation

PMC class built on Location that represents the binding of a global variable or sub. It supports the following component keys:

namespace => 0
The namespace object for the namespace that contains the name.

name => 1
The name string of the variable.

In a sense, this is just a StructureLocation where the structure is a namespace.

BoundLocation

Abstract PMC class built on Location that stores a value for the location. These are added to the dynamic environment in dynamic binding in order to affect the value in a given dynamic scope.

The BoundLocation PMC class defines the following additional

value => 2
The stored value for the location. When outside that scope, the value slot holds the new value that is to be established dynamically; when running within that scope, value holds the original value that was saved when the new value was established.

If the value is stored shallowly (as it is for all but globals), then invoking the BoundLocation object swaps the old and new value. This happens implicitly during stack rezipping; it is not safe for bytecode to invoke a BoundLocation explicitly when it is in use for a binding, lest Parrot get confused about which is old and which is new.

StructureBinding

PMC class built on BoundLocation and StructureLocation that binds an arbitrary value in a hash or array element. Such bindings are ``shallow'', which means that other threads will be able to see the bound value. If two or more threads attempt to bind the same location in the same structure at the same time, the result is unpredictable.

GlobalBinding

PMC class built on BoundLocation and GlobalLocation that stores a value for the global variable. Such bindings are ``deep'', which means that they are local to the interpreter that created them; other threads will not see the bound value.

Instances of GlobalBinding are created automatically by the bind_global op.

Dynamic Binding Operations

Location binding operations

bind_location(in PMC, in PMC)
bind_location(in PMC, in STR)
bind_location(in PMC, in INT)
bind_location(in PMC, in NUM)
Given a location PMC in $1 (i.e. a non-abstract subclass of Location), bind it dynamically to the value in $2, with identical scope and lifetime as for bind_global. If $2 is a Null PMC, then the location is effectively made locally unbound, if that is supported by the location.

If $1 is a Location object but not a BoundLocation, then it is cloned into the appropriate BoundLocation class. If it is a BoundLocation subclass, then it is used directly on the dynamic environment; in that case, be careful not to side-effect the object while the binding is in effect, lest strange behavior result.

The manner in which the binding is created depends on the class of the BoundLocation object. After saving the original value, a StructureLocation stores the value $2 into the place specified by the location; this may throw an exception if the assignment is illegal. At the end of the binding lifetime (see unbind_n), it is undone by storing the original value back where we got it from. In contrast, a GlobalLocation keeps the new $2 value in the dynamic environment, so the new binding is visible only to code running in the interpreter that bound the variable.

The register type of $2 determines the type that is stored in the BoundLocation object, which in turn determines how Parrot retrieves and sets it. For example, if the value is a string, the structure value is accessed via the get_string_keyed_* and set_string_keyed_* methods, where the * is determined by the stored key type.

During the dynamic lifetime of the binding and within the dynamic scope of the binding sub, this location will appear to have a different value than outside the dynamic scope (e.g. in coroutines created before the binding), though the value that is bound may change during the lifetime if the location is modified by other means. See BoundLocation and its derived classes for specifics.

Global variable binding operations

bind_global(in STR, in PMC)
Bind the PMC $2 as the value of the global symbol $1 in the current namespace. The newly-created dynamic binding will be used by find_global and store_global in the current dynamic environment only, i.e. this call and all calls made from it. If $2 is the Null PMC, then the variable is made locally unbound.

The lifetime of a dynamic binding lasts until either (a) it is popped explicitly by unbind_n or popmark; or (b) control exits from the context where the binding was made. Note that tail calling implicitly exists the current context; in that case, all dynamic bindings of the calling sub are undone before the tail-called sub starts execution. Note also that one can 'revive' a dynamic binding after exit, effectively giving the binding a new life, by invoking a continuation that was taken when the binding was in scope.

bind_global(in STR, in STR, in PMC)
bind_global(in PMC, in STR, in PMC)
Bind the PMC $3 as the value of the symbol $2 of namespace $1 in the current dynamic context. $1 can be a namespace, or a key that names a namespace. If namespace $1 does not already exist, it is created.

Unbinding operation

This is sufficient for unbinding anything bound by bind_global or bind_location.

unbind_n(in INT)
Pop zero or more dynamic bindings for variables or locations from the current dynamic environment, with the count specified as $1, restoring their original values. There must be at least $1 bindings on the top of the dynamic environment, and all of them must have been bound by the currently-executing sub, or an exception is thrown before any bindings are popped.

Note that an explicit unbind_n is not always needed, as all dynamic bindings for a given sub are automatically undone when the sub returns (see bind_global for details). However, unbind_n is useful when the dynamic binding lifetime ends before the exit from the sub (but see also popmark).


IMPLEMENTATION

This section describes details of the implementation that are not necessary for the PIR programmer or compiler writer to understand.

Rationale for mixing deep and shallow binding

For normal applications that use a single thread of control, both techniques are semantically equivalent and (for variables) probably acceptably fast. The interesting difference lies in the effect they have on applications with multiple threads. Shallow binding causes all threads to see the same binding at any given time. In contrast, deep bindings are only visible in the interpreter that bound them [5].

Shallow binding can also cause strange things to happen if binding and unbinding between two or more threads do not nest. For example, consider the following series of unfortunate events where thread A and thread B bind the same variable dynamically:

    Event        Value after   B saved     A saved
    =========    ===========   =========   =========
    Initial      "initial"     --          --
    B binds      "B value"     "initial"   --
    A binds      "A value"     "initial"   "B value"
    B unbinds    "initial"     --          "B value"
    A unbinds    "B value"     --          --

The ``A saved'' and ``B saved'' columns show what each thread considers the ``old'' value. The final value is ``B value'' instead of ``initial'' because this was the current world-visible value that thread A saved when it bound the variable, and A happened to be the thread that unbound last. With deep binding, in contrast, thread A never sees B's value, nor vice versa, regardless of the event sequence.

It is also possible to implement deep binding and still avoid threading woes by keeping multiple stacks in the interpreter [6]. The extreme case is where each ``stack'' consists of a single new value, with old values still on the dynamic environment stack. Each interpreter would have its own hash of (object, key) => value. That would speed up references somewhat, but still adds significant overhead for structures. Still, it might be worth considering as an optimization for deep variable binding.

On balance, therefore, it seems equally necessary that Parrot use shallow binding for structure locations, and deep binding for variables.

Rezipping algorithm

Given two dynamic environments (one leaving, one entering), the goal is to perform the minimum number of ``zip down'' and ``zip up'' operations in order to switch the current dynamic state from one to the other.

Beyond the current Parrot data structures, we assume that each Stack_Entry structure is given an integer depth, which is assigned when the entry is pushed, and a pointer field we can use as a temporary forward pointer [7]. This makes the overall time complexity proportional to the total number of elementary zips (up or down).

  1. If the depth of leaving dynamic environment (i.e. the TOS of the leaving environment) is greater than that of the entering environment, ``zip down'' the leaving environment entries until they are equal. This involves executing actions and unbinding shallow dynamic bindings, in the reverse order in which they were pushed. Stop before unzipping the leaving entry for which the depths are equal.

  2. Set the forward pointer of the entering TOS to NULL. While the depth of the entering dynamic TOS is greater than that of the leaving state, work down the entering stack to find the entry with the same depth, setting the forward pointer of each entry so traversed to the next higher entry on the stack.

  3. We are now at the same depth in both stacks. While they point to different objects, (a) ``zip down'' the leaving entry, and (b) move down the entering entry while setting its forward pointer. This will terminate when we either find the same entry in both stacks (via pointer equality), or reach the bottoms of the stacks [8].

  4. If we reached the bottoms of the stacks [9], then zip down the last leaving entry, and start zipping up from the first entering entry. Otherwise, zip up from the entry above the common entry (recall that we never wound it down in that case). Similar to zipping down, ``zipping up'' means installing the saved dynamic binding [10]. When we get to the NULL forward pointer, we're done.

Implementation status

Of the PMC classes, only GlobalBinding is defined. Accessing Location slots by name (i.e. using *_keyed_string) is not supported.

bind_global and unbind_n are fully implemented, along with the necessary changes to the find/store code that support deep binding, except that treatment of the Null PMC (to make globals unbound again) has not been tested, and probably doesn't work.

bind_location would be straightforward at this point, but without another BoundLocation classes, it would be equivalent to bind_global.

Full rezipping ought to be made more efficient. Currently, we have to do O(N) operations to find the stack depth; storing the depth in the Stack_Chunk would make this immediate.

unbind_n and the other ops that pop things off the dynamic env don't check for ``underflow'', that is, popping something off that was pushed by the caller. This will become fast and easy to check once each Stack_Chunk records its depth. [For the time being, we could check the old TOS in the RetContinuation, but I'm too lazy to do that. -- rgr, 8-Nov-06.]

Because of deep binding, get_global and set_global have to do O(N) work to look for a binding, whether or not one exists. Speeding this up could be done with a hash while preserving the per-interpreter semantics of dynamic variable binding, but could be a great deal of trouble to implement; it would be better to find a good test case that runs poorly on the current implementation before trying to optimize it.

Outline of work in progress

  1. Consolidate the existing rezipping code. This should also not change functionality, except perhaps to fix some bugs. (Done in r14668.)

  2. Add a dynamic_env pointer to continuation objects, and move the dynamic environment stack into the interpreter struct. (Done in r14876.)

  3. Produce an initial GlobalLocation implementation with which to explore the issues. (In progress.)

  4. Record the stack depth in each Stack_Chunk, use this to make full rezipping more efficient, and also to add underflow checks to dynamic_env 'pop' operations.

  5. Split the initial GlobalLocation implementation into Location, BoundLocation, and GlobalLocation, and finish implementing the accessors.

  6. Implement StructureLocation dynamic binding. This change depends on the previous one, in order to go back up the environment chain.


ATTACHMENTS

None.


FOOTNOTES

[1]
Discussed at http://en.wikipedia.org/wiki/Scope_(programming), but with a different definition of ``shallow binding'' that amounts to a hybrid of the two.

[2]
See http://en.wikipedia.org/wiki/Software_transactional_memory for an introduction.

[3]
Not to mention the fact that undoing the dynamic binding would eventually throw away the result of committing the transaction.

[4]
Not all variables appear in namespaces, though. In Common Lisp it is possible to bind dynamic variables that do not appear in any package (e.g. symbols that are uninterned). So CL will need to define a LispGlobalLocation class.

[5]
I haven't given pdd25_concurrency.pod a proper reading yet, so some of this is probably wrong.

[6]
This is how [1] defines ``shallow binding'', as one stack per variable. I've never actually seen it implemented this way, though.

[7]
Neither of these slots has been added yet, so the implementation is not as efficient as it could be.

[8]
I think that would mean that the two contexts had been created in different threads, i.e. a continuation leaked from one interpreter to the other. If so, this is an error, because it would be disastrous to have two threads that were executing in the same Parrot_Context at the same time.

[9]
If we reach the bottom of one, we must reach the bottom of the other at the same time, due to the fact that they are at the same depth.

[10]
It could also mean calling actions, if the protocol were extended.


REFERENCES

  src/global.c
  src/pmc/globalbinding.pmc