Location Classes
docs/pdds/pddXX_dynbind.pod - Dynamic Binding in Parrot
This document describes the design of dynamic binding in Parrot. A dynamic
binding is what is created by the Perl 5 local construct, and the Perl 6
temp construct.
[For the time being, it also contains more of the implementation details than are normal for a PDD. -- rgr, 8-Nov-06.]
$Revision: $
This is a draft proposal; the Parrot community has not yet reviewed or considered this material.
By ``execution context'' we mean the state of an executing sub that is only
visible within the sub's own bytecode. This includes such things as the
values in register and the current continuation, but not the sub's error
handlers, which must be visible to called subs. The execution context is
implemented by the Parrot_Context struct; each interpreter keeps track of
the sub that it is currently executing. This can only be changed by either
invoking another sub, or invoking a continuation or coroutine. (This is
commonly called an ``activation record'' or ``stack frame'' in the compiler
literature.)
The ``dynamic environment'' or ``dynamic context'' refers to context established by a running sub that may be used by the other subs that it calls. This includes dynamic bindings of variables and locations, error handlers, and actions that must be executed even (or only) when a block is terminated abnormally.
By ``nonlocal transfer of control'' we mean exactly invoking a continuation
or coroutine, since these are the only things that can send control back to an
existing execution context. This is akin to ``stack unwinding'', since
Parrot_Context structures function as stack frames, even though Parrot does
not have a stack per se. The backtrace is implied by links from callee to
caller via return continuation; they look just like a stack except that
coroutines and continuations may make it branch when looking in the other
direction. (We try not to use ``nonlocal exit'' to mean the same thing,
since calling a continuation includes the possibility of ``re-entry'' as well.)
Note that ``nonlocal transfer'' does not include function calling, which changes
the current Parrot_Context in a straightforward way but never changes the
dynamic context. The term does include normal function return, since that is
also an application of continuation calling (and may also change the dynamic
context).
Each interpreter keeps track of its current dynamic environment in the
dynamic_env slot. This is a linked list that branches in parallel with the
context call chain. Each dynamic environment entry belongs to the particular
context that pushed it, and may only be popped explicitly by code executing
within that context. As a consequence of this intimate association, switching
to another execution context by calling a continuation also implies changing
the dynamic environment. Different continuations can capture different
branches of the linked list, which may share structure, so dynamic environment
entries must be garbage-collected.
Unfortunately, there isn't yet a term that means both the execution context
(Parrot_Context call chain) and the dynamic environment (control stack
linked list). We should invent one.
In the general case, nonlocal transfer of control can involve a lateral transfer where the two contexts are ``cousins,'' i.e. neither is a direct descendant of the other; this is the normal case for coroutines. One must change dynamic context by backing down the chain from the originating context to the common ancestor, and then moving forward to the destination context. This traversal of the dynamic environment, called ``rezipping'', is a generalization of unwinding, and has semantics related to popping or pushing the traversed entries. For dynamic binding entries, which are completely reversible, the semantics of going down the stack are identical to popping (except that the entries are not discarded), and going up the stack is identical to pushing.
There are two techniques for implementing dynamic binding. These are traditionally called deep binding and shallow binding [1].
Deep binding can be implemented by using a single stack of new values.
To find the current value, one must do an O(N) traverse of the stack to find
the most-recently-bound value, but context switches are trivial.
Shallow binding works by keeping the old values on the stack and
stuffing the new values into the location while the binding is in scope.
Finding the current value is now O(1), but each context switch caused by
calling a continuation requires an O(N) traverse to swap the old and new
values.
Dynamic binding involves changing the value of a location temporarily. The location can be a global (variable or sub) or a specified element of a structure (hash or array). The scope of the change is indefinite, which means that any code that has access to the location being modified can see the change during its lifetime. The lifetime of the binding is usually defined by an HLL construct; the location is unbound when the construct is exited, after which the old binding becomes visible again.
In Parrot, each dynamic binding is represented by a BoundLocation object.
The bind_location and bind_global operations add bindings to the current
dynamic environment, and the unbind_n operation pops a specified number of
them off again.
Dynamic binding scopes can also be entered and exited by calling
continuations, so returning from a sub automatically unbinds all dynamic
bindings the sub may have made. However, calling a continuation that was made
within a dynamic binding scope will bring those bindings back into scope, so
unless the compiler can guarantee that there are no extant continuations that
were taken in that scope, returning or executing unbind_n does not
guarantee that the popped bindings are dead.
Dynamic binding has some similarities with software transactional memory (STM, [2]), but there are also important differences. Both make temporary changes to memory locations that are visible initially only in the thread that makes them. The key difference is that STM is for changes that are intended to be permanent, whereas dynamic binding changes are always reverted before exiting.
Dynamic binding and STM are therefore orthogonal, in the sense that binding a
transaction variable within a transaction hides the transaction: Operations
within the scope of the dynamic binding are invisible to the transaction.
However, it makes no sense to nest these operations the other way: If you
bind a variable to an STMRef, the STMRef will not be visible to other
threads, which makes transactions pointless [3].
[maybe put a table of differences here? -- rgr, 24-Nov-06.]
For example, here is a classic Perl 5 idiom:
sub slurp_file {
my $file = shift;
warn "Reading $file ...\n";
my $data = do {
local $/ = undef;
<$file>;
};
warn "Reading $file ...done\n";
return $data;
}
If compiled to PIR, the result might look something like this:
.HLL 'perl5', ...
.sub slurp_file
.param pmc file
printerr "Reading "
printerr file
printerr " ...\n"
## this paragraph corresponds to the inner block.
.local pmc data, undef, PerlUndef
PerlUndef = getclass 'PerlUndef'
undef = new PerlUndef
bind_global ['main'], '$/', undef
data = file.'read_line'()
unbind_n 1
printerr "Reading "
printerr file
printerr " ...done\n"
.return (data)
.end
In this case, we assume that ``magic'' variables are kept in the main::
package.
Note that if we didn't need to print a final message in the previous example,
we could dispense with the unbind_n instruction:
. . .
bind_global ['main'], '$/', undef
data = file.'read_line'()
.return (data)
.end
In this case, the binding is popped automatically by the returncc
instruction.
We might be tempted to return the result of the file handle method call
directly, combining the .return with the method call:
.return file.'read_line'()
Unfortunately, the read_line method will not see the binding in this case.
This .return syntax expands into a tailcallmethod instruction
internally, which calls the method using the current continuation, and then
retires the current Parrot_Context, discarding the shiny new dynamic
binding before the method gets started. tailcall also has this
limitation -- in order to preserve the dynamic environment, you must also
preserve the Parrot_Context that established it.
Location ClassesLocation instances are used to store binding information. The Location
and BoundLocation are abstract; the other four can be instantiated. Of
those, StructureLocation and GlobalLocation specify locations, and
StructureBinding and GlobalBinding are used on the dynamic environment
to record bindings state.
Location
_/ | \_
_/ | \_
_/ | \_
_/ | \_
_/ | \_
_/ | \_
_/ | \_
_/ | \_
/ | \
StructureLocation BoundLocation GlobalLocation
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
| / \ |
StructureBinding GlobalBinding
Location class hierarchy
Conceptually, each location object refers to a place that can be fetched or
modified. For a StructureLocation, this is represented as an array or hash
PMC and an index into it. For a GlobalLocation, this is a namespace object
and the (string) name of a variable within it [4].
The bound classes also include BoundLocation, and also have an external
value. A StructureBinding object can hold arbitrary INSP values, and
remember what kind of value they are holding; conversion, if required, is done
in same way as for sub args/returns, and only at the last minute in order to
avoid loss of information. Note that GlobalLocation must be given a PMC,
since it can only store a PMC in its location.
Location slots should never be changed while the Location is in use by
Parrot as a dynamic binding. Doing so can have unintended side effects
outside the normal scope of the binding, especially if there are other
bindings of the same location in effect. Since it may be hard to be sure that
a Location binding is no longer in use (calling a continuation may bring it
back into service), it is best to use each Location object for a single
dedicated purpose.
LocationBase class for locations. Conceptually, a location is an indirect reference to a value, much like a Perl 5 ``ref''.
[Need generic ``fetch'' and ``store'' operations. GlobalLocation uses just
get_pmc and set_pmc, because that's all it needs. Presumably the INS
versions could be used in the general case, and GlobalLocation methods
could coerce to/from PMC. -- rgr, 12-Nov-06.]
StructureLocationPMC class built on Location that represents a place in a structure. If the
object is an array, then the key must be a integer; if a hash, then the key is
pretty much arbitrary.
The StructureLocation class supports the following component keys:
*_keyed_str method, etc.
Note that if the key is out of bounds when using the location to store into an adjustable array, the array length is changed to accommodate the new index; if the array is not adjustable, an error is thrown normally.
GlobalLocationPMC class built on Location that represents the binding of a global
variable or sub. It supports the following component keys:
In a sense, this is just a StructureLocation where the structure is a
namespace.
BoundLocationAbstract PMC class built on Location that stores a value for the
location. These are added to the dynamic environment in dynamic binding in
order to affect the value in a given dynamic scope.
The BoundLocation PMC class defines the following additional
value slot holds the new value that is to be established
dynamically; when running within that scope, value holds the
original value that was saved when the new value was established.
If the value is stored shallowly (as it is for all but globals), then invoking
the BoundLocation object swaps the old and new value. This happens
implicitly during stack rezipping; it is not safe for bytecode to invoke a
BoundLocation explicitly when it is in use for a binding, lest Parrot get
confused about which is old and which is new.
StructureBindingPMC class built on BoundLocation and StructureLocation that binds an
arbitrary value in a hash or array element. Such bindings are ``shallow'',
which means that other threads will be able to see the bound value. If two or
more threads attempt to bind the same location in the same structure at the
same time, the result is unpredictable.
GlobalBindingPMC class built on BoundLocation and GlobalLocation that stores a value
for the global variable. Such bindings are ``deep'', which means that they are
local to the interpreter that created them; other threads will not see the
bound value.
Instances of GlobalBinding are created automatically by the bind_global
op.
Location), bind
it dynamically to the value in $2, with identical scope and lifetime as for
bind_global. If $2 is a Null PMC, then the location is effectively made
locally unbound, if that is supported by the location.
If $1 is a Location object but not a BoundLocation, then it is cloned
into the appropriate BoundLocation class. If it is a BoundLocation
subclass, then it is used directly on the dynamic environment; in that case,
be careful not to side-effect the object while the binding is in effect, lest
strange behavior result.
The manner in which the binding is created depends on the class of the
BoundLocation object. After saving the original value, a
StructureLocation stores the value $2 into the place specified by the
location; this may throw an exception if the assignment is illegal. At the
end of the binding lifetime (see unbind_n), it is undone by storing the
original value back where we got it from. In contrast, a GlobalLocation
keeps the new $2 value in the dynamic environment, so the new binding is
visible only to code running in the interpreter that bound the variable.
The register type of $2 determines the type that is stored in the
BoundLocation object, which in turn determines how Parrot retrieves and
sets it. For example, if the value is a string, the structure value is
accessed via the get_string_keyed_* and set_string_keyed_* methods,
where the * is determined by the stored key type.
During the dynamic lifetime of the binding and within the dynamic scope of the
binding sub, this location will appear to have a different value than outside
the dynamic scope (e.g. in coroutines created before the binding), though the
value that is bound may change during the lifetime if the location is modified
by other means. See BoundLocation and its derived classes for specifics.
find_global and
store_global in the current dynamic environment only, i.e. this call and
all calls made from it. If $2 is the Null PMC, then the variable is made
locally unbound.
The lifetime of a dynamic binding lasts until either (a) it is popped
explicitly by unbind_n or popmark; or (b) control exits from the context
where the binding was made. Note that tail calling implicitly exists the
current context; in that case, all dynamic bindings of the calling sub are
undone before the tail-called sub starts execution. Note also that one can
'revive' a dynamic binding after exit, effectively giving the binding a new
life, by invoking a continuation that was taken when the binding was in scope.
This is sufficient for unbinding anything bound by bind_global or
bind_location.
Note that an explicit unbind_n is not always needed, as all dynamic
bindings for a given sub are automatically undone when the sub returns (see
bind_global for details). However, unbind_n is useful when the dynamic
binding lifetime ends before the exit from the sub (but see also popmark).
This section describes details of the implementation that are not necessary for the PIR programmer or compiler writer to understand.
For normal applications that use a single thread of control, both techniques are semantically equivalent and (for variables) probably acceptably fast. The interesting difference lies in the effect they have on applications with multiple threads. Shallow binding causes all threads to see the same binding at any given time. In contrast, deep bindings are only visible in the interpreter that bound them [5].
Shallow binding can also cause strange things to happen if binding and unbinding between two or more threads do not nest. For example, consider the following series of unfortunate events where thread A and thread B bind the same variable dynamically:
Event Value after B saved A saved
========= =========== ========= =========
Initial "initial" -- --
B binds "B value" "initial" --
A binds "A value" "initial" "B value"
B unbinds "initial" -- "B value"
A unbinds "B value" -- --
The ``A saved'' and ``B saved'' columns show what each thread considers the ``old'' value. The final value is ``B value'' instead of ``initial'' because this was the current world-visible value that thread A saved when it bound the variable, and A happened to be the thread that unbound last. With deep binding, in contrast, thread A never sees B's value, nor vice versa, regardless of the event sequence.
It is also possible to implement deep binding and still avoid threading woes
by keeping multiple stacks in the interpreter [6]. The extreme case is where
each ``stack'' consists of a single new value, with old values still on the
dynamic environment stack. Each interpreter would have its own hash of
(object, key) => value. That would speed up references somewhat, but
still adds significant overhead for structures. Still, it might be worth
considering as an optimization for deep variable binding.
On balance, therefore, it seems equally necessary that Parrot use shallow binding for structure locations, and deep binding for variables.
Given two dynamic environments (one leaving, one entering), the goal is to perform the minimum number of ``zip down'' and ``zip up'' operations in order to switch the current dynamic state from one to the other.
Beyond the current Parrot data structures, we assume that each Stack_Entry
structure is given an integer depth, which is assigned when the entry is
pushed, and a pointer field we can use as a temporary forward pointer [7].
This makes the overall time complexity proportional to the total number of
elementary zips (up or down).
Of the PMC classes, only GlobalBinding is defined. Accessing Location
slots by name (i.e. using *_keyed_string) is not supported.
bind_global and unbind_n are fully implemented, along with the necessary
changes to the find/store code that support deep binding, except that
treatment of the Null PMC (to make globals unbound again) has not been tested,
and probably doesn't work.
bind_location would be straightforward at this point, but without another
BoundLocation classes, it would be equivalent to bind_global.
Full rezipping ought to be made more efficient. Currently, we have to do O(N)
operations to find the stack depth; storing the depth in the Stack_Chunk
would make this immediate.
unbind_n and the other ops that pop things off the dynamic env don't check
for ``underflow'', that is, popping something off that was pushed by the caller.
This will become fast and easy to check once each Stack_Chunk records its
depth. [For the time being, we could check the old TOS in the
RetContinuation, but I'm too lazy to do that. -- rgr, 8-Nov-06.]
Because of deep binding, get_global and set_global have to do O(N) work
to look for a binding, whether or not one exists. Speeding this up could be
done with a hash while preserving the per-interpreter semantics of dynamic
variable binding, but could be a great deal of trouble to implement; it would
be better to find a good test case that runs poorly on the current
implementation before trying to optimize it.
dynamic_env pointer to continuation objects, and move the dynamic
environment stack into the interpreter struct. (Done in r14876.)
Produce an initial GlobalLocation implementation with which to explore the
issues. (In progress.)
Record the stack depth in each Stack_Chunk, use this to make full rezipping
more efficient, and also to add underflow checks to dynamic_env 'pop'
operations.
Split the initial GlobalLocation implementation into Location,
BoundLocation, and GlobalLocation, and finish implementing the
accessors.
Implement StructureLocation dynamic binding. This change depends on the
previous one, in order to go back up the environment chain.
None.
LispGlobalLocation class.
pdd25_concurrency.pod a proper reading yet, so some of this
is probably wrong.
Parrot_Context at the same time.
src/global.c src/pmc/globalbinding.pmc