push_dynscope and pop_dynscopebind_global, bind_location, and unbind_n opsLocation class hierarchy
Version 3, draft 1, Bob Rogers, 8-Jul-06.
The goal is to add support to Parrot for dynamic binding, e.g. Perl 5 ``local'' and Perl 6 ``temp''. Dynamic binding necessarily interacts with other Parrot dynamic environment features such as error handling, and with nonlocal exit mechanisms such as continuations and coroutines. Unfortunately, there are currently bugs wherein some nonlocal exit mechanisms can't always do the right thing with respect to the dynamic environment. Furthermore, some of these features (actions and error handling) are in the process of being redesigned. As a result, the scope of the problem must be enlarged in order to ensure the proper interaction of all of these features.
By ``execution context'', we mean the state of an executing sub that
is only visible within the sub's own bytecode. This includes such
things as registers and the current continuation, but not the sub's
error handlers, which must be visible to called subs. The execution
context is implemented by the Parrot_Context struct; the
interpreter keeps track of the sub that is currently executing. This
can only be changed by either invoking another sub, or invoking a
continuation or coroutine. (This is commonly called an ``activation
record'' or ``stack frame'' in the compiler literature.)
The ``dynamic environment'' or ``dynamic context'' refers to context established by a running sub that may be used by the other subs that it calls. This includes dynamic bindings of variables and locations, error handlers, and actions that must be executed even (or only) when a block is terminated abnormally. The dynamic environment can be thought of as a stack onto which various operations push and pop entries. In practice, it may be implemented as several stacks and/or static metainformation associated with the sub, but for now we will speak of it as if it were a single stack.
By ``nonlocal exit'' we mean exactly invoking a continuation or
coroutine [1], since these are the only things that can send control
back to an existing context (execution and/or dynamic). This is
sometimes called ``stack unwinding'', since Parrot_Context structs
function as stack frames, even though Parrot does not have a stack per
se, and even though continuations can ``rewind'' the stack as well as
unwind it. The call trace is implied by the links from callee via return
continuation to caller that would be just like a stack except that
coroutines and continuations may make it branch when looking in the
other direction.
Note that ``nonlocal exit'' does not include function calling, which
changes the current Parrot_Context in a straightforward way but never
changes the dynamic context, but does include normal function return,
since that is also an application of continuation calling.
Confusingly, there is currently also something called the ``control stack'' which keeps track of dynamic context; this also forms a linked list, and branches in parallel with the context chain. Each control stack entry belongs to the particular context that pushed it, and may only be popped explicitly by code executing within that context. As a consequence of this intimate association, switching to another execution context by calling a continuation also implies changing the control stack.
Unfortunately, there isn't yet a term that means both the execution context
(Parrot_Context call chain) and the dynamic environment (control stack
linked list). There ought to be, but I can't think of a good one. One
possibility is ``continuation context,'' for reasons which will become
apparent.
In the general case, nonlocal exit can involve a lateral transfer of control where the two contexts are ``cousins,'' i.e. neither is a direct descendant of the other; this is the normal case for coroutines. One must change dynamic context by backing down the chain from the originating context to the common ancestor, and then moving forward to the destination context. This traversal of the dynamic environment, called ``rezipping'' [2], is a generalization of stack unwinding, and implies semantics related to those of popping or pushing the traversed entries. For dynamic binding entries, which are completely reversible, the semantics of going down the stack are identical to popping (except that the entries are not discarded), and going up the stack is identical to pushing.
This section covers changes that need to be made to the execution infrastructure.
When defined as above, all nonlocal exits can be classified according to the type of continuation-like object that is invoked to effect them. The nonlocal exit types defined natively are as follows:
Continuation
RetContinuation
Exception_Handler
Coroutine
[Note that the Exception_Handler class will be obsolete soon. --
rgr, 8-Jul-06.]
Note that the Coroutine class functions like a Continuation with
additional state, which is really convenient.
The reason for having different continuation-like classes is that
callers have different expectations about their side effects on the
dynamic context when they are invoked. A RetContinuation leaves a
context permanently, even though this is the normal termination for a
sub call. For a Coroutine, the computation is not abandoned at
all, but merely put on hold for later use; we expect to be back. And
for Continuation objects . . . it depends.
In particular, when an action is popped off of the dynamic
environment, Parrot needs to know whether or not to execute the
action, and if it is to be executed, whether to consider it an error
(according to the current convention). My initial impulse was to
define a priori that certain types were always errors, and others were
not, and devise a mechanism for detecting multiple action invocation.
But this seemed rather arbitrary; I wasn't even sure about how to
classify Continuation, and maybe there are situations where an
action could sensibly be invoked more than once.
Then it struck me that any such division was an unnecessary imposition of policy, when Parrot should leave policy decisions to language implementors as much as possible. So, in the name of generality, I would like to propose that Parrot always call the action, passing it the relevant continuation-like object, and let the action decide what it wants to do about it, and whether it needs to complain about multiple invocations [3].
Furthermore, Parrot should call actions when entering contexts as well
as leaving them, passing an ``unwind_p'' boolean that tells the action
which way it's going. This makes it possible to implement such things
as dynamic-wind in Scheme.
However, Parrot still needs to institute conventions in order to reduce interoperability surprises. [scope out interaction with S04:``Control Exceptions'' stuff. -- rgr, 12-Jan-06.] To this end, I think the following conventions make sense:
RetContinuationCoroutineException_HandlerExitContinuation. -- rgr, 8-Jul-06.]
ContinuationContinuation as normal
or abnormal, and must allow for multiple invocation.
Compiler writers should therefore be encouraged to use suitable
Continuation classes for subclassing where continuations are needed
that may be invoked across ``foreign'' code, in order to get predictable
effects. To this end, we probably want to provide more general
classes, but this is beyond the scope of this discussion [4].
Currently, the control stack is used to keep track of the following things:
push_eh and clear_eh instructions.
[The nature of what push_eh pushes is changing, but this by itself
doesn't affect how they are stored.]
push_dynscope and popped by
pop_dynscope or popmark.
pushmark and popmark, and allow
multiple control stack entries to be removed at once.
bsr and jsr instructions, and popped by
ret.
Of these operations, the language features that require the first two have lexically-determined lifetimes. That is, their lifetimes are defined by HLL constructs that are lexical features of the program [5], despite having dynamic scope, so they start and end at precise locations in the program text. We always know where to emit the end-of-lifetime code, and what it is this code has to pop.
Furthermore, and particularly in the case of error handlers and return addresses, all four are meaningful only while that calling context is active, and so must necessarily be popped when the sub returns.
Note that bsr/jsr are not strictly tied to lexical features of
any HLL. However, they are meaningful only within a single context,
and in any case it isn't currently possible to bsr to a place that
(e.g.) does a push_eh and returns, so in practice bsr/jsr
must be used as if they had lexically-determined lifetimes.
push_dynscope and pop_dynscopeWith this in mind, and pursuant to mailing list discussions, I would
like to suggest that pushaction be replaced with the following two
instructions:
Calling continuations can also be used to re-enter dynamic scopes that had previously been left. In this case, all intervening actions are invoked, where ``intervening'' is defined to mean the minimum number of dynamic environment entries that must be traversed to get from the invoking dynamic environment to that captured by the continuation. First, actions are unwound (with a first argument of 1) in the reverse order in which they were established until the common shared environment is reached, followed by ``rewinding'' actions (with a first argument of 1) in forward order until the continuation environment is reached. Depending on program logic, such ``rewinding'' could be done an arbitrary number of times.
[or maybe it would be better to just leave a reference to a more complete description. -- rgr, 9-Jul-06.]
If the stack is unwound due to a popmark or pop_dynscope, which
indicates normal scope exit, the action is invoked with a direction of
1 and a null PMC as the continuation argument. pop_dynscope takes
an argument that specifies whether or not to invoke the action, but
popmark always invokes all actions it pops.
Yielding from a Coroutine is also considered a form of continuation
invocation, so intervening actions are invoked with the appropriate
direction and the Coroutine object. As with continuations, such
actions could be executed an arbitrary number of times. Cleanup
actions will therefore want to test the type of their argument in
order to avoid inappropriate triggering.
Interestingly, pop_dynscope makes stack marking
strictly unnecessary, as the dynamic state could always be popped
explicitly; there is no other Parrot feature that strictly requires
popmark [6]. However, it ought to be more efficient to use a
single popmark at the end of a block to get rid of three or more
dynamic state entries. Since there may be lots and lots of these
little things at the end of some Perl 6 blocks, it seems worth keeping
these ops.
To the list of uses for the control stack, I would like to add dynamic binding. This is a natural place to store dynamic binding state because (a) dynamic binding also has a lifetime that is lexically-determined by HLL syntax, and (b) having a single stack for all of the dynamically-scoped features that are affected by rezipping greatly simplifies the implementation.
Note that this does increase the level of constraint on how PIR
programmers can use the dynamic binding stack, i.e. you can't ret
or clear_eh if there are dynamic bindings in the way. I hope (and
expect) that nobody will care.
Currently, each continuation captures only the executing
Parrot_Context, which in turn implies the control stack state.
This is problematic because a continuation that was created in one
dynamic context might return to a different dynamic context when
invoked. This can cause actions to be executed at the wrong time, and
error handlers and dynamic bindings to be left in force too long.
[example omitted, because the new docs/pdds/pdd23_exceptions.pod
makes it obsolete. -- rgr, 9-Jul-06.]
Currently, control stack entries are managed by the Parrot_Context
that created them, since these are the only things that point to them,
directly or (via other control stack entries) indirectly. That won't
work in the future, because each context only points to its idea of the
TOS, and entries that have been popped may still live on in
continuations. Since an arbitrary number of continuations or other
control stack entries may point to a given entry, refcounting is
required [7]. Fortunately, this is not hard, as there are only a few
places that create/destroy pointers to these entries.
There is also no longer any need even to store the control stack in the
Parrot_Context. Currently, the control_stack field points to the
TOS as of when the context last executed; the current context therefore
points to the current dynamic stack. Once continuations also remember
the dynamic context, this field will not have any useful state when the
context is not executing, because it will always be reset when execution
resumes. So the field might as well be moved to the interpreter object,
where it is always unambiguously ``the current dynamic stack.''
And I would also recommend that the field itself be renamed to
dynamic_env in the process.
This describes the API for PIR coders and compiler writers.
bind_global, bind_location, and unbind_n opsI would like to propose the following instruction interface to dynamic
binding. In a nutshell, this adds (a) a bind_location instruction
that takes an explicit location object and a new value and establishes
the binding on the dynamic environment; (b) a bind_global instruction
with two variants that handles the important special case of global
variables by creating a BoundGlobalLocation object and doing
bind_location on that; and (c) an unbind_n instruction [8] that
explicitly pops a specified number of either kind of dynamic binding.
find_global and store_global in the current dynamic
environment only, i.e. this call and all calls made from it.
The lifetime of a dynamic binding lasts until either (a) it is popped
explicitly by unbind_n or popmark; or (b) control exits from the
context where the binding was made. Note that tail calling implicitly
exists the current context; in that case, all dynamic bindings of the
calling sub are undone before the tail-called sub starts execution.
Note also that one can 'revive' a dynamic binding after exit,
effectively giving the binding a new life, by re-entering a context
that includes the binding via continuation.
BoundLocation class), bind it dynamically to the value in $2, with
identical scope and lifetime as for bind_global. If $2 is a Null
PMC, then the global is effectively made locally unbound, if that is
supported by the location. Otherwise, the binding is created by
storing the value $2 into the place specified by the location $1,
which may throw an exception if the assignment is illegal. At the end
of its lifetime (see unbind_n), the binding is undone by restoring
the original value.
[If $1 is a Location but not a BoundLocation, we might want to
support conversion to the appropriate 'bound' form. This is not
needed in the short term, though. -- rgr, 26-Feb-06.]
[also discuss coercion of values. maybe the location should be typed, so we can coerce when the binding is created? -- rgr, 26-Feb-06.]
During the dynamic lifetime of the binding and within the dynamic
scope of the binding sub, this location will appear to have a
different value than outside the dynamic scope (e.g. in coroutines
created before the binding), though that value may change during the
lifetime if the location is modified by other means. See
BoundLocation and its derived classes for specifics.
Note that an explicit unbind_n is not always needed, as all of a
sub's dynamic bindings are automatically undone when the sub returns
(see bind_global for details). However, unbind_n is useful when
the dynamic binding lifetime ends before the exit from the sub (but see
also popmark).
This also leaves room for a bind_global op with an ``(in PMC, in PMC)''
signature where $1 is a symbol table object, in case Parrot ever defines
such a thing.
Location class hierarchy
Location
StructureLocation
BoundStructureLocation
GlobalLocation
BoundGlobalLocation
VariableLocation
BoundVariableLocation
Conceptually, each location object refers to a place that can be
fetched or modified. For a StructureLocation, this is represented
as an array or hash PMC and an index into it. For a
GlobalLocation, this is a package name (or the null PMC for the
global package) and the name of a variable within it [9].
[VariableLocation is for local variables, i.e. things stored in
registers, but I can't visualize this yet. -- rgr, 26-Feb-06.]
The bound classes also include BoundLocation, and also have an
external value. The ``internal'' value is the one stored in the
location; when the BoundLocation object is invoked, the external
and internal values are swapped. All BoundLocation objects can
hold arbitrary INSP external values, and remember what kind of value
they are holding; conversion, if required, is done in same way as for
sub args/returns, and only at the last minute in order to avoid loss
of information. However, since GlobalLocation can only store a
PMC, other values are converted at the initial binding time in order
to avoid rezipping overhead.
[Still needs more thought. I'm planning on implementing only a subset of this for starters. -- rgr, 26-Feb-06.]
[For now, let us stipulate that neither of the first two slots (the
object and the index) can be changed after the object is initialized.
This could be relaxed later to support changing the index or changing
the value for a BoundLocation (though that would break if the
binding is in effect). However, changing the object after an index
has been coerced according to the type of the previous object will
probably be too problematic, unless the object type is restricted to
something compatible. But I am not aware of any HLL semantics that
would require any of these slots to be mutable. -- rgr, 11-Jun-06.]
LocationVariableLocationLocation that represents an arbitrary variable,
e.g. in a register or lexical. [This may not be such a good idea. --
rgr, 11-Jun-06.]
StructureLocationLocation that represents a place in a structure.
If the object is an array, then the key must be a integer; if a hash,
then the key is pretty much arbitrary.
[Not sure how to represent different types of keys efficiently;
coercing everything into a PMC seems wasteful. It might be better to
support arrays and hashes independently; that way, the index can be
coerced appropriately at initialization time. In that case, hash keys
could also be coerced to strings. Interestingly, that doesn't
strictly require separate Location subclasses for arrays
vs. hashes. -- rgr, 11-Jun-06.]
The StructureLocation class supports the following component keys:
Note that if the key is out of bounds when using the location to store into an adjustable array, the array length is changed to accommodate the new index; if the array is not adjustable, an error is signalled.
GlobalLocationLocation that represents the binding of a global
variable or sub. It supports the following component keys:
[In a sense, this is just a StructureLocation where the structure
is a namespace. Since namespaces act like hashes by design, it might
be good to consolidate the two. On the other hand, namespaces are
more complicated (e.g. typed vs. untype interfaces) and may require
special handling. -- rgr, 11-Jun-06.]
BoundLocationLocation that stores a value for the
location. These are added to the dynamic environment in dynamic binding in
order to affect the value in a given dynamic scope.
The BoundLocation PMC class defines the following additional
value slot holds the new value that is to be established
dynamically; when running within that scope, value holds the
original value that was saved when the new value was established.
Invoking the BoundLocation object swaps the old and
new value. This happens implicitly during stack rezipping; bytecode
should never invoke a BoundLocation explicitly, lest Parrot get
confused about which is old and which is new.
BoundVariableLocationBoundStructureLocationBoundGlobalLocationBoundLocation and GlobalLocation that stores
a value for the global variable.
[This is targeted for the first implementation. -- rgr, 11-Jun-06.]
[tbd. -- rgr, 29-Jan-06.]
Given two continuation contexts (one leaving, one entering), perform the minimum number of ``zip down'' and ``zip up'' operations in order to switch the current dynamic state from one to the other.
Beyond the current Parrot data structures, we assume that each
Stack_Entry structure is given an integer depth, which is assigned
when the entry is pushed, and a pointer field we can use as a
temporary forward pointer.
Here's an outline of subsequent work, which makes some assumptions about how the issues get resolved.
dynamic_env pointer to continuation objects, and move the dynamic
environment stack into the interpreter struct.
[what to do about bsr/ret stuff? methings we should move everything now, and move back only if we find problems. -- rgr, 8-Jul-06.]
These changes should be invisible to existing code and test cases. The hard part will be to get the refcounting right.
Consolidate the existing rezipping code. This should also not change functionality, except perhaps to fix some bugs.[in practice, this may get rolled into the previous step. -- rgr, 8-Jul-06.]
Implement the proposed changes to the action calling protocol. This should not be hard; there are only seven uses ofpushaction in the
codebase, all in t/pmc/exception.t.
This completes the implementation of rezipping as it applies to current uses of the dynamic environment, i.e. without dynamic bindings.
If all goes well, implement dynamic binding. This could even be broken further into two stages, one for implementingGlobalLocation
(to handle variable binding), and a separate stage for
StructureLocation (to handle binding of arbitrary arrays or
hashes).
Coroutine is just a continuation that captures the leaving
context's state every time it is invoked. Perhaps Coroutine should
be a subclass of Continuation?
Continuation to BaseContinuation and
defining a new Continuation class that was a trivial specialization
of BaseContinuation would make ``isa'' testing easier.
popmark
(to get rid of intervening bsr return addresses) followed by a
goto. But this would also be possible by using a continuation,
once continuations restore the dynamic environment properly.
unbind_n corresponds to unbind_globals in the patch posted on
30-Dec-05 (and subsequently withdrawn).
LispGlobalLocation class, because in CL it is possible to bind
dynamic variables that do not appear in any package (e.g. symbols that
are uninterned).