Probs with variant records
Frank Heckenbach
frank at g-n-u.de
Fri Nov 22 02:17:38 CET 2002
I've followed this thread so far with interest, sometimes with
amazement ... I will reply to some fragments of several mails below,
but before, some general comments.
I think some of you (especially Chuck ;-) are really seeing the
issue more difficult than it really is. The difficulties you see
might stem from two reasons:
- You tend to see things at too low a level (we've had this
discussion before). When you're already on the assembler level and
dealing with offsets etc., it might be hard, indeed. But when you
stay at the Pascal level, many things look (and are ;-) much
easier. This also holds for the actual implementation, BTW. We
don't have to hand-code assembler code for the checks. We already
have a backend which can produce and optimize code for all kinds
of expressions and statements (represented in an internal form
which is close enough to Pascal code, so for the purpose of this
discussion we can assume we actually insert Pascal code for the
runtime checks).
- Mixing up of different kinds of runtime checks (that applies to
many of you). It's clear that GPC currently does very little
runtime checks, and it should be improved. But, it doesn't help to
mix them all together -- one tends to lose overview, and they'll
have to be implemented separately, anyway.
In particular I can imagine the following kinds of checks (and,
BTW, it's clear that all runtime checks will be disableable (is
this a word? ;-) by compiler options -- there's not really a need
to repeat this in every such discussion):
- Range checks -- I use this term in the following strict sense
because I'm used to it from other environments; if someone has a
better suggestion, let me know: A value of an ordinal type is
used in a situation where only a subrange of this type is
acceptable (e.g. assignments, parameters, array indexing, ...)
- Overflow checks: The result of some operation does not fit in
the target type (e.g., adding two large integers). That's
different from range checking, since here the result is not even
a valid integer type. Whereas range checks can be done during
the assignment etc., overflow checking must be done in
connection with the operation itself before it yields a wrong
value. (So, e.g., when you `Write' the result, overflow checking
might still need to be done, whereas no range checking is ever
necessary there.)
- Variant checks: Also quite different from range checks. This is
about whether some field can be accessed at all (at runtime),
independent of what's done with the field
(reading/writing/passing).
- nil/invalid pointer checks
- undefined value checks
- and probably much more ...
J. David Bryan wrote:
> GPC behaves correctly now, but it would perhaps be more useful if variant
> assignments were checked at run time as part of the general implementation
> of range checking (which is on the "to do" list).
As just explained, variant checking is not part of general range
checking and it was not on the list AFAICS. I'm putting it there
now.
> > It is even impossible if the variant selector is not stored....
>
> Well, yes, but I don't see why that would prevent implementation in the
> case where it was stored. In the latter case, it should be possible at run
> time to verify before access that the selector is set to a proper value.
If the selector contains no tag-field and is not a
discriminant-identifier, it is still possible to store the selector
(the standard doesn't say what's stored or where). When allocating
with `New' and a selector given, it can be stored. When a variant is
assigned to, the corresponding selector value can be stored. When
reading from a variant, it can be checked against the stored
selector (if none was stored, it's an error, anyway). Of course,
that's a little more difficult than in the case where there's an
explicit tag-field or a discriminant, but it should be possible.
However, the issue also borders to (general) undefined value checks,
to catch errors such as the following which a simple variant check
would miss:
program Foo;
var
a: record
case b: Integer of
1: (c: Integer);
2: (d: Integer)
end;
begin
a.b := 1;
a.c := 2;
a.b := 2;
a.b := 1;
WriteLn (a.c) { WRONG }
end.
A way to go might be to define a magic value for "undefined" for
each type and store this value into each variant field when a
variant becomes active. Of course, this really is quite some
overhead ...
Prof. A Olowofoyeku (The African Chief) wrote:
> I think I understand the point being made. If you write something
> like "r.a := b;", how does the compiler perform the range check
> (either at compile time or at runtime)? The value of "b" could be
> anything and might have been set at any point in the code, in any
> number of ways (which might not be obvious at compile time, because
> it might depend on user input). And, presumably, "b" might be a 16-
> bit integer, and so the value of "b" could be "19000" or whatever. Or
> am I missing something?
Others have replied to that already, but I really don't see why you
see a problem there. Of course, the actual value might come from
anywhere -- that's why it's a runtime check. At the time of
assignment, the actual value is known (obviously) and therefore can
be checked. I also don't see what difference a 16-bit integer would
make. It has a value just as well, and this value can be inside or
out of the range of the target, whether that is an 8, 16, 32 or
however large value. (In some cases it's known at compile time that
it can't be out of range, but this only simplifies things.)
J. David Bryan wrote:
> To implement a check for this requirement, the compiler might modify the
> above range-checking pseudo-code to:
>
> if rcd.a = 1
> then
> if something >= -5 and something <= 512
> then rcd.twobyte := something
> else range_error
> else
> variant_error
Exactly. I'd prefer to rearrange this a little, taking into account
that the error procedures don't return:
if rcd.a <> 1 then variant_error;
if (something < -5) or (something > 512) then range_error;
rcd.twobyte := something;
This makes it quite clear that both checks (range and variant) are
completely independent, and can be individually turned on and off
(even the order in which they're done doesn't matter).
Of course, there are cases where some checks can be omitted by
compile-time knowledge. Some cases are easy to detect, some not so
easy (you get the full scale from a constant comparision to the
halting problem ...). Doing the optimal thing in this regard (if
there is an optimal thing) is very hard. As far as I'm concerned,
when I do it, I'll only do the simplest optimizations (basically,
constant comparisons), and perhaps the backend will catch a few
more. Anything else is IMHO not worth the effort since we're only
talking about an optional debugging aid. For the occasional tight
loop where speed really matters, the programmer might want to choose
types etc. in order to help the compiler avoid unnecessary checks,
rather than hoping for some complicated optimizations.
CBFalconer wrote:
> This logic can probably be simplified with unrestricted set size.
> But consider:
>
> r = RECORD
> CASE i : integer OF
> 0..1000, 10000: (c : char);
> -9999, 3000..4000: (j : integer);
> END; (* r *)
>
> Note the negative value!!
Also here, others have replied, but I wonder what's so special about
the negative value. Comparing the selector with -9999 is just as
easy as comparing with 1.
> I remain unconvinced that ranges in variant selectors are a GOOD
> THING.
One example that comes to mind might be a parse tree for arithmetic
expressions. The variant selector could be an enum type that
represents the operator. Since all binary operators have the same
fields, it would be natural to put them in one variant, and a range
(provided the binary operators are in sequence) is just a little
easier to maintain than a list of values.
J. David Bryan wrote:
> But consider a "regular" (executable) CASE statement appearing in the body
> of a program. To decide whether a specific contained statement is to be
> executed, the same sequence of possibly disjoint checks must be made. The
> only difference between this and the variant access situation is the action
> taken after the check: a transfer of execution or an allowed field access.
> So the checking code already exists in GPC; it "simply" :-) needs to be
> emitted in association with a variant field access.
Not exactly. In a `case' statement, the compiler produces code for
all variants at once (possibly using jump tables etc.), for a
variant access we need to check only one case. In fact, the code to
be used here is quite similar to that emitted for an `in' expression
whose right operand is a set constructor.
> Essentially, yes, although I would imagine that the loop would be unrolled
> to perform sequential checks to improve speed.
Indeed (or rather, a series of checks would be generated right away
instead of a table and a loop; that's also easier to do).
> The GNAT (Ada) compiler
> actually puts the selector-checking code for a specific variant into a
> compiler-generated function, so that accesses to any fields contained
> within that variant may be checked by calling the common routine (i.e.,
> avoiding the replication of checking code at each field reference within
> the program).
I doubt whether that's too useful. The cases where the function call
overhead might be worth it (very complex lists of selectors/ranges)
are probably very rare in practice, so it doesn't seem worth
optimizing for them.
[Assembler stuff skipped. This goes to you, too -- if you think
something can be improved in the assembler code, then please contact
the respective backend maintainers. Any improvement made there at
the low level would benefit all languages and all situations (e.g.,
explicit comparisons as well as automatic checks). Here, let's
please stay at the high level.]
Frank
--
Frank Heckenbach, frank at g-n-u.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs:
http://www.gnu-pascal.de/todo.html
GPC download signing key: 51FF C1F0 1A77 C6C2 4482 4DDC 117A 9773 7F88 1707
More information about the Gpc
mailing list