[Info-vax] NaT consumption faults with COBOL?
John Reagan
johnrreagan at earthlink.net
Wed Nov 18 11:04:52 EST 2009
OK children, sit down by the campfire and I'll tell you a scary story about
NaTs!
[holding a flashlight under my chin to illuminate my face...]
There are two NaT-related stories to tell.
1)
The integer registers on Itanium are actually 65 bits wide: 64 bits of data
and a special bit called the NaT bit (NaT is short for Not a Thing). NaTs
are like their quiet NaN counterparts in the IEEE floating world. If you
have a NaT, you can add to it, subtract from it, use it like any register
operand, etc., and the NaT just propagates along. You only get into trouble
when you try to store that register to memory. If the register's NaT bit is
set, then you get a fault saying the register really has no value to store.
The two normal ways to clear a NaT bit on a register are to move a literal
value into the register or to load a value from memory into the register. I
won't go into how NaTs get set other than to say it is part of the
Itanium speculative load feature where you can try to load a value and if
the memory location doesn't exist, you get a NaT instead of an ACCVIO.
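
For anyone who wants to poke at those rules, here is a toy Python model. It is
only a sketch of the behavior described above, not real Itanium semantics: the
names (Reg, speculative_load, store, and so on) are made up for illustration,
and "memory" is just a dict in which a missing key stands in for an address
that doesn't exist.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


class NaTConsumptionFault(Exception):
    """Raised when a register with its NaT bit set is stored to memory."""


@dataclass(frozen=True)
class Reg:
    value: int = 0      # the 64 data bits
    nat: bool = False   # the 65th bit


def mov_literal(literal):
    """Moving a literal into a register always clears the NaT bit."""
    return Reg(literal & MASK64, False)


def load(memory, addr):
    """An ordinary load from memory also yields a register with NaT clear."""
    return Reg(memory[addr] & MASK64, False)


def speculative_load(memory, addr):
    """Toy speculative load: a missing address gives a NaT, not a fault."""
    if addr not in memory:
        return Reg(0, True)
    return Reg(memory[addr] & MASK64, False)


def add(a, b):
    """Arithmetic just propagates the NaT bit from either operand."""
    return Reg((a.value + b.value) & MASK64, a.nat or b.nat)


def store(memory, addr, reg):
    """Storing a NaT register is where the trouble starts."""
    if reg.nat:
        raise NaTConsumptionFault(f"store of NaT register to {addr:#x}")
    memory[addr] = reg.value
```

Adding to the result of a speculative_load from a missing address gives
another NaT without complaint; the fault only shows up at store().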
The GEM code generator often wants to write a register in pieces for things
like small structs, etc. that were allocated to a register. So if GEM
writes to the bottom longword of a register with a 'dep' instruction and
then writes to the top longword with another 'dep' instruction, you'd think
that we've now written all 64 bits, right? Nope. The 'dep' instruction
just propagates the NaT bit. Inserting a value into a portion of a NaT
still gives you a NaT. The GEM optimization that turns those writes into
multiple deposits into a register is pretty deep inside the flow analyzer.
As we found these cases (via bug reports), we added some extra code that
tries to clear the register first to clear out the NaT. It isn't as easy as
it sounds
since the multiple deposits could be very far apart. We didn't want to
always clear all registers either since most of the time things are fine.
Why slow down everybody for the rare code patterns?
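
The two-deposit trap can be sketched in a few lines of Python. This is a
simplified model under stated assumptions: dep() here takes a plain integer as
the value to insert (the real instruction has register and immediate forms),
and Reg and write_in_pieces are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class Reg:
    value: int = 0      # the 64 data bits
    nat: bool = False   # the NaT bit


def dep(target, source_value, pos, length):
    """Insert the low `length` bits of source_value into target at bit `pos`.

    The target's NaT bit is carried along unchanged, which is the whole problem.
    """
    field_mask = ((1 << length) - 1) << pos
    merged = (target.value & ~field_mask) | ((source_value << pos) & field_mask)
    return Reg(merged & MASK64, target.nat)


def write_in_pieces(reg, low, high, clear_first):
    """Write a register as two longword deposits, optionally clearing it first."""
    if clear_first:
        reg = Reg(0, False)        # like moving a literal 0: NaT is cleared
    reg = dep(reg, low, 0, 32)     # bottom longword
    reg = dep(reg, high, 32, 32)   # top longword
    return reg


garbage = Reg(0xDEAD, True)  # a register that arrived holding a NaT
still_nat = write_in_pieces(garbage, 0x11111111, 0x22222222, clear_first=False)
fixed = write_in_pieces(garbage, 0x11111111, 0x22222222, clear_first=True)
```

Both results hold the same 64 data bits, but only the one that was cleared
first is safe to store.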
And to expose the bugs, the register had to start as a NaT. How would that
happen? Well, at some point recently, you must have executed something
written in C++ which does use speculative loads and it left a NaT in a
register. That flowed into the COBOL routine (even in newly created stacked
registers which often start as garbage from stuff on the register backing
store), GEM did multiple deposits into the register and then tried to write
that to memory.
This is the bug that the COBOL application must have found another occurrence
of.
2)
While Macro-32 doesn't use the code-generator part of GEM, we also found a
NaT-related problem.
Unlike code in a high-level language which would never blindly store a
register to the stack "just for fun", it happens in Macro-32 code all the
time. I call them courtesy saves. You've all seen them. A Macro-32
routine that is about to use some register (like near a MOVCx for instance
or needing a quick scratch register on some rare code path) does a PUSHL of
register(s) to the stack, uses them for whatever, and POPLs them back. The
Macro-32 routine doesn't know if the registers had anything meaningful in
them or not, but did the push/pop "just in case".
For some courtesy saves (PUSHLs near the top of the routine), the compiler
can recognize them as register saves and actually move them into the
routine prologue (turning them into 64-bit saves that use stacked registers
on Itanium or the memory stack on Alpha). However, for PUSHLs farther down in the
program (especially on branches of flow paths), the compiler thinks you
might actually want to push that value on the stack for perhaps a future
CALLS or perhaps you are building some data structure like a descriptor. So
we generate a 'st4' to push that register onto the memory stack. And if
that register contains a NaT? Yep, fault. Sucks in user mode. REALLY
SUCKS in kernel mode. :)
So what's the poor little compiler to do? As one of the many pieces in the
flow graph we build, we now look for paths from the start of the routine to
register pushes to the stack (not just any store to memory) which didn't
store into the register first (or have it listed as an INPUT register,
etc.). For those registers which might be in a courtesy save, we generate
extra code in the routine prologue to check if it is a NaT and, if it is,
shove a -1 into the register, clearing the NaT. The courtesy save will now
save/restore a -1 but it was prepared to save/restore garbage anyway. We
have more work to do in the epilogue since if we found a NaT in R4-R7 (the
preserved register set), we have to put the NaT back since some C++ code
earlier in the call chain might still expect the NaT to be in place (unless
the register is marked as OUTPUT or SCRATCH of course). And it gets really
nasty when routines branch into each other. A given epilogue/exit sequence
doesn't know for sure which registers to put NaTs back into. There are
bitvectors created by prologues for such cases. The epilogues load that
bitvector into the predicate set and then do a bunch of predicated
instructions to restore the saved NaTs into the right registers.
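
The prologue/epilogue dance can be modeled roughly like this in Python. It is
a sketch only: the function names, the dict of registers, and the integer
bitvector are assumptions for illustration, and the real compiler does this
with predicated Itanium instructions and also honors INPUT/OUTPUT/SCRATCH
declarations, which this toy ignores.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


class NaTConsumptionFault(Exception):
    """Raised when a register with its NaT bit set is stored to memory."""


@dataclass(frozen=True)
class Reg:
    value: int = 0      # the 64 data bits
    nat: bool = False   # the NaT bit


def prologue(regs, candidates):
    """Scrub NaTs from registers that might be courtesy-saved.

    Returns a bitvector with bit n set if register n held a NaT on entry.
    """
    nat_bits = 0
    for n in candidates:
        if regs[n].nat:
            nat_bits |= 1 << n
            regs[n] = Reg(MASK64, False)  # -1 as a 64-bit pattern, NaT cleared
    return nat_bits


def pushl(stack, reg):
    """Model of the 'st4' behind a PUSHL: store the low longword, fault on NaT."""
    if reg.nat:
        raise NaTConsumptionFault("PUSHL of a NaT register")
    stack.append(reg.value & 0xFFFFFFFF)


def epilogue(regs, nat_bits, preserved=(4, 5, 6, 7)):
    """Put NaTs back into preserved registers that had one on entry."""
    for n in preserved:
        if nat_bits & (1 << n):
            regs[n] = Reg(regs[n].value, True)
```

With R4 arriving as a NaT, prologue() records bit 4 and leaves a -1 behind, the
PUSHL then stores 0xFFFFFFFF instead of faulting, and epilogue() puts the NaT
back for whoever upstream still expects it.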
And every year at Halloween the NaT-creature comes back to haunt misbehaving
old-farts like us and eats our brains. Boooooo!!!!!!!