[Info-vax] NaT consumption faults with COBOL?

Wed Nov 18 11:04:52 EST 2009

OK, children, sit down by the campfire and I'll tell you a scary story about 
NaTs!

[holding a flashlight under my chin to illuminate my face...]

There are two NaT-related stories to tell.

1)

The integer registers on Itanium are actually 65 bits wide: 64 bits of data 
and a special bit called the NaT bit (NaT is short for Not a Thing).  NaTs 
are like their quiet NaN counterparts in the IEEE floating-point world.  If you 
have a NaT, you can add to it, subtract from it, use it like any register 
operand, etc.; the NaT just propagates along.  You only get into trouble 
when you try to store that register to memory.  If the register's NaT bit is 
set, then you get a fault saying the register really has no value to store.
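To make the 65-bit picture concrete, here is a minimal Python model, not anything taken from GEM or the hardware manuals: a register is a 64-bit value plus a NaT flag, arithmetic ORs the NaT bits together, and a plain 8-byte store raises a fault when the NaT is set. The names `Reg`, `add`, `sub`, `store8` and `NaTConsumptionFault` are made up for the sketch, which also ignores every other way real hardware can consume a NaT.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # the 64 data bits of a general register


class NaTConsumptionFault(Exception):
    """Raised when a register whose NaT bit is set is stored to memory."""


@dataclass(frozen=True)
class Reg:
    """Simplified Itanium general register: 64 data bits plus a NaT bit."""
    value: int = 0
    nat: bool = False


def add(a: Reg, b: Reg) -> Reg:
    # Arithmetic never faults on a NaT; the NaT bit is simply carried along.
    return Reg((a.value + b.value) & MASK64, a.nat or b.nat)


def sub(a: Reg, b: Reg) -> Reg:
    return Reg((a.value - b.value) & MASK64, a.nat or b.nat)


def store8(memory: dict, addr: int, r: Reg) -> None:
    # A plain store is where the NaT finally bites.
    if r.nat:
        raise NaTConsumptionFault(f"store of NaT register to {addr:#x}")
    memory[addr] = r.value & MASK64


mem = {}
store8(mem, 0x1000, add(Reg(40), Reg(2)))           # mem[0x1000] == 42

poisoned = sub(add(Reg(nat=True), Reg(5)), Reg(1))  # still a NaT
try:
    store8(mem, 0x1008, poisoned)
except NaTConsumptionFault as exc:
    print("fault:", exc)                            # fault: store of NaT register to 0x1008
```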

The two normal ways to clear a NaT bit on a register are to move a literal 
value into the register or to load a value from memory into the register.  I 
won't go into how NaTs get set, other than to say it is part of the 
Itanium speculative load feature, where you can try to load a value and, if 
the memory location doesn't exist, you get a NaT instead of an ACCVIO.
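A similar toy sketch for how a NaT comes and goes. `ld8_s` is a hypothetical stand-in for a speculative load (`ld8.s`): on a bad address it defers the fault and hands back a NaT instead of raising an ACCVIO, while a literal move (`mov_imm`) or a successful load gives a clean register. Real code would follow a speculative load with a `chk.s` check before trusting the value; that is left out here, and the zero data bits in the deferred case are an assumption of the model.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


class AccessViolation(Exception):
    """Stand-in for the ACCVIO a normal load takes on a bad address."""


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False


def mov_imm(imm: int) -> Reg:
    # Moving a literal into a register always clears its NaT bit.
    return Reg(imm & MASK64, nat=False)


def ld8(memory: dict, addr: int) -> Reg:
    # Normal load: a bad address faults right away; a good one clears the NaT.
    if addr not in memory:
        raise AccessViolation(f"ACCVIO at {addr:#x}")
    return Reg(memory[addr], nat=False)


def ld8_s(memory: dict, addr: int) -> Reg:
    # Speculative load: the fault is deferred and the target becomes a NaT.
    # (Zero data bits here is an assumption of this sketch.)
    if addr not in memory:
        return Reg(0, nat=True)
    return Reg(memory[addr], nat=False)


mem = {0x2000: 7}
spec_bad = ld8_s(mem, 0xBAD0)    # no ACCVIO, just a NaT
after_mov = mov_imm(-1)          # literal move: NaT gone, value is all ones
spec_good = ld8_s(mem, 0x2000)   # a load that works also yields a clean register
```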

The GEM code generator often wants to write a register in pieces for things 
like small structs that were allocated to a register.  So if GEM 
writes to the bottom longword of a register with a 'dep' instruction and 
then writes to the top longword with another 'dep' instruction, you'd think 
that we've now written all 64 bits, right?  Nope.  The 'dep' instruction 
just propagates the NaT bit.  Inserting a value into a portion of a NaT 
still gives you a NaT.  The GEM optimization that produces those multiple 
deposits into a register is pretty deep inside the flow analyzer.  As we 
found these cases (via bug reports), we added some extra code that tries to 
clear the register first to get rid of the NaT.  It isn't as easy as it sounds, 
since the multiple deposits could be very far apart.  We didn't want to 
always clear all registers either, since most of the time things are fine. 
Why slow down everybody for the rare code patterns?
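The deposit trap can be shown with the same kind of toy register. `dep` below is a hypothetical model that follows the description above: it overwrites a bit field in the target but keeps the target's NaT (ORed with the source's). `build_pair` is an illustrative stand-in for code that assembles a small struct from two longword deposits; with `clear_first=True` it applies the "clear the register first" fix. None of this is GEM's actual code.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


class NaTConsumptionFault(Exception):
    """Raised when a NaT register is written to memory."""


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False


def dep(src: Reg, target: Reg, pos: int, length: int) -> Reg:
    """Deposit the low `length` bits of src into target at bit `pos`."""
    field = ((1 << length) - 1) << pos
    value = (target.value & ~field) | ((src.value << pos) & field)
    # Even if the deposits overwrite every data bit, the target's NaT survives.
    return Reg(value & MASK64, src.nat or target.nat)


def store8(memory: dict, addr: int, r: Reg) -> None:
    if r.nat:
        raise NaTConsumptionFault(f"store of NaT register to {addr:#x}")
    memory[addr] = r.value


def build_pair(reg: Reg, lo: int, hi: int, clear_first: bool) -> Reg:
    """Hypothetical GEM-style code: assemble a small struct in one register."""
    if clear_first:
        reg = Reg(0)                  # the "clear the register first" fix-up
    reg = dep(Reg(lo), reg, 0, 32)    # bottom longword
    reg = dep(Reg(hi), reg, 32, 32)   # top longword
    return reg


leftover = Reg(0xDEADBEEF, nat=True)  # NaT left behind by earlier C++ code
unfixed = build_pair(leftover, 0x11111111, 0x22222222, clear_first=False)
fixed = build_pair(leftover, 0x11111111, 0x22222222, clear_first=True)
```

Both results hold the same 64 data bits (0x2222222211111111), but only the register that was cleared first can be stored without a fault.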

And to expose the bug, the register had to start out as a NaT.  How would that 
happen?  Well, at some point recently, you must have executed something 
written in C++, which does use speculative loads, and it left a NaT in a 
register.  That NaT flowed into the COBOL routine (even into newly created stacked 
registers, which often start out as garbage left over from the register backing 
store); GEM did multiple deposits into the register and then tried to write 
it to memory.

The COBOL application must have found another occurrence of this bug.

2)

While Macro-32 doesn't use the code-generator part of GEM, we also found a 
NaT-related problem.

Unlike code in a high-level language, which would never blindly store a 
register to the stack "just for fun", Macro-32 code does it all the 
time.  I call them courtesy saves.  You've all seen them.  A Macro-32 
routine that is about to use some register (near a MOVCx, for instance, 
or when it needs a quick scratch register on some rare code path) does a PUSHL of 
the register(s) to the stack, uses them for whatever, and POPLs them back.  The 
Macro-32 routine doesn't know if the registers had anything meaningful in 
them or not, but does the push/pop "just in case".

For some courtesy saves (PUSHLs near the top of the routine), the compiler 
can recognize them as register saves and actually move them into the 
routine prologue (turning them into 64-bit saves using stacked registers 
on Itanium or the memory stack on Alpha).  However, for PUSHLs farther down in the 
routine (especially on branches of flow paths), the compiler thinks you 
might actually want that value on the stack, perhaps for a future 
CALLS or because you are building some data structure like a descriptor.  So 
we generate an 'st4' to push that register onto the memory stack.  And if 
that register contains a NaT?  Yep, fault.  Sucks in user mode.  REALLY 
SUCKS in kernel mode. :)
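Here is a rough model of why a courtesy save in the routine body blows up while a prologue save does not. `Frame`, `pushl`, `popl`, `save_to_stacked` and the `r32` register name are all hypothetical: PUSHL is modeled as a 4-byte store that faults on a NaT, and the prologue save as a register-to-register copy that just carries the NaT along. Keeping longwords sign-extended in 64-bit registers is an assumption of the sketch.

```python
from dataclasses import dataclass

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class NaTConsumptionFault(Exception):
    """Raised when a NaT register is written to memory by a normal store."""


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False


def sext32(x: int) -> int:
    # Assumption of this sketch: longwords live sign-extended in 64-bit registers.
    x &= MASK32
    return (x | ~MASK32) & MASK64 if x & 0x80000000 else x


class Frame:
    """Hypothetical register file plus memory stack for one Macro-32 routine."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.stack = []

    def pushl(self, name):
        # A PUSHL in the routine body becomes a real 4-byte store ('st4').
        r = self.regs[name]
        if r.nat:
            raise NaTConsumptionFault(f"st4 of NaT register {name}")
        self.stack.append(r.value & MASK32)

    def popl(self, name):
        self.regs[name] = Reg(sext32(self.stack.pop()))

    def save_to_stacked(self, name, stacked):
        # Prologue-style save: a register-to-register copy, so the NaT rides along.
        self.regs[stacked] = self.regs[name]


def courtesy_save(frame, name):
    frame.pushl(name)               # "just in case"
    frame.regs[name] = Reg(0x1234)  # use it as a scratch register
    frame.popl(name)                # put the old longword back


ok = Frame({"R2": Reg(sext32(0xFFFFFFF0))})
courtesy_save(ok, "R2")             # round-trips the old value

hot = Frame({"R2": Reg(nat=True)})
hot.save_to_stacked("R2", "r32")    # no fault: r32 now holds the NaT
try:
    courtesy_save(hot, "R2")
except NaTConsumptionFault as exc:
    print("fault:", exc)            # fault: st4 of NaT register R2
```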

So what's the poor little compiler to do?  As one of the many pieces in the 
flow graph we build, we now look for paths from the start of the routine to 
register pushes to the stack (not just any store to memory) that don't 
write to the register first (or have it listed as an INPUT register, 
etc.).  For those registers which might be in a courtesy save, we generate 
extra code in the routine prologue to check if each one is a NaT and, if it is, 
shove a -1 into the register, clearing the NaT.  The courtesy save will now 
save/restore a -1, but it was prepared to save/restore garbage anyway.  We 
have more work to do in the epilogue: if we found a NaT in R4-R7 (the 
preserved register set), we have to put the NaT back, since some C++ code 
earlier in the call chain might still expect the NaT to be in place (unless 
the register is marked as OUTPUT or SCRATCH, of course).  And it gets really 
nasty when routines branch between each other.  An epilogue/exit sequence 
doesn't know for sure which registers to put NaTs back into.  For such cases, 
the prologues create bitvectors.  The epilogues load that 
bitvector into the predicate set and then do a bunch of predicated 
instructions to restore the saved NaTs into the right registers.
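Finally, a sketch of the prologue/epilogue dance. `prologue_fixup` and `epilogue_restore` are hypothetical names, the register names are only illustrative, and a Python integer stands in for the bitvector and predicate set: the prologue replaces any NaT in a possibly courtesy-saved register with -1 and records which of R4-R7 it touched (skipping OUTPUT or SCRATCH registers), and the epilogue treats each bit as a predicate guarding "put the NaT back". How the hardware actually tests for and re-creates a NaT is abstracted away, and so is the flow analysis that picks the candidate registers.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
PRESERVED = ("R4", "R5", "R6", "R7")  # the preserved set named in the post


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False


def prologue_fixup(regs, courtesy_saved, output_or_scratch=()):
    """Clear NaTs in registers that may be courtesy-saved.

    Returns a bitvector with bit i set when PRESERVED[i] held a NaT that the
    epilogue must put back.
    """
    nat_bits = 0
    for name in courtesy_saved:
        if regs[name].nat:              # the hardware NaT test is abstracted away
            regs[name] = Reg(MASK64)    # shove a -1 in, clearing the NaT
            if name in PRESERVED and name not in output_or_scratch:
                nat_bits |= 1 << PRESERVED.index(name)
    return nat_bits


def epilogue_restore(regs, nat_bits):
    """Each bit acts like a predicate guarding one 'put the NaT back' step."""
    for i, name in enumerate(PRESERVED):
        if (nat_bits >> i) & 1:
            regs[name] = Reg(regs[name].value, nat=True)


regs = {"R2": Reg(nat=True), "R4": Reg(nat=True), "R5": Reg(99), "R6": Reg(nat=True)}
bits = prologue_fixup(regs, ["R2", "R4", "R5", "R6"], output_or_scratch=("R6",))
# ... routine body: a courtesy save of R4 now pushes -1 instead of faulting ...
epilogue_restore(regs, bits)
```

Keeping the information in a bitvector, rather than in the code of each exit path, is what lets an epilogue that doesn't know which prologue ran still restore the right registers.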

And every year at Halloween the NaT-creature comes back to haunt misbehaving 
old-farts like us and eats our brains.   Boooooo!!!!!!!