[Info-vax] Volatile, was: Re: yet another sys$qiow question
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Thu Aug 27 06:11:28 EDT 2015
On 2015-08-27 02:14:11 +0000, David Froble said:
> John Reagan wrote:
>> Looking at the code, the updating of the IOSB is done very carefully.
>> There are EVAX_MB builtins used in the code (it is in Macro32) and they
>> make very sure that the first longword is always filled in last (with
>> an MB between storing into the 2nd longword and storing into the 1st
>> longword). I also found code that worries about misaligned IOSBs to
>> ensure atomic updating (including letting an alignment fault occur just
>> to get proper synchronization).
>>
>> Looking in just the [SYS] facility (so that doesn't even count drivers,
>> RMS, RTLs, etc.), I found 389 uses of the MB instruction. That EVAX_MB
>> builtin is mapped to the 'mf' instruction for Itanium. There are
>> equivalent instructions for x86 although the stronger memory ordering
>> rules might make some of them unnecessary (but I doubt that the Macro
>> compiler will be able to figure that out without human assistance)
>
>
> Ok, a question. Just curious.
>
> Do you have any feel for how much time might be saved by monitoring the
> IOSB status instead of using one of the signaling methods, such as an
> AST or event flag?
This is one of several different instruction-level issues (pun very much
intended) that have been mixed together in this thread, and this detail
is less about polling. It is about Alpha memory barriers, about read and
write coalescing and reordering, and about delivering the completion
locally within an SMP system. The Alpha memory-ordering rules are the
"loosest" of most any common (non-VLIW) architecture here: what Alpha
permits is a superset of what the others permit. That means code that's
properly written for Alpha will run on most any other box, at least as
far as memory ordering goes.

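For anyone who wants to see the "fill in the second longword, barrier, then fill in the first longword" ordering in portable terms, here is a rough C11 sketch. It is not the OpenVMS code (that is Macro32 with EVAX_MB); the fake_iosb structure, the values, and the function names are all made up for illustration, and a release store paired with an acquire load stands in for the explicit MB.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical IOSB-like block; NOT the real OpenVMS layout. */
typedef struct {
    _Atomic uint32_t first;   /* status longword: zero while pending, written last */
    uint32_t second;          /* device-dependent data: written first */
} fake_iosb;

/* Plays the role of the I/O completion code. */
static void *complete_io(void *arg)
{
    fake_iosb *iosb = arg;

    iosb->second = 0xC0FFEEu;           /* 1. fill in the second longword */

    /* 2. Release store: the store above cannot be reordered after this one.
     *    This stands in for the MB between the two stores. */
    atomic_store_explicit(&iosb->first, 1u, memory_order_release);
    return NULL;
}

/* Poll the first longword, then read the rest. */
static uint32_t wait_and_read(fake_iosb *iosb)
{
    /* Acquire load: pairs with the release store in complete_io(). */
    while (atomic_load_explicit(&iosb->first, memory_order_acquire) == 0)
        ;   /* spin */

    return iosb->second;                /* guaranteed to see the writer's value */
}

/* Start the writer, poll for completion, and return what the poller saw. */
static uint32_t run_demo(void)
{
    fake_iosb iosb;
    pthread_t writer;
    uint32_t seen;

    atomic_init(&iosb.first, 0u);
    iosb.second = 0u;

    if (pthread_create(&writer, NULL, complete_io, &iosb) != 0)
        return 0u;

    seen = wait_and_read(&iosb);
    pthread_join(writer, NULL);

    assert(seen == 0xC0FFEEu);          /* follows from the acquire/release pairing */
    return seen;
}
```

On a strongly ordered machine the plain stores would usually arrive in order anyway; on something as loose as Alpha, the ordering has to be asked for explicitly, which is what the release/acquire pair does here.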
Beyond the use of barriers, polling the IOSB technically also requires
the use of volatile, as the modifications occur outside the purview of
the compiler. This is another instruction-level issue. The compiler is
free to load the memory location into a register and, since it doesn't
"know" that a modification is made outside of its scope, to keep using
the value in that register. It can also optimize the loop in potentially
unexpected ways: the compiler still has to test the value once, but it
could conceivably hoist that test out of the loop and then remove the
loop itself. As far as the compiler can "see," there's no way the value
can change once the loop is entered, because the out-of-band
modification of the IOSB is invisible to it. This mess could conceivably
happen in any language with a compiler that can cache values in
registers, too.

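A minimal sketch of the volatile part, in portable POSIX C rather than against the real system services. Everything here is a stand-in: fake_iosb_status is a hypothetical substitute for the first IOSB word, and a SIGALRM handler plays the part of the out-of-band writer. Drop the volatile qualifier and an optimizing compiler is entitled to read the flag once and never look at memory again.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical stand-in for the IOSB status word: zero means "still pending".
 * volatile forces a fresh load from memory on every access. */
static volatile sig_atomic_t fake_iosb_status = 0;

/* Plays the role of the out-of-band writer (the I/O completion). */
static void complete_io(int signo)
{
    (void)signo;
    fake_iosb_status = 1;               /* made-up "success" value */
}

/* Arm a one-shot 10 ms timer, then spin until the handler stores a status.
 * Returns the status seen, or -1 if the setup calls fail. */
static int poll_fake_iosb(void)
{
    struct sigaction sa;
    struct itimerval tv;

    fake_iosb_status = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = complete_io;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_usec = 10000;        /* fire once, 10 ms from now */
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0)
        return -1;

    /* The compiler cannot see complete_io() being called from here, so
     * without volatile it could keep the value in a register. */
    while (fake_iosb_status == 0)
        ;   /* spin */

    assert(fake_iosb_status != 0);      /* the loop only exits on a nonzero status */
    return (int)fake_iosb_status;
}
```

Note that volatile only deals with the compiler. It does nothing about the hardware reordering discussed above, which is why the barriers are a separate requirement.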
As for savings, event flag operations and delivery involve a fair
number of instructions and interlocking, and the memory barriers and
processor mode changes aren't exactly cheap. I don't recall the
overhead for completion off-hand and would have to run tests with the
current OpenVMS code, but it's not trivial when you're really tossing
large piles of I/O around. If IOSB polling is looking advantageous,
then I'd probably also look at using $io_perform[w], and potentially
also at using the driver ALTSTART interface, or maybe a full-on driver
and ACP. But with most applications found on OpenVMS, there are
usually other, hotter spots elsewhere in the application code.
--
Pure Personal Opinion | HoffmanLabs LLC