[Info-vax] yet another sys$qiow question

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Thu Aug 20 08:09:34 EDT 2015


On 2015-08-20 05:10:59 +0000, David Froble said:

> Stephen Hoffman wrote:
>> On 2015-08-19 23:30:37 +0000, David Froble said:
>> 
>>> 
>>> Ok, just so dumb me can understand, did you just write that before I'm 
>>> notified of the completion of a QIO, the IOSB has been updated and 
>>> ready to read?
>> 
>> Correct.  The following assumes that the $qio has been accepted by 
>> OpenVMS without error, and is performing the requested processing 
>> asynchronously.   Upon completion of the operation and while blocking 
>> user-mode activity, the I/O post-processing populates the IOSB if 
>> specified, then sets the event flag if specified, then triggers the 
>> completion AST if specified.  If data is to be returned by the 
>> particular $qio call, the data is loaded into the user-specified buffer 
>> sometime between when the $qio is queued and when the I/O completion 
>> post-processing is performed.   By the time your user-mode code is 
>> running whether as the AST or in response to the event flag, the IOSB 
>> is always valid, and any associated data is written from or to the 
>> user-specified buffer, if appropriate.   If the IOSB is still zero, 
>> then you're either sharing the IOSB across calls or have otherwise 
>> mis-allocated the IOSB storage, or you're sharing event flags, or other 
>> such coding mistake.
> 
> Having read the fine manual, that's how I've understood things work.  I 
> always check the queuing of an operation, and upon completion, I 
> usually use ASTs, I then look at the IOSB, and finally the result of 
> the operation.
> 
> Reading this thread has caused some confusion ....

There are two sorts of volatility in play here.  The first is stack 
storage: an IOSB allocated on the stack may be out of scope by the time 
completion arrives, and the I/O post-processing then writes into 
whatever now occupies that memory.

The second is the compiler: it can optimize the code such that the IOSB 
values involved are cached in a register in application scope and are 
not re-fetched from memory, as the compiler is unaware that the memory 
involved is "shared" and may have been modified outside of the scope of 
the code being compiled; a compiler- and multiprocessing-related issue.

If VSI finds that this is a problem and not something that can be 
mitigated via existing or additional compiler heuristics, then they can 
either sprinkle some volatile tags into their IOSBDEF declarations, or 
programmers can do the same through their own references to the IOSBs.  
BASIC and some other languages have no volatile-like construct, nor, 
unlike C (via C11) and some other languages, any multithreading 
support, AFAIK.   If BASIC gets aggressive with registers and happens 
to "inappropriately" cache the IOSB in a register, applications can 
get tangled.

There's a variant of this second case for application-shared memory in 
multiprocessing: Alpha can play havoc with incautious code that 
performs repeated reads or writes with un-interlocked and un-barriered 
access.  Alpha is very aggressive about reordering and coalescing.   
Itanium is very slightly less aggressive about reordering and 
coalescing ("write combine"), and, with a few salient exceptions, 
x86-64 largely doesn't reorder, and also doesn't tend to aggressively 
cache data in the relatively few registers available.  
<https://en.wikipedia.org/wiki/Memory_ordering>   I/O completion is 
expected to happen in the same process and on the same thread that 
originated the I/O, which, in a "non-threaded" process, is your sole 
and primary thread, so this cross-processor-caches case doesn't apply 
to the vast majority of folks programming on OpenVMS, even on a 
multiprocessor.

Processors in a multiprocessor server tend to have several levels of 
private and shared caches.  These caches are usually physically and 
electrically closer to the processor, lower latency and/or 
higher-bandwidth, as compared with main memory.   Registers are really 
just the fastest and most specialized and most expensive form of a 
processor cache, after all.   Disks and tapes are the slowest.  
Interestingly, networking is very fast, which can mean that 
server-level mirroring into remote memory is much more speed-effective 
than RAID mirroring.   But I digress.

BTW, these same details — register caching, processor memory caching, 
interlocked references — are core details around server scaling with 
(or without) multiprocessor cache coherence, and why servers with 
zillions of cores tend not to use cache-coherent designs, and why 
applications and operating systems end up getting reworked to use 
whatever the local server presents for remote memory access — some sort 
of message-passing design, or RDMA (think "memory channel" for that), 
or otherwise.  Alpha got aggressive here to get speed.  But these 
designs don't easily scale; you can end up buried in the overhead of 
keeping all the writes in all the caches from stepping on each other.  
<https://en.wikipedia.org/wiki/Cache_coherence>

Then there's the general discussion of multithreading, which can add a 
whole new class of bugs into your application.




-- 
Pure Personal Opinion | HoffmanLabs LLC



