[Info-vax] Possible resolution of MB issue raised by 5.6.4.3 of 3d ed. AARM

Tue Aug 25 00:41:25 EDT 2015

Here is section 5.6.4.3 of my 3d. edition AARM, the key source
of my reservations regarding IOSBs and synchronization.

See my comments regarding paragraph {3} for a likely resolution of
what I saw as a problem raised within this section.

---------------------------------------------------------------------------

From AARM, 3d. ed., (c)1998, pp. (I) 5-23 and (I) 5-24.
Paragraphs of special interest are marked {1} - {4}.

---------------------------------------------------------------------------

5.6.4.3 Multiprocessor Data Stream (Including Single Processor with DMA I/O)

Generally, the only way to reliably communicate shared data is to write the
shared data on one processor or DMA I/O device, execute an MB (or the logical
equivalent[1] if it is a DMA I/O device), then write a flag (equivalently,
send an interrupt) signaling the other processor that the shared data is
ready.  Each receiving processor must read the new flag (equivalently,
receive the interrupt), execute an MB, then read or update the shared data.
In the special case in which data is communicated through just one location
in memory, memory barriers are not necessary.

 Software Note:

  Note that this section does not describe how to reliably communicate
  data from a processor to a DMA device.  See Section 5.6.4.7 .

Leaving out the first MB removes the assurance that shared data is
written before the flag is written.                                    {1}

Removing the second MB removes the assurance that the shared data is read
or updated only after the flag is seen to change; in this case, an early
read could see an old value, and an early update could be overwritten. {2}

This implies that after a DMA I/O device has written some data to memory
(such as paging in a page from disk), the DMA device must logically
execute an MB[1] before posting a completion interrupt, and the
interrupt handler software must execute an MB before the data is
guaranteed to be visible to the interrupted processor.  Other processors
must also execute MB's before they are guaranteed to see the new data. {3}

An important special case occurs when a write is done (perhaps by an
I/O device) to some physical page frame, then an MB is executed, and
then a previously invalid PTE is changed to be a valid mapping of the
physical page frame that was just written.  In this case, all
processors that access virtual memory by using the newly valid PTE
must guarantee to deliver the newly written data after the TB miss,
both for I-stream and D-stream accesses.                               {4}

[1] [Footnote - omitted - concerned interpretation for DMA devices including
    DMA devices that do not precisely follow Alpha architecture rules]

---------------------------------------------------------------------------

           Comments on the specially marked ({n}) paragraphs

I believe we can take for our purposes the flag to be the IOSB low order
word or longword and the shared data to be both the rest of the IOSB
and the entire I/O buffer.
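
As a rough illustration of that mapping, here is a minimal C11 sketch of
the two-barrier protocol from 5.6.4.3.  Everything named in it is
hypothetical: struct io_request is not the real IOSB layout, and
complete_io() and poll_io() are not OpenVMS routines.  The sequentially
consistent fences stand in for the Alpha MB instruction; this assumes a
C11 compiler and says nothing about what OpenVMS actually does.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 16

/* Hypothetical layout, loosely modeled on an IOSB plus its I/O buffer. */
struct io_request {
    _Atomic uint16_t status;          /* the flag: 0 means still pending   */
    uint16_t         byte_count;      /* shared data: rest of the "IOSB"   */
    unsigned char    buffer[BUF_LEN]; /* shared data: the I/O buffer       */
};

/* Writer side (the completing processor): data, first MB, then flag. */
void complete_io(struct io_request *r, const void *src,
                 uint16_t len, uint16_t status)
{
    if (len > BUF_LEN)
        len = BUF_LEN;                          /* keep the sketch in bounds */
    memcpy(r->buffer, src, len);                /* write the shared data     */
    r->byte_count = len;
    atomic_thread_fence(memory_order_seq_cst);  /* first MB, paragraph {1}   */
    atomic_store_explicit(&r->status, status, memory_order_relaxed);
}

/* Reader side (the polling processor): flag, second MB, then data.
 * Returns 0 while the request is pending, otherwise the status value. */
uint16_t poll_io(struct io_request *r, void *dst, uint16_t *len)
{
    uint16_t s = atomic_load_explicit(&r->status, memory_order_relaxed);
    if (s == 0)
        return 0;                               /* flag not yet written      */
    atomic_thread_fence(memory_order_seq_cst);  /* second MB, paragraph {2}  */
    *len = r->byte_count;                       /* only now read shared data */
    memcpy(dst, r->buffer, *len);
    return s;
}
```

Dropping the fence in complete_io() corresponds to the hazard in
paragraph {1}; dropping the one in poll_io() corresponds to paragraph {2}.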

{1} This is a definite write ordering issue.

{2} This would be a read ordering issue.  We might ignore it if "an early
    read could see an old value" meant an early read of the flag and not of
    the data, except that paragraph {3} appears to reinforce the more
    obvious interpretation.

{3} This seems to address our case precisely.  If the interrupt is
    serviced on processor #1 while user code is executing on processor #2
    then there is no guaranteed MB issued from #2 between a successful
    user level poll of a flag and subsequent access to the shared data.

    [Possible exception: if fork level processing executes in _process_
    context, then the 2nd MB could be expected to occur there. In fact,
    this pretty much resolves all of the issues raised here, requiring
    only that the IOSB update _and_ the 2nd MB be issued at fork level].

{4} This applies to a very special case which we can ignore.

---------------------------------------------------------------------------

Addendum regarding resolution suggested in comments related to paragraph {3}.

Writing OpenVMS I/O Device Drivers in C, (c) 1996, pp 20-21, describes
13 steps in the processing of an I/O request.  The 12th is fork processing
in the driver, which appears to _not_ be in process context.  The 13th,
however, is "The operating system completes the I/O operation", and is
described as "The system I/O postprocessing routines copy the I/O status
into the process address space and return control to the user process".

It is quite likely this occurs in process context and would be the place
where both the IOSB status value change and the 2nd MB instruction could
be issued in order to meet the requirements of 5.6.4.3.

Finally, I had one additional bit of corroboration for my reservations,

 https://www.kernel.org/doc/Documentation/memory-barriers.txt

which comments that a special memory barrier, basically meaningless for
anything except Alpha, is required between two memory reads that have a
certain kind of dependency.  Upon reading it more closely, however, I see
that it is perhaps a weaker barrier than an MB, and at any rate should
only apply when the writer used a write barrier (WMB), which is _in_
_fact_ weaker than an MB.  Making the WMB an MB would most likely make
that section moot.
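
For reference, the dependent-read case that document describes has
roughly the following shape.  This is a hedged sketch in portable C11,
not Linux kernel code: publish(), try_consume() and struct message are
hypothetical names, the release store plays the part of the writer's
write barrier (it is somewhat stronger than a pure WMB), and the acquire
load is assumed here as the portable way to cover the dependent read on
the reader's side.

```c
#include <stdatomic.h>
#include <stddef.h>

struct message { int payload; };

/* Pointer through which the writer hands a message to the reader. */
static _Atomic(struct message *) published = NULL;

/* Writer: fill in the data, then publish the pointer.  The release
 * ordering keeps the payload write ahead of the pointer write. */
void publish(struct message *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&published, m, memory_order_release);
}

/* Reader: load the pointer, then read through it.  The second read
 * depends on the first; the acquire ordering ensures the payload
 * written before publication is the one observed.
 * Returns 1 and stores the payload in *out if a message is available. */
int try_consume(int *out)
{
    struct message *m =
        atomic_load_explicit(&published, memory_order_acquire);
    if (m == NULL)
        return 0;              /* nothing published yet */
    *out = m->payload;         /* dependent read */
    return 1;
}
```

In the IOSB case the reader polls a status value rather than following a
pointer, so the two-MB protocol of 5.6.4.3 is the relevant one and this
pointer-chasing pattern does not arise.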

[As an aside, that document basically states that the Alpha is the
most relaxed architecture with respect to ordering issues, so by
basing the Linux kernel memory barrier design on Alpha the other
architectures are essentially covered as well.]

So, a long slog for me but I have to agree that all indications
are that the IOSB status value adequately protects the data buffer
on I/O completion.

George