[Info-vax] x86-64 data alignment / faulting

Sat Feb 26 16:36:46 EST 2022

On 2/25/2022 11:37 PM, Bob Gezelter wrote:
> On Friday, February 25, 2022 at 7:12:55 PM UTC-5, Arne Vajhøj wrote:
>> On 2/25/2022 6:57 PM, Mark Daniel wrote:
>>> On 26/2/22 8:23 am, Mark Daniel wrote:
>>>> Alpha and Itanium had data alignment requirements with
>>>> penalties for faulting.  Does x86-64?  Is
>>>> sys$start_align_fault_report() et al. still relevant?
>>> 
>>> Hmmm.  Using an alignment fault generator and reporter I'm seeing
>>> plenty on Alpha and Itanium; zero on x86-64.
>> I had an old Fortran program that tests alignment overhead. I just
>> ran it on Windows x86-64, and it showed absolutely no overhead for
>> misaligned REAL*8 arrays (there is a lot of overhead on VMS Alpha
>> and Itanium).
>> 
>> I guess we can say welcome back to CISC. :-)
> 
> With all due respect, speaking as one who did a lot of work on
> non-faulting IBM System/370 processors back in the day, the
> performance penalty for non-aligned references is still very real.
> The same was true of VAX CPUs: they did not fault, but they paid a
> performance penalty.
> 
> There is one difference in context from the days of the System/370
> and the VAX: large, multi-level caches.
> 
> First, the caches close to the processing core are very fast, which
> obscures the loss of performance due to non-aligned references.
> Second, all loads/stores to/from a cache are, almost by definition,
> aligned.
> 
> A program designed to produce alignment faults is also very likely
> not to stress the memory system in a way that exposes the misaligned
> data penalty. Faults, which are synchronous interrupts, have overhead
> orders of magnitude greater than a double memory fetch, particularly
> when sequential elements are referenced (sequential elements may well
> be in the same cache line, even if not aligned on the proper
> boundary).
> 
> If I had the spare time to play with it, I would write a program to
> randomly address a storage area larger than the total cache size, so
> that every memory reference is a cache miss. Run aligned and
> unaligned data references and compare the results.
> 
> It is easy for a benchmark to measure the wrong phenomenon.

There are lies, damn lies and benchmarks.

:-)

I tested on a 2 MB array.

And I admit that the results can be due to many things.

But the numbers sure show a big difference!

Fortran/VMS/Itanium:

OFFSET  0 :    590 ms
OFFSET  1 : 197510 ms
OFFSET  2 : 197510 ms
OFFSET  3 : 197520 ms
OFFSET  4 : 197510 ms
OFFSET  5 : 197510 ms
OFFSET  6 : 197510 ms
OFFSET  7 : 197510 ms
OFFSET  8 :    590 ms
OFFSET  9 : 197510 ms
OFFSET 10 : 197520 ms
OFFSET 11 : 197520 ms
OFFSET 12 : 197520 ms
OFFSET 13 : 197520 ms
OFFSET 14 : 197520 ms
OFFSET 15 : 197520 ms
OFFSET 16 :    580 ms

GFortran/Windows/x86-64 (100x more repetitions):

OFFSET  0 :   7473 ms
OFFSET  1 :   7285 ms
OFFSET  2 :   7301 ms
OFFSET  3 :   7301 ms
OFFSET  4 :   7269 ms
OFFSET  5 :   7208 ms
OFFSET  6 :   7191 ms
OFFSET  7 :   7192 ms
OFFSET  8 :   7519 ms
OFFSET  9 :   7285 ms
OFFSET 10 :   7270 ms
OFFSET 11 :   7285 ms
OFFSET 12 :   7270 ms
OFFSET 13 :   7207 ms
OFFSET 14 :   7176 ms
OFFSET 15 :   7176 ms
OFFSET 16 :   7473 ms

Arne