[Info-vax] x86-64 data alignment / faulting
Arne Vajhøj
arne at vajhoej.dk
Sat Feb 26 16:36:46 EST 2022
On 2/25/2022 11:37 PM, Bob Gezelter wrote:
> On Friday, February 25, 2022 at 7:12:55 PM UTC-5, Arne Vajhøj wrote:
>> On 2/25/2022 6:57 PM, Mark Daniel wrote:
>>> On 26/2/22 8:23 am, Mark Daniel wrote:
>>>> Alpha and Itanium had data alignment requirements with
>>>> penalties for faulting. Does x86-64? Is
>>>> sys$start_align_fault_report() et al. still relevant?
>>>
>>> Hmmm. Using an alignment fault generator and reporter I'm seeing
>>> plenty on Alpha and Itanium; zero on x86-64.
>> I had an old Fortran program testing alignment overhead and I just
>> ran it on Windows x86-64 and it showed absolutely no overhead for
>> bad alignment of REAL*8 arrays (and there is a lot of overhead on
>> VMS Alpha and Itanium).
>>
>> I guess we can say welcome back to CISC. :-)
>
> With all due respect, the performance penalty for non-aligned
> references is still very real, speaking as one who did a lot of work
> on non-faulting IBM System/370 processors back in the day. The same
> was true with VAX CPUs. They did not fault, but they paid a
> performance penalty.
>
> There is a difference in context from the days of the System/370 and
> the VAX: multi-level large caches.
>
> First, the caches close to the processing core are very fast, which
> obscures the loss of performance due to non-aligned references.
> Second, all loads/stores to/from a cache are, almost by definition,
> aligned.
>
> A program designed to produce alignment faults is also very likely
> not to stress the memory system in a way that would expose the
> misaligned-data penalty. Faults, which are synchronous interrupts,
> have overhead orders of magnitude greater than a double memory fetch,
> particularly when sequential elements are referenced (sequential
> elements may well be in the same cache line, even if not aligned on
> the proper boundary).
>
> If I had the spare time to play with it, I would write a program to
> randomly address a storage area beyond total cache size, so that
> every memory reference is a cache miss. Run aligned and unaligned
> data references and compare the result.
>
> It is easy for a benchmark to measure the incorrect phenomenon.
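The cache-defeating benchmark described above might be sketched in C roughly as follows. This is a hypothetical sketch, not code from the thread: the function names, the xorshift64 generator, and the use of ISO C clock() (CPU time rather than wall time) are all assumptions. The caller is assumed to supply a buffer much larger than the last-level cache of the machine under test. Unaligned doubles are read with memcpy, which keeps the C well-defined; compilers commonly turn this into a single unaligned load on x86-64.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Read an 8-byte double from an arbitrary (possibly unaligned) byte
   address. memcpy avoids undefined behaviour from a misaligned cast. */
static double load_double(const unsigned char *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}

/* xorshift64 PRNG (an assumed choice) so the access pattern is
   reproducible from run to run. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Sum `reads` doubles taken from random 8-byte slots of `buf`, each
   shifted by `offset` bytes (offset 0 = naturally aligned if buf is).
   `slots` is the number of 8-byte slots; buf must hold at least
   slots * 8 + offset bytes. Hypothetical helper. */
double random_reads(const unsigned char *buf, size_t slots,
                    size_t offset, size_t reads, uint64_t seed)
{
    double sum = 0.0;
    uint64_t state = seed ? seed : 1;   /* xorshift state must be nonzero */
    for (size_t i = 0; i < reads; i++) {
        size_t slot = (size_t)(xorshift64(&state) % slots);
        sum += load_double(buf + slot * 8 + offset);
    }
    return sum;
}

/* Time one run in CPU seconds using ISO C clock(). A fixed seed means
   every offset visits the same slot sequence. The sum is returned
   through sum_out so the compiler cannot discard the loop. */
double time_random_reads(const unsigned char *buf, size_t slots,
                         size_t offset, size_t reads, double *sum_out)
{
    clock_t c0 = clock();
    *sum_out = random_reads(buf, slots, offset, reads, 12345);
    return (double)(clock() - c0) / CLOCKS_PER_SEC;
}
```

A driver would allocate slots * 8 + 16 bytes, fill it, and call time_random_reads() for offsets 0 through 16, comparing the times in the same way as the offset tables in the reply.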
There are lies, damn lies and benchmarks.
:-)
I tested on a 2 MB array.
And I admit that the results can be due to many things.
But the numbers sure show a big difference!
OFFSET   Fortran/VMS/Itanium   GFortran/Windows/x86-64
                               (100x more reps)
     0                590 ms                   7473 ms
     1             197510 ms                   7285 ms
     2             197510 ms                   7301 ms
     3             197520 ms                   7301 ms
     4             197510 ms                   7269 ms
     5             197510 ms                   7208 ms
     6             197510 ms                   7191 ms
     7             197510 ms                   7192 ms
     8                590 ms                   7519 ms
     9             197510 ms                   7285 ms
    10             197520 ms                   7270 ms
    11             197520 ms                   7285 ms
    12             197520 ms                   7270 ms
    13             197520 ms                   7207 ms
    14             197520 ms                   7176 ms
    15             197520 ms                   7176 ms
    16                580 ms                   7473 ms
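The point that sequential elements may share a cache line even when misaligned can be made concrete with a little arithmetic. The hypothetical helper below assumes 64-byte cache lines (typical of current x86-64 parts, but an assumption here) and an array that starts on a line boundary, and counts how many 8-byte elements of a sequential walk straddle two lines. For offsets that are multiples of 8 the count is zero; for any other offset exactly one element in eight straddles. This is offered as one plausible reason a sequential REAL*8 test shows little penalty on x86-64, not as an explanation established in the thread.

```c
#include <stddef.h>

/* Count how many of `n` consecutive elements of `elem_size` bytes
   straddle a cache-line boundary, when the first element starts
   `offset` bytes past the start of a line of `line_size` bytes.
   Assumes elem_size <= line_size. Hypothetical helper. */
size_t count_line_splits(size_t offset, size_t elem_size,
                         size_t n, size_t line_size)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++) {
        size_t first = offset + i * elem_size;   /* first byte touched */
        size_t last  = first + elem_size - 1;    /* last byte touched  */
        if (first / line_size != last / line_size)
            splits++;                            /* access spans two lines */
    }
    return splits;
}
```

For example, count_line_splits(3, 8, 8000, 64) gives 1000 (one access in eight), while count_line_splits(8, 8, 8000, 64) gives 0.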
Arne