[Info-vax] x86-64 data alignment / faulting
Arne Vajhøj
arne at vajhoej.dk
Sun Feb 27 20:56:15 EST 2022
On 2/26/2022 11:50 PM, Bob Gezelter wrote:
> On Saturday, February 26, 2022 at 4:36:58 PM UTC-5, Arne Vajhøj wrote:
>> On 2/25/2022 11:37 PM, Bob Gezelter wrote:
>>> It is easy for a benchmark to measure the incorrect phenomenon.
>> There are lies, damn lies and benchmarks.
>>
>> :-)
>>
>> I tested on a 2 MB array.
>>
>> And I admit that the results can be due to many things.
>>
>> But the numbers sure show a big difference!
>>
>> Fortran/VMS/Itanium:
>>
>> OFFSET 0 : 590 ms
>> OFFSET 1 : 197510 ms
>> OFFSET 2 : 197510 ms
>> OFFSET 3 : 197520 ms
>> OFFSET 4 : 197510 ms
>> OFFSET 5 : 197510 ms
>> OFFSET 6 : 197510 ms
>> OFFSET 7 : 197510 ms
>> OFFSET 8 : 590 ms
>> OFFSET 9 : 197510 ms
>> OFFSET 10 : 197520 ms
>> OFFSET 11 : 197520 ms
>> OFFSET 12 : 197520 ms
>> OFFSET 13 : 197520 ms
>> OFFSET 14 : 197520 ms
>> OFFSET 15 : 197520 ms
>> OFFSET 16 : 580 ms
>>
>> GFortran/Windows/x86-64 (100x more reps):
>>
>> OFFSET 0 : 7473 ms
>> OFFSET 1 : 7285 ms
>> OFFSET 2 : 7301 ms
>> OFFSET 3 : 7301 ms
>> OFFSET 4 : 7269 ms
>> OFFSET 5 : 7208 ms
>> OFFSET 6 : 7191 ms
>> OFFSET 7 : 7192 ms
>> OFFSET 8 : 7519 ms
>> OFFSET 9 : 7285 ms
>> OFFSET 10 : 7270 ms
>> OFFSET 11 : 7285 ms
>> OFFSET 12 : 7270 ms
>> OFFSET 13 : 7207 ms
>> OFFSET 14 : 7176 ms
>> OFFSET 15 : 7176 ms
>> OFFSET 16 : 7473 ms
> One needs to analyze beyond raw performance. In this case, I start
> with the cache organization and related structure. If you "break" the
> cache by forcing every reference to be a cache miss, you will
> essentially see the 2x performance loss.
>
> If the cache is able to gain anything, it will skew the numbers.
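As a rough illustration of the cache-organization point, the sketch below counts how many 8-byte elements of a packed array straddle two cache lines for a given starting offset. The 64-byte line size and the name count_line_splits are assumptions made for this illustration; they do not come from the thread.

```c
#include <stddef.h>

/* Assumed cache line size in bytes. 64 is common on x86-64,
   but it is an assumption here, not a figure from the thread. */
#define LINE_SIZE 64u

/* Count how many of n consecutive 8-byte elements span two cache
   lines when the first element starts `offset` bytes past a line
   boundary. (Hypothetical helper, for illustration only.) */
static size_t count_line_splits(size_t offset, size_t n)
{
    size_t splits = 0;
    for (size_t i = 0; i < n; i++) {
        size_t first = offset + i * 8u;  /* first byte of element i */
        size_t last = first + 7u;        /* last byte of element i  */
        if (first / LINE_SIZE != last / LINE_SIZE)
            splits++;
    }
    return splits;
}
```

Under that assumption, an offset that is a multiple of 8 gives no split elements, while any other offset gives one split element in every eight, so most unaligned elements still sit inside a single line.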
I modified the program to test with different data sizes, verified
that the code was indeed working on unaligned addresses, and
tried both sequential and random access to the array.
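The Fortran test program itself is not included in this post. The following is only a minimal C sketch of this kind of test for the sequential case, under stated assumptions: 8-byte doubles packed back to back in a byte buffer, read via memcpy so the unaligned access is well-defined, and timed with clock(). All function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Sum n doubles stored back to back starting at base + offset.
   memcpy keeps the unaligned read well-defined in C; compilers
   typically turn it into a plain 8-byte load on x86-64. */
static double sum_at_offset(const unsigned char *base, size_t offset, size_t n)
{
    const unsigned char *p = base + offset;
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v;
        memcpy(&v, p + i * sizeof v, sizeof v);
        total += v;
    }
    return total;
}

/* Store the value 1.0 into n doubles starting at base + offset. */
static void fill_at_offset(unsigned char *base, size_t offset, size_t n)
{
    const double one = 1.0;
    for (size_t i = 0; i < n; i++)
        memcpy(base + offset + i * sizeof one, &one, sizeof one);
}

/* Address modulo 8, comparable to an "ADDRESS MOD 8" check. */
static unsigned address_mod8(const void *p)
{
    return (unsigned)((uintptr_t)p % 8u);
}

/* Run reps sequential passes over n doubles at the given offset and
   return the CPU time in milliseconds. The accumulated sum is written
   to *sink so the compiler cannot discard the loop. */
static double time_offset_ms(unsigned char *base, size_t offset, size_t n,
                             int reps, double *sink)
{
    assert(reps > 0);
    fill_at_offset(base, offset, n);
    clock_t t0 = clock();
    double total = 0.0;
    for (int r = 0; r < reps; r++)
        total += sum_at_offset(base, offset, n);
    clock_t t1 = clock();
    *sink = total;
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

A driver would allocate a buffer of at least 8 * n + 16 bytes, loop the offset from 0 to 16, and print address_mod8(buffer + offset) next to each timing. A random-access variant would read through a precomputed index array instead of walking i in order.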
I simply can't get a big difference between aligned and unaligned access.
Data at the bottom.
I am not saying that there are no cases where unaligned data access
makes a significant difference.
But I have not been able to come up with a case.
Arne
DATA SIZE = 2500 ( 20000 BYTES), REP = 100000, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 718 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 717 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 718 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 717 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 717 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 717 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 733 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 25000 ( 200000 BYTES), REP = 10000, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 702 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 702 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 702 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 702 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 686 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 718 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 702 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 702 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 702 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 250000 ( 2000000 BYTES), REP = 1000, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 717 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 702 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 717 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 718 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 702 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 717 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 717 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 2500000 (20000000 BYTES), REP = 100, SEQUENTIAL ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 702 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 717 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 718 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 717 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 718 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 718 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 717 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 718 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 717 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 718 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 702 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 718 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 717 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 718 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 702 ms
DATA SIZE = 4096 ( 32768 BYTES), REP = 16666, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 733 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 733 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 734 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 733 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 733 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 733 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 734 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 717 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 733 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 734 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 748 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 734 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 733 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 733 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 733 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms
DATA SIZE = 32768 ( 262144 BYTES), REP = 1666, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 561 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 593 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 609 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 577 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 593 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 577 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 593 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 593 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 577 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 577 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 608 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 593 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 593 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 577 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 593 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 577 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 577 ms
DATA SIZE = 262144 ( 2097152 BYTES), REP = 166, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 609 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 624 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 639 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 624 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 640 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 624 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 640 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 624 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 608 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 624 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 624 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 639 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 624 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 624 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 640 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 624 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 608 ms
DATA SIZE = 2097152 (16777216 BYTES), REP = 16, RANDOM ACCESS
OFFSET 0 (ADDRESS MOD 8 = 0): 733 ms
OFFSET 1 (ADDRESS MOD 8 = 1): 749 ms
OFFSET 2 (ADDRESS MOD 8 = 2): 749 ms
OFFSET 3 (ADDRESS MOD 8 = 3): 749 ms
OFFSET 4 (ADDRESS MOD 8 = 4): 749 ms
OFFSET 5 (ADDRESS MOD 8 = 5): 748 ms
OFFSET 6 (ADDRESS MOD 8 = 6): 749 ms
OFFSET 7 (ADDRESS MOD 8 = 7): 749 ms
OFFSET 8 (ADDRESS MOD 8 = 0): 733 ms
OFFSET 9 (ADDRESS MOD 8 = 1): 749 ms
OFFSET 10 (ADDRESS MOD 8 = 2): 748 ms
OFFSET 11 (ADDRESS MOD 8 = 3): 749 ms
OFFSET 12 (ADDRESS MOD 8 = 4): 749 ms
OFFSET 13 (ADDRESS MOD 8 = 5): 749 ms
OFFSET 14 (ADDRESS MOD 8 = 6): 749 ms
OFFSET 15 (ADDRESS MOD 8 = 7): 764 ms
OFFSET 16 (ADDRESS MOD 8 = 0): 718 ms