[Info-vax] Itanium support is back in GCC 15
John Reagan
johnrreagan at earthlink.net
Mon Feb 24 17:02:08 EST 2025
On 2/24/2025 4:43 PM, Arne Vajhøj wrote:
> On 2/24/2025 4:22 PM, Michael S wrote:
>> On Mon, 24 Feb 2025 15:08:57 -0500
>> Arne Vajhøj <arne at vajhoej.dk> wrote:
>>> On 2/24/2025 12:42 PM, Michael S wrote:
>
>>> C++ on VMS x86-64 is clang, which, given the (older) clang version
>>> used, should mean C++14, while C++ on VMS Itanium is very, very old
>>> (like C++98 old).
>>>
>>>> According to the benchmarks that you posted here several months (a
>>>> year?) ago, VMS x86-64 compilers are quite awful compared to
>>>> x86-64 compilers available on Windows/Linux/BSD.
>>>> Do you want to say that VMS Itanium compilers are worse?
>>>
>>> I believe the conclusion was that the VMS x86-64 compilers, except
>>> C++, were slower than C/C++ on other OSes and C++ on VMS.
>>
>> Somehow I got the impression that the C++ compilers were also
>> significantly slower than C++ compilers on other platforms.
>> Do I misremember?
>
> I don't even remember that I posted non-VMS numbers here. Age! :-)
>
> But I just checked VMS C++ latest (CXX/OPT=LEVEL:5 and clang -O3) vs a
> random Windows GCC 14.1 (g++ -O3):
>
> VMS is a little faster for integer
> they are about the same for floating point
> Windows is a lot faster for string
>
> And given that this is a micro-benchmark that in reality is just an
> inner loop evaluating a single expression, which means huge
> uncertainty, I don't see this as proof of a significant difference.
>
> Arne
>
We are aware of the string/char performance issues.
On Alpha and Itanium, the low-level routines inside LIBOTS for things
like OTS$MOVE, string compare, memmove, etc. are all written in
hand-crafted assembly. For x86, we are still using a set of simple
BLISS reference code. On top of that, the LIBOTS we all have on our
systems was compiled with a non-optimizing BLISS cross-compiler.
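
As a rough illustration (in C, with a made-up routine name; the actual
reference code is BLISS), the kind of simple byte-at-a-time move such
reference code boils down to looks roughly like this:

    #include <stddef.h>

    /* Hypothetical sketch of a simple reference move: correct, but it
     * touches one byte per iteration and ignores the overlap handling
     * the real OTS$ routines also have to deal with. */
    void simple_move(char *dst, const char *src, size_t len)
    {
        while (len--)
            *dst++ = *src++;
    }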
We are currently playing with natively compiled LIBOTS code and doing
some benchmarks. Besides the brain-dead BLISS code, we have versions
that loop over larger chunks of data, which are even faster. The
fastest we've seen so far is a native assembly version that uses the
REP instruction prefix on MOVSB. That version didn't check for
overlapping source/dest, however, so any real version will be a little
slower. I'm not sure when we can incorporate these, but I'm trying to
push them out as soon as possible.
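
As a rough sketch of the REP MOVSB approach (GCC/clang-style inline
assembly and made-up names, not the actual LIBOTS code), including the
overlap check that makes a real version slightly slower:

    #include <stddef.h>
    #include <string.h>

    /* Forward copy of len bytes with REP MOVSB: moves RCX bytes from
     * [RSI] to [RDI], ascending addresses. */
    static void rep_movsb_copy(void *dst, const void *src, size_t len)
    {
        __asm__ volatile ("rep movsb"
                          : "+D" (dst), "+S" (src), "+c" (len)
                          :
                          : "memory");
    }

    /* A forward copy is only safe when the destination doesn't start
     * inside the source region; otherwise fall back to something that
     * copies backward (plain memmove here, just for the sketch). */
    void checked_move(void *dst, const void *src, size_t len)
    {
        char *d = (char *)dst;
        const char *s = (const char *)src;

        if (d <= s || d >= s + len)
            rep_movsb_copy(dst, src, len);
        else
            memmove(dst, src, len);
    }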
A fun reference to read is
https://cdrdv2-public.intel.com/814198/248966-Optimization-Reference-Manual-V1-050.pdf