[Info-vax] Itanium support is back in GCC 15

Mon Feb 24 14:00:44 EST 2025

On 2/24/2025 12:22 PM, Stephen Hoffman wrote:
> On 2025-02-23 21:29:00 +0000, John Dallman said:
> 
>> In article <vpfoc7$2o5$1 at panix2.panix.com>, kludge at panix.com (Scott 
>> Dorsey) wrote:
>>
>>> The whole idea of the VLIW system is that the compiler will be able 
>>> to optimize the code to gain parallelism of units inside the single 
>>> processor. This is a very very ingenious idea but nobody has yet been 
>>> able to make a compiler that could do it well enough for it to be a 
>>> real win.
>>
>> Sadly, the job is *impossible*.
>>
>> The fundamental problem in optimisation for modern computers is the 
>> slowness of main RAM, which isn't currently solvable at a reasonable 
>> cost. We use caches to mitigate it.
>>
>> Out-of-order execution addresses this problem by tracking the data 
>> dependencies on memory and registers in real time and executing 
>> instructions when their data is available....
> 
> The Itanium compiler optimizer just doesn't (and can't) know enough 
> about the system memory state, yes. Among other (no pun intended) issues.
> 
> The attempt to address that included providing run-time feedback into 
> the executables; providing post-link, post-execution tuning. (Caliper / 
> Atom / OM / etc.)
> 
> https://www.cs.tufts.edu/comp/150PAT/tools/caliper/wiess-rev-4.pdf
> 
> This Alpha versus IA-64 Itanium paper from 1999 describes the issues 
> with Itanium quite well too, for those interested:
> 
> https://web.archive.org/web/20010611202933/http://www.compaq.com/hpc/ref/ref_alpha_ia64.doc
> 
> 

That old Alpha/IA64 comparison was clearly written with an agenda. 
There is no explicit attribution in the document, but all the "we did" 
and "we designed" statements point to authorship in the Alpha hardware 
group.

Some of their assumptions, such as the claim that out-of-order execution 
would be impossible on IA64, are wrong: the last Itaniums actually 
implemented OOO, and existing images saw an immediate benefit.

They were comparing the Itanium of the day to what they thought Alpha 
could someday do.  The Itanium of the day was pretty bad compared to the 
Alpha of the day (or of the next 2 years).  And it is more than just the 
architecture.  It is the chip, the process, the interface chips, etc.

And yes, it was a challenge for compilers.  The GEM implementation is a 
good V1 but is lacking.  GEM wasn't designed around such a hardware 
model.  I'm sure that with additional time/money/people, subsequent 
versions would be better.  Of all the backends I've seen, the HPUX one 
is the best.  During the Itanium port, I had some of the COBOL RTL 
routines for datatype conversion.  We had C code and the performance was 
horrible out of GEM.  We were considering our own assembly versions, but 
I was directed to some of the HPUX compiler folks.  I gave them the C 
code and in a few weeks, I had Itanium assembly code that I could not 
recognize.  It used all sorts of Itanium features.  It was several times 
faster (I'm thinking 10x but I don't remember).  That code is in the 
COBOL RTL today.  That was on those early Itaniums without OOO.  How 
good would the GEM code be on "modern" Itanium?  Don't know.  Never 
tried.  Doesn't matter.

As you say, cache is king.  Intel doesn't price their chips based on 
clock speed.  They price them based on cache size.

I'll agree that Alpha was the better floating-point system.  The weird 
bundling rules in the Itanium architecture make it difficult for a 
floating-point application.

Not to relitigate the argument (though that is what c.o.v does best), 
but it was clear to many that upper Digital management didn't want to 
hear technical arguments about the decision.  Turning around to ask your 
choir doesn't give you any information about a transformational change 
in the underlying technology.