[Info-vax] Dvorak on Itanic
johnwallace4 at yahoo.co.uk
Fri Jan 30 13:13:03 EST 2009
On Jan 30, 4:10 pm, Christopher <nadiasver... at gmail.com> wrote:
> > [3]http://web.archive.org/web/20010602154126/www.alphapowered.com/presen...
>
> Many of the arguments in this article are hooey.
>
> "Commercial programs have very low instruction-level parallelism, but
> they are typically explicitly
> multithreaded. Each thread is very sequential and includes long delays
> waiting for
> memory. The IA64 strategy of searching for instruction-level
> parallelism cannot find the
> orders of magnitude improvements available to Alpha through
> simultaneous
> multithreading."
>
> IA-64 processors are completely capable of explicit multi-threading.
> In addition, "automatic" threading is extremely hit and miss. The
> processor should not be involved in decisions about what to
> multithread because it HAS NO CONTEXT.
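As an aside, "explicitly multithreaded" just means the programmer
spells the parallelism out, rather than the hardware hunting for it.
A minimal pthreads sketch (names are mine, and there is nothing
IA64-specific in it):

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    static int data[N];
    static long sums[2];

    /* Each thread owns half the data; the parallelism is stated
       by the programmer, not discovered by the CPU. Within a
       thread the work is sequential and memory-bound. */
    static void *worker(void *arg)
    {
        long id = (long)arg, i, sum = 0;
        for (i = id * (N / 2); i < (id + 1) * (N / 2); i++)
            sum += data[i];
        sums[id] = sum;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];
        long i;
        for (i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        printf("total %ld\n", sums[0] + sums[1]);
        return 0;
    }

Each thread looks exactly like the "very sequential" code the Alpha
presentation describes; SMT's bet is that the CPU can overlap one
thread's memory stalls with another thread's useful work.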
>
> They also mention out-of-order execution and dynamic instruction
> parallelism. Both of these techniques have their own costs. During
> speculative execution, requests are made that *hopefully* will
> prove out. When they don't, they are extremely expensive.
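A crude way to see "extremely expensive" for yourself: run the same
branchy loop over random bytes and over sorted bytes. A C sketch;
timings are obviously machine-dependent, and a clever compiler may
if-convert the branch away, so check the generated code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        enum { N = 1 << 20 };
        static unsigned char v[N];
        long i, sum = 0;
        clock_t t0;

        srand(1);
        for (i = 0; i < N; i++)
            v[i] = rand() & 0xff;   /* random: the branch below is ~50/50 */

        t0 = clock();
        for (i = 0; i < N; i++)
            if (v[i] >= 128)        /* speculated; every miss throws
                                       away in-flight work */
                sum += v[i];
        printf("sum %ld, %ld ticks\n", sum, (long)(clock() - t0));
        return 0;
    }

Sort v[] first and the identical loop typically runs several times
faster, because the branch becomes predictable and the speculation
keeps proving out.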
>
> Consider, too, that all that dynamic speculation hardware takes
> both power and space. For architectures which have no, or little,
> instruction-set support for static speculation hinting, it has
> some value. However, if you can dedicate all of your silicon to
> actually DOING THE WORK, that is much better.
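For reference, "static speculation hinting" in its most pedestrian
form is something like GCC's __builtin_expect, where the programmer
states at compile time which way a branch usually goes; IA64 goes
much further, putting speculative and advanced loads into the
instruction set itself. A small GCC-specific sketch (the macro names
are the usual Linux-kernel idiom):

    #include <stdio.h>

    /* Compile-time branch hints: the compiler lays the likely
       path out as the fall-through and moves the error path out
       of line. The CPU's predictor still has the runtime say. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int process(int status)
    {
        if (unlikely(status < 0)) {
            fprintf(stderr, "error %d\n", status);
            return -1;
        }
        return status * 2;          /* common, fall-through path */
    }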
" if you can dedicate all of your silicon to actually DOING THE WORK,
that is much better."
Undoubtedly a true statement, given the whopping big IF at the front.
EPIC is a winner if your compile-time knowledge of what the CPU
environment will be *when those instructions will be executed* is
better for performance than using the CPU's own runtime knowledge
together with a bit of extra (you said "wasted") silicon which can
often be used but may occasionally sit idle.
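A concrete example of what that compile-time knowledge looks like:
whether two pointers alias. Unless the programmer says otherwise,
the compiler must assume every store can feed every later load, and
the wide static schedule EPIC wants simply isn't legal. C99's
restrict is one way of handing it that knowledge (function names are
mine):

    /* Without restrict, dst and src might overlap, so the
       compiler must keep loads and stores in order: little
       static parallelism to be found. */
    void scale(float *dst, const float *src, float k, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = k * src[i];
    }

    /* With restrict the programmer promises no aliasing, and the
       compiler is free to unroll, software-pipeline and hoist
       loads early; exactly the static knowledge EPIC bets on. */
    void scale_r(float *restrict dst, const float *restrict src,
                 float k, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = k * src[i];
    }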
This is of course the assertion on which Itanic is predicated, and is
of course (in the market's opinion, based on sales of IA64 vs x86-64)
completely unrealistic in 99.283% of known and unknown computer
environments.
UK natives will know the M25 (the outer London orbital road). Other
countries will have their own equivalent. If you had a map of the M25
and a really fast car (say 200mph) you could plot yourself an
idealised course round the M25, when to brake and when to accelerate
for cornering and hills, which lanes to use for which junctions, and
stuff like that. And you'd probably average not far off 200mph,
according to the theory.
Now, in reality, the M25 is largely a car park, plus there's a 70mph
speed limit, and in some places the speed limits are lower and not
known till you get there. The predicted best route (the compile-time
route) is useless because you have to respond to conditions at the
time you hit them. You could take the 70mph limit into consideration
at compile time if you could be bothered, but what you *cannot* know
about are the variable speed limits which change with the traffic on
the M25, and the impact of events on the M25 which are expected but
not exactly predictable (you know there will be queues, you don't
know exactly where and when, and advance warnings are likely to be
unreliable).
The design of a superfast car which does a sustained 200mph and
rarely manoeuvres, rarely accelerates, and rarely decelerates, is
rather different from that of a car built for the real world. The
real car does indeed have wasted resources (more acceleration needs
a bigger engine, more braking needs heavier brakes), but real-world
cars are a more sensible compromise for real roads than the 200mph
streamlined special.
Similarly, at compile time you cannot completely know about run-time
things like cache misses (though maybe you can hint), alignment
problems and other exceptions [how often do you hear that "exceptions
are very expensive on Itanium vs Alpha"], and other stuff which
impacts real application performance and cannot possibly be
generally predicted at compile time.
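On "maybe you can hint": GCC's __builtin_prefetch is roughly the
state of that art, and it illustrates the problem nicely, because
the prefetch distance is a compile-time guess at a run-time latency.
A sketch (the function name and the distance of 16 are illustrative):

    /* Ask for a cache line ahead of use. Guess the distance
       wrong, because the actual memory latency and access
       pattern are only known at run time, and the hint does
       nothing, or evicts data that was still needed. */
    long sum_array(const long *a, int n)
    {
        long sum = 0;
        int i;
        for (i = 0; i < n; i++) {
            __builtin_prefetch(&a[i + 16], 0, 1);
            sum += a[i];
        }
        return sum;
    }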
Today's 200mph streamlined special Itanium is a big fat cache (Intel
have a good track record with Moore's law, wrt transistor density)
with a somewhat strange processor attached. Imagine if you had that
much cache on an AMD64 with a few HyperTransport links dedicated to
the main memory interface... well, actually you don't really have to
imagine: it's already clear that, with the limited exception of
floating point, the market in general doesn't see any commercially
relevant performance gap [1] between today's top-end x86 and today's
Itanium. Will tomorrow's Itanium and tomorrow's x86 change that?
And aren't GPUs [2] (which Intel have historically been clueless at)
the trendy new dedicated performance engine for a variety of
parallelisable FP-intensive apps - at least for those where a
relatively small number of 32-bit FPUs is good enough and where
256MB (today) of per-GPU-board "cache" is enough? You get lots of
GPU floating-point processors for the price of a worthwhile number
of Itanium FPUs; GPUs pretty much come on the stationery budget.
Software might be a challenge today, but that will change.
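The sort of code that suits a GPU is the embarrassingly parallel
kind, where every iteration is independent 32-bit FP work. SAXPY is
the canonical example; plain C shown here for shape, and on a GPU
each iteration of this loop effectively becomes its own thread:

    /* y = a*x + y. No iteration depends on any other, so a GPU
       can run thousands of them concurrently; this loop body is
       what a per-thread GPU kernel would contain. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }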
[1] Nor do most folks see a commercially significant gap in
reliability, availability, scalability features at chip level, so
unless someone has something new let's not start that one again.
[2] GPU seems to be the trendy new name for the electronics on what we
used to call a "3D graphics card".