[Info-vax] Intel previews new Itanium "Poulson" processor
John Reagan
johnrreagan at earthlink.net
Fri Feb 25 08:41:48 EST 2011
"JF Mezei" <jfmezei.spamnot at vaxination.ca> wrote in message
news:4d66ee58$0$32654$c3e8da3$40d4fd75 at news.astraweb.com...
> John Reagan wrote:
>
>> Poulson, it will now grab 12 slots (4 bundles) every cycle. And of
>> course, branches, calls, returns, etc. make the chip slow down and even
>> spit out a few things.
>
>
> OK, so this is a case of a sequential stream of binary instructions of
> ANY number of instructions between each "stop bit" ?
>
> So I could have a theoretical stream of 25 instructions before a stop
> bit which could be executed in any order.
>
> When running on Tukwila, it will take the first 6 and execute them, and
> as each "slot" is done, it takes the next instruction available in that
> stream ?
>
> On Poulson, it would take the first 12 instructions and proceed to
> process the remaining 13 whenever a slot is freed ?
>
> Is that a correct understanding ?
>
>
> (do IA64 instructions vary in length of execution or do they all execute
> in the same number of cycles ?)
Yes. An instruction group (a sequence of slots between stop bits) can be
of any length: 2, 25, etc. The compiler is telling the architecture that
it can execute all of them in parallel if it can. Now, as I mentioned, a
compiler might not go out of its way to find that 25-instruction sequence.
Long instruction groups need lots of registers so there won't be a conflict.
That is traded off against the overhead of the 'alloc' at the beginning of a
routine, which allocates registers from the register stack. Large sets of
registers probably turn into writing/reading them to the register backing
store. All of the real routines I've ever seen with really long groups were
hand-written assembler, not from a compiler. The longest I've seen from a
compiler was around 12 or so.
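
To put rough numbers on the 25-instruction example, here is a minimal
sketch in Python. It is a toy model, not a description of the real
pipelines: it assumes a stop bit always ends the current issue cycle and
that the chip can issue up to a fixed number of slots per cycle (6 as in the
question about Tukwila, 12 for Poulson). It ignores bundle templates,
functional-unit availability, latencies, and branches. The names
split_groups and ideal_issue_cycles are made up for this illustration.

```python
from math import ceil


def split_groups(slots):
    """Split a slot stream into instruction groups.

    `slots` is a list of (name, stop) pairs; stop=True means a stop bit
    follows that slot, ending the current instruction group.
    """
    groups, current = [], []
    for name, stop in slots:
        current.append(name)
        if stop:
            groups.append(current)
            current = []
    if current:  # trailing group with no final stop
        groups.append(current)
    return groups


def ideal_issue_cycles(groups, width):
    """Best-case issue cycles in the toy model: each group needs
    ceil(len / width) cycles, and a new group starts in a new cycle."""
    return sum(ceil(len(g) / width) for g in groups)


# The hypothetical stream from the question: 25 instructions, then a stop.
stream = [(f"i{n}", n == 24) for n in range(25)]
groups = split_groups(stream)

tukwila_like = ideal_issue_cycles(groups, 6)   # 5 cycles in this model
poulson_like = ideal_issue_cycles(groups, 12)  # 3 cycles in this model
```

Real hardware will usually do worse than these best-case counts, since the
model leaves out everything that makes the chip "slow down and spit out a
few things".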
The length and format of the instructions are all in the downloadable
architecture manuals. Bundles, slots, groups, stop bits, it is all there.
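
As a quick illustration of the bundle layout those manuals describe, here is
a minimal sketch in Python. It assumes the documented format: a 128-bit
bundle with a 5-bit template in the low bits followed by three 41-bit
instruction slots. The function names are made up for this example, and the
sketch does not interpret the template (which is where the slot types and
stop positions are encoded); check the manual for that table.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def encode_bundle(template, slots):
    """Pack a template and three slot values into one 128-bit integer."""
    bundle = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def decode_bundle(bundle):
    """Return (template, [slot0, slot1, slot2]) from a 128-bit integer."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots


# Round trip with arbitrary placeholder values (not real instructions).
example = encode_bundle(0x10, [1, 2, 3])
template, slots = decode_bundle(example)
```

A bundle read from memory as 16 bytes can be turned into the integer form
with int.from_bytes(raw, "little").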
As for the number of cycles to execute once the chip starts chewing: no,
they do not all take the same time. A register-to-register move or shift
will be faster than some fixed-point multiply or such. The really painful
ones are the moves between integer registers and floating registers. The
integer units and floating units are independent, with different pipeline
lengths. Making them sync up and move data between the two slows things
down. Now, that is true for Alpha as well.
John