[Info-vax] Emulation Performance
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Thu Jan 17 16:58:17 EST 2013
On 2013-01-17 21:11:36 +0000, David Froble said:
> First understand, I know nothing ....
>
> Consider (as far as I know) what an emulator does. We throw around the
> phrase "emulate an Alpha" or whatever. But is it really emulating an
> Alpha? Now I'm guessing, and would enjoy reading any corrections, that
> the hardware is NOT emulated. What's happening is that the
> instructions are emulated to give the same result, from the
> instructions, that the hardware would give. But what about Out of
> Order, pipelines, and some of the rather esoteric stuff done in CPUs.
> It's my guess none of this is present, so the performance from such
> features is lost even before getting into the overhead of emulating the
> instructions.
The accuracy of the emulation is up to the emulator's authors; they implement as much of it as they wish to, or need to.
For OpenVMS, so long as the emulator conforms sufficiently closely to
the Alpha system reference manual (SRM), and the memory, I/O, and
related hardware all look more or less as expected for the target Alpha
system(s) being emulated, OpenVMS will run on it. Getting there
is no small effort, though.
As for performance, yes, emulators are slow.
Anybody writing an emulator will be optimizing the snot out of the
instruction decoder, but you're still looking at a bunch of host
instructions that will be executed for each Alpha (or VAX or...)
instruction that gets decoded, plus however much code is needed to
"execute" the instruction.[1]
Emulating a superscalar[2] out-of-order design more fully (or more
correctly, or whatever you want to call it) would require multiple
cores and some sharing among them, or maybe something akin to
Itanium's instruction bundles and predication. I'd expect emulating
out-of-order and superscalar execution to be less of a benefit than
an effective JIT
<http://www.research.ibm.com/trl/projects/jit/index_e.htm>, and a whole
lot of effort to implement. With a JIT, the win is getting the code
processed in fewer host instructions than an instruction decoder would
require, but detecting the hot spots and warming up the JIT adds
overhead. TANSTAAFL.
An instruction decoder is somewhat analogous to a BASIC interpreter,
while a JIT is somewhat closer to a BASIC compiler.
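To illustrate that trade-off, here is a sketch of how a JIT'd emulator might detect hot spots: interpret a block, count its executions, and translate it to host code once it crosses a threshold. The block cache, the helper functions, and the threshold value are hypothetical stand-ins, not any real emulator's API.

#include <stdint.h>
#include <stddef.h>

#define HOT_THRESHOLD 50        /* arbitrary warm-up count, purely illustrative */

/* Hypothetical translated-block cache entry. */
typedef struct {
    uint64_t guest_pc;          /* start of the guest basic block        */
    uint32_t exec_count;        /* how often we've interpreted it        */
    void   (*native)(void);     /* host code emitted by the JIT, or NULL */
} block_entry;

/* Assumed helpers -- stand-ins for illustration only. */
extern block_entry *lookup_block(uint64_t guest_pc);
extern void interpret_block(uint64_t guest_pc);
extern void (*jit_compile(uint64_t guest_pc))(void);

/* Interpret until a block runs hot, then pay the one-time compile cost
 * and run native code thereafter.  The counter bookkeeping and the
 * compile itself are the overhead: TANSTAAFL. */
void execute_at(uint64_t guest_pc)
{
    block_entry *b = lookup_block(guest_pc);

    if (b->native) {            /* already translated: run host code */
        b->native();
        return;
    }
    if (++b->exec_count >= HOT_THRESHOLD)
        b->native = jit_compile(guest_pc);   /* hot spot: translate it */

    interpret_block(guest_pc);  /* cold (or just-compiled) path */
}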
Once you get to native instructions and native JIT'd code, you can
use the processor-level optimizations available in the host system
hardware.
Related discussions include the JVM, and approaches such as Apple's Rosetta[3].
————
[1] Such as the VAX CRC instruction
<http://h71000.www7.hp.com/doc/73final/4515/4515pro_026.html#16_cyclicredundancycheckinstruc>,
if you're emulating a VAX.
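As a rough illustration of why a single CRC instruction costs so much host code, here is a generic nibble-at-a-time, table-driven CRC loop in the spirit of the VAX instruction's 16-entry table. It is a sketch, not the actual VAX instruction semantics or any emulator's implementation.

#include <stdint.h>
#include <stddef.h>

/* One guest CRC instruction expands into an entire host loop:
 * XOR in each byte, then fold it through the table a nibble at a time. */
uint32_t emulate_crc(const uint32_t table[16], uint32_t initial,
                     const uint8_t *buf, size_t len)
{
    uint32_t crc = initial;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        crc = (crc >> 4) ^ table[crc & 0x0f];   /* low nibble  */
        crc = (crc >> 4) ^ table[crc & 0x0f];   /* high nibble */
    }
    return crc;
}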
[2] <http://en.wikipedia.org/wiki/Superscalar>, which includes a
picture of a Cray Alpha board.
[3] Rosetta was created by Transitive
<http://thenextweb.com/insider/2011/10/22/how-one-of-apples-most-important-pieces-of-software-came-from-a-small-uk-startup/>;
IBM acquired Transitive back in 2008.
--
Pure Personal Opinion | HoffmanLabs LLC