[Info-vax] VSI OpenVMS Alpha V8.4-2L1 Layered Product Subset Available
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Mon Jul 10 16:02:34 EDT 2017
On 2017-07-10 19:22:38 +0000, John Reagan said:
> On Monday, July 10, 2017 at 2:09:56 PM UTC-4, clair... at vmssoftware.com wrote:
>>
>> Those
{..."performance improvements of between 15% and 50%"...}
>> are real numbers from some tests we ran. Here is my simplistic way to
>> look at it. Every execlet is 5%-10% smaller and RMS is 15% smaller,
>> compared to unoptimized (the standard build). That means many, many
>> code paths in the heart of the operating system are shorter with the
>> use of byte/word instructions. You can certainly make a more precise
>> analysis but that was from my quick look at comparing the result disks
>> from the 2L1 and 2L2 builds, picked a bunch of routines and looked at
>> the generated code.
>
> Not to nitpick, but to avoid confusion. The standard build is NOT
> unoptimized. The build uses the default of /OPT=LEVEL=4 for all
> compiles. However, the default /ARCH value is EV4 (which also sets the
> default for /OPT=TUNE). We added /ARCH=EV6 to the compilations. We
> did not add/remove/change any /OPT qualifiers.
This approach intersperses duplicated architecture-specific instruction
streams throughout the objects and executables, and the compilers add
conditional gates to select the appropriate code: in broad terms, one
sequence for execution on EV5 and prior, and a separate sequence for
EV56 and later. This is conceptually quite simple for the end-users,
though there's more than a little effort involved in the compiler code
scheduling and optimizations, and clearly more than a little run-time
overhead. In addition to what Clair and John have mentioned above,
this also means the processor instruction caches are populated and
used less efficiently, given the branches necessary for the
interspersed-code design.
One alternative to this approach that's been discussed occasionally
uses so-called fat binaries, where the executable code for the
different architectures is carried as separate binaries within the
same file package, and the selection happens once, when the code is
activated. (This approach was used on another platform to select code
optimized for different targets, and for 32- and 64-bit addressing.)
Fat binaries make for more complex packaging, use additional storage,
mean the compilers and the linker can need multiple passes to build
the architecture-specific code, and add some selection logic to the
image activator. But there's then no run-time overhead past the image
activator selecting the appropriate binary to activate, and the
unnecessary variants can potentially even be expunged to save on the
storage.
There's no one right way to do this, of course. The trade-offs and
the quest for a sufficient degree of upward compatibility are often
themselves sources of trouble, complexity and overhead: trade-offs
between application development effort, expense and complexity on one
side, and end-user performance and complexity on the other. It's
never simple.
--
Pure Personal Opinion | HoffmanLabs LLC