[Info-vax] VSI OpenVMS Alpha V8.4-2L1 Layered Product Subset Available
David Froble
davef at tsoft-inc.com
Mon Jul 10 22:20:51 EDT 2017
Stephen Hoffman wrote:
> On 2017-07-10 19:22:38 +0000, John Reagan said:
>
>> On Monday, July 10, 2017 at 2:09:56 PM UTC-4, clair... at vmssoftware.com
>> wrote:
>>>
>>> Those
>
> {..."performance improvements of between 15% and 50%"...}
>
>>> are real numbers from some tests we ran. Here is my simplistic way to
>>> look at it. Every execlet is 5%-10% smaller and RMS is 15% smaller,
>>> compared to unoptimized (the standard build). That means many, many
>>> code paths in the heart of the operating system are shorter with the
>>> use of byte/word instructions. You can certainly make a more precise
>>> analysis, but that was from my quick look: I compared the result
>>> disks from the 2L1 and 2L2 builds, picked a bunch of routines, and
>>> looked at the generated code.
>>
>> Not to nitpick, but to avoid confusion. The standard build is NOT
>> unoptimized. The build uses the default of /OPT=LEVEL=4 for all
>> compiles. However, the default /ARCH value is EV4 (which also sets
>> the default for /OPT=TUNE). We added /ARCH=EV6 to the compilations.
>> We did not add/remove/change any /OPT qualifiers.
>
> This approach interspersed duplicated architecture-specific instruction
> streams throughout the objects and executables, and the compilers added
> conditional gates to select the appropriate code at run time: in broad
> terms, one set of code for execution on EV5 and prior, and separate
> code for EV56 and later. This approach is conceptually quite simple
> for the end-users, and there is certainly more than a little effort
> involved in the compiler code scheduling and optimizations, but it is
> clearly an approach with more than a little run-time overhead. In
> addition to what Clair and John have mentioned above, it also means
> that the processor instruction caches are less efficiently populated,
> and less efficiently used, given the branches the interspersed-code
> design requires.
>
> One alternative to this approach that's been discussed occasionally
> uses so-called fat binaries, where the executable code for the
> different architectures is carried as separate binaries within the
> same file package, and one selection is made when the code is
> activated. (This approach was used on another platform to select
> code optimized for different targets, and for 32- and 64-bit
> addressing.) This approach makes for more complex packaging, uses
> additional storage, means the compilers and the linker can need
> multiple passes to build the selected architecture-specific code, and
> adds some selection logic into the image activator. It also means
> there's no run-time overhead past the image activator selecting the
> appropriate binary to activate, and the unnecessary variants can
> potentially even be expunged to save on storage.
>
> There's no one right way to do this, of course. The trade-offs and
> the quest for a sufficient degree of upward compatibility are often
> themselves sources of trouble, complexity, and overhead. There are
> trade-offs between application development effort, expense, and
> complexity on one side, and end-user performance or complexity or
> suchlike on the other. It's never simple.
As you mention, this isn't easily answered, and there's no perfect path. However, so
far I haven't heard anyone "sneezing" at a 15% to 50% performance improvement.
Still saying "good job, VSI".