[Info-vax] VSI OpenVMS Alpha V8.4-2L1 Layered Product Subset Available
David Froble
davef at tsoft-inc.com
Mon Jul 10 22:20:51 EDT 2017
Stephen Hoffman wrote:
> On 2017-07-10 19:22:38 +0000, John Reagan said:
>
>> On Monday, July 10, 2017 at 2:09:56 PM UTC-4, clair... at vmssoftware.com
>> wrote:
>>>
>>> Those
>
> {..."performance improvements of between 15% and 50%"...}
>
>>> are real numbers from some tests we ran. Here is my simplistic way to
>>> look at it. Every execlet is 5%-10% smaller and RMS is 15% smaller,
>>> compared to unoptimized (the standard build). That means many, many
>>> code paths in the heart of the operating system are shorter with the
>>> use of byte/word instructions. You can certainly make a more precise
>>> analysis, but that was from my quick look: I compared the result
>>> disks from the 2L1 and 2L2 builds, picked a bunch of routines, and
>>> looked at the generated code.
>>
>> Not to nitpick, but to avoid confusion. The standard build is NOT
>> unoptimized. The build uses the default of /OPT=LEVEL=4 for all
>> compiles. However, the default /ARCH value is EV4 (which also sets
>> the default for /OPT=TUNE). We added /ARCH=EV6 to the compilations.
>> We did not add/remove/change any /OPT qualifiers.
>
> This approach interspersed duplicated architecture-specific instruction
> streams throughout the objects and executables, and the compilers added
> conditional gates to select the appropriate code at run time: in broad
> terms, one set of code for execution on EV5 and prior, and separate
> code for EV56 and later. This approach is conceptually quite simple
> for the end-users, and there is certainly more than a little effort
> involved in the compiler code scheduling and optimizations, but it is
> clearly an approach with more than a little run-time overhead. In
> addition to what Clair and John have mentioned above, it also means
> that the processor instruction caches are less efficiently populated,
> and less efficiently used, given the branches the interspersed-code
> design requires.
>
> One alternative to this approach that's been discussed occasionally
> uses so-called fat binaries, where the executable code for the
> different architectures is carried as separate binaries within the
> same file package, and one selection is made when the code is
> activated. (This approach was used on another platform to select
> code optimized for different targets, and for 32- and 64-bit
> addressing.) This approach makes for more complex packaging, uses
> additional storage, means the compilers and the linker can need
> multiple passes to build the selected architecture-specific code, and
> adds some selection logic into the image activator. It also means
> there's no run-time overhead past the image activator selecting the
> appropriate binary to activate, and the unnecessary variants can
> potentially even be expunged to save on storage.
>
> There's no one right way to do this, of course. The trade-offs and
> the quest for a sufficient degree of upward compatibility are often
> themselves sources of trouble, complexity, and overhead. There are
> trade-offs between application development effort, expense, and
> complexity on one side, and end-user performance or complexity or
> suchlike on the other. It's never simple.
As you mention, this isn't easily answered, and there's no perfect path. However, so
far I haven't heard anyone "sneezing" at a 15% to 50% performance improvement.
Still saying "good job, VSI".