[Info-vax] VSI OpenVMS Alpha V8.4-2L1 Layered Product Subset Available
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Mon Jul 10 16:02:34 EDT 2017
On 2017-07-10 19:22:38 +0000, John Reagan said:
> On Monday, July 10, 2017 at 2:09:56 PM UTC-4, clair... at vmssoftware.com wrote:
>>
>> Those
{..."performance improvements of between 15% and 50%"...}
>> are real numbers from some tests we ran. Here is my simplistic way to
>> look at it. Every execlet is 5%-10% smaller and RMS is 15% smaller,
>> compared to unoptimized (the standard build). That means many, many
>> code paths in the heart of the operating system are shorter with the
>> use of byte/word instructions. You can certainly make a more precise
>> analysis but that was from my quick look at comparing the result disks
>> from the 2L1 and 2L2 builds, picked a bunch of routines and looked at
>> the generated code.
>
> Not to nitpick, but to avoid confusion. The standard build is NOT
> unoptimized. The build uses the default of /OPT=LEVEL=4 for all
> compiles. However, the default /ARCH value is EV4 (which also sets the
> default for /OPT=TUNE). We added /ARCH=EV6 to the compilations. We
> did not add/remove/change any /OPT qualifiers.
This approach intersperses duplicated architecture-specific instruction
streams throughout the objects and executables, and the compilers add
conditional gates to select the appropriate code: in broad terms, one
sequence for execution on EV5 and prior, and a separate sequence for
EV56 and later. This is conceptually quite simple for the end-users,
though there's more than a little effort involved in the compiler code
scheduling and optimizations, and clearly more than a little run-time
overhead. In addition to what Clair and John have mentioned above,
this also means the processor instruction caches are populated and
used less efficiently, given the branches necessary for the
interspersed-code design.
One alternative to this approach that's been discussed occasionally
uses so-called fat binaries, where the executable code for the
different architectures is carried as separate binaries within the
same file package, and the selection happens once, when the code is
activated. (This approach was used on another platform to select code
optimized for different targets, and for 32- and 64-bit addressing.)
Fat binaries make for more complex packaging, use additional storage,
mean the compilers and the linker can need multiple passes to build
the architecture-specific code, and add some selection logic to the
image activator. But there's then no run-time overhead past the image
activator selecting the appropriate binary to activate, and the
unnecessary variants can potentially even be expunged to save on the
storage.
There's no one right way to do this, of course. The trade-offs and
the quest for a sufficient degree of upward compatibility are often
themselves sources of trouble, complexity and overhead: trade-offs
between application development effort, expense and complexity on one
side, and end-user performance and complexity on the other. It's
never simple.
--
Pure Personal Opinion | HoffmanLabs LLC