[Info-vax] Future comparison of optimized VSI x86 compilers vs Linux compilers
John Reagan
xyzzy1959 at gmail.com
Fri Jul 31 11:09:03 EDT 2020
In another thread, Terry asked a question and instead of taking the thread further off-topic, I decided to start a new thread.
On Friday, July 31, 2020 at 4:32:23 AM UTC-4, Terry Kennedy wrote:
>
> It will be interesting to compare the future production-ready release of VSI's x86 port, using optimized versions of compilers, with FreeBSD or Linux on the same hardware. Since they will all be using the LLVM backend, this should provide the most detailed "apples to apples" performance comparison to date.
Well, a "Red Delicious Apple" to "Fuji Apple" comparison.
tl;dr: Our memory model isn't one that you can select on Linux, and our optimization will be a work in progress.
The baseline AMD64 Calling Standard used by the Linux box provides "PIC vs noPIC" and "small/medium/large" memory models; the usual out-of-the-box default is "noPIC/small". We do PIC only. Fine. Just use -fPIC on Linux for the comparison, as sketched below.
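To make that setup concrete, here is a minimal sketch of selecting those models with GCC or Clang on Linux; the file name is illustrative, and support for combining -fPIC with the larger models varies by compiler version:

    /* demo.c -- a trivial translation unit for trying the Linux code models.
       Typical compile lines (GCC or Clang on x86-64):
         cc -c demo.c                    noPIC, small model (the usual default)
         cc -c -fPIC demo.c              position-independent code
         cc -c -mcmodel=medium demo.c    medium model
         cc -c -mcmodel=large demo.c     large model
       None of these stock combinations reproduces the OpenVMS x86-64
       model described below. */
    int demo;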
But our memory model isn't one of those three. We're mostly "medium" with a few hints of "large". For instance, code in 64-bit address space needs to reach static data in the 32-bit address range. Since that data is more than 2GB away, we always have to go through a GOT entry to reach it. A Linux application built "small" or "medium" assumes its static data is within 2GB and accesses it directly via a linker-generated displacement. On the other hand, "large" on Linux also means that code can be larger than 2GB, so all of the branches and jmps are more complicated; we limit code to 2GB just like the Linux "small" and "medium" code models.

We also have to support calls to routines in the same module but in different psects. LLVM (at least the version we're using with the cross-compilers) assumes that all the routines are in ".text" and doesn't want to use the GOT for those calls. With different psects, some routines might be in 32-bit address ranges and some in 64-bit address ranges, so such "same module/different section" calls also have to go through the GOT. Only Linux "large" would give you that.
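A hedged sketch of the data-access difference just described, assuming GCC or Clang on x86-64 (the symbol and function names are illustrative, and the commented assembly is typical Linux output, not what the VSI compilers emit):

    extern int counter;     /* defined in some other object file */

    int read_counter(void)
    {
        /* small/medium model, noPIC: a direct RIP-relative access
               movl  counter(%rip), %eax
           -fPIC, or whenever the data may be beyond 32-bit reach
           (the OpenVMS x86-64 case): load the address from the GOT first
               movq  counter@GOTPCREL(%rip), %rax
               movl  (%rax), %eax                                    */
        return counter;
    }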
LLVM's optimizer relies on metadata called TBAA (Type-Based Alias Analysis). The LLVM IR and its basic type system aren't sufficient to describe every language's aliasing rules well enough for correct optimization. For example, the clang frontend has to decorate the LLVM IR with C/C++ aliasing rules.
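Here is a small C example of the kind of aliasing fact that clang records as !tbaa metadata (visible with clang -O2 -S -emit-llvm); the function name is illustrative:

    int tbaa_demo(int *ip, float *fp)
    {
        *ip = 1;
        *fp = 2.0f;   /* under C rules, a float store can't modify an int */
        return *ip;   /* TBAA lets the optimizer fold this to 'return 1' */
    }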
Our GEM-to-LLVM converter has to process CIL from multiple frontends, generate LLVM IR, and produce TBAA metadata. Can we be as good as clang for our C frontend? What about Fortran, COBOL, etc.? On real GEM targets, there are callbacks between GEM and the frontends so GEM can ask each frontend questions DURING OPTIMIZATION. That model doesn't exist in LLVM. I'm hoping our current design will get us within 90% of the best possible TBAA.
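As one example of why per-language rules matter, here is a hedged C sketch of an aliasing difference between C and Fortran that such a converter would have to capture (function names are illustrative):

    /* C must assume 'a' and 'b' may alias, so stores through 'a'
       can invalidate loads through 'b'. */
    void scale_c(float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = 2.0f * b[i];
    }

    /* Fortran may assume dummy arguments don't alias; 'restrict' is
       the closest C analogue, and the guarantee enables vectorization. */
    void scale_fortran_like(float *restrict a, const float *restrict b, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = 2.0f * b[i];
    }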
clang for C++ won't have this issue since it will be a pure clang solution. That's also one of the reasons why I want to provide the ability to invoke clang as a C compiler, at least for the subset of C applications that don't use tons of VMS legacy junk (I'm looking at you, /STAND=VAXC).
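For flavor, a hedged illustration of the sort of VAX C legacy code that a stock clang would reject; globaldef and globalref are VAX C storage-class extensions that DEC C accepts under /STAND=VAXC:

    globaldef int vms_status = 1;   /* defines and exports a global symbol */
    globalref int vms_other;        /* references one defined elsewhere */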