[Info-vax] Future comparison of optimized VSI x86 compilers vs Linux compilers
John Reagan
xyzzy1959 at gmail.com
Mon Aug 3 20:03:42 EDT 2020
On Monday, August 3, 2020 at 2:34:08 PM UTC-4, onewing... at gmail.com wrote:
> On Friday, July 31, 2020 at 9:09:06 AM UTC-6, John Reagan wrote:
> > In another thread, Terry asked a question and instead of taking the thread further off-topic, I decided to start a new thread.
> >
> >
> > On Friday, July 31, 2020 at 4:32:23 AM UTC-4, Terry Kennedy wrote:
> >
> > >
> > > It will be interesting to compare the future production-ready release of VSI's x86 port, using optimized versions of compilers, with FreeBSD or Linux on the same hardware. Since they will all be using the LLVM backend, this should provide the most detailed "apples to apples" performance comparison to date.
> >
> >
> > Well, a "Red Delicious Apple" to "Fuji Apple" comparison.
> >
> > tl;dr Our memory model isn't one that you can select on Linux and our optimization will be a work-in-progress.
> >
> > The baseline AMD64 Calling Standard used by the Linux box provides "PIC vs noPIC" and "small/medium/large" memory models. Usually "noPIC/small" out of the box. We do PIC only. Fine. Just use -fPIC on Linux for the comparison.
> >
> > But our memory model isn't one of the three. We're mostly "medium" but have a few hints of "large". For instance, code in 64-bit addresses needs to reach static data in a 32-bit address range. Since that is more than 2GB away, we always have to go through a GOT entry to get the data. A Linux application with "small" or "medium" assumes that its static data is within 2GB and will access it directly via a linker-generated displacement. However, "large" on Linux also means that code can be larger than 2GB, so all of the branches and jmps are more complicated. We limit code to 2GB just like a Linux "small" or "medium" code model. We also have to support calls to routines in the same module, but in different psects. LLVM (at least the one we're using with the cross-compilers) assumes that all the routines would be in ".text" and doesn't want to use the GOT for those calls. With different psects, some might be in 32-bit address ranges and some might be in 64-bit address ranges, so such "same module/different section" calls also have to go through the GOT. Only Linux "large" would give you that.
> >
> > LLVM's optimizer relies on metadata called TBAA (Type-Based Alias Analysis). The LLVM IR and its basic type system aren't sufficient to describe every source language's aliasing rules for correct optimization. For example, the clang frontend has to decorate the LLVM IR with C/C++ language aliasing rules.
> >
> > Our GEM-to-LLVM converter has to process CIL from multiple frontends, generate LLVM IR, and produce TBAA metadata. Can we be as good as clang for our C frontend? What about Fortran, COBOL, etc.? On real GEM targets, there are callbacks between GEM and the frontends so GEM can ask each frontend questions DURING OPTIMIZATION. That model doesn't exist in LLVM. I'm hoping our current design will get us within 90% of the best possible TBAA.
> >
> > clang for C++ won't have this issue since it will be a clang solution. That's one of the reasons why I also want to provide the ability to invoke clang as a C compiler (at least for a subset of C applications that don't use tons of VMS legacy junk - I'm looking at you, /STAND=VAXC).
>
> Here's a question: how hard would it be to simply [re]write the code generator for the GEM compilers to do VMS x86? Yes, I realize there's a lot of hype around LLVM, and it would be nice to get the optimizations, but this should be weighed against what you have now.
>
> Also, since LLVM is/was Low Level Virtual Machine, is it possible to rewrite the code generators so that they target said VM directly? (There's apparently little relation to the VM from the IR side of things now, according to Wikipedia; and I don't really follow LLVM, so I'm not sure if it's applicable here.)
>
> Lastly, what about going with a translation-IR route:
> - Have a GEM backend that produces IR 'objects'.
> - Have these 'objects' with a 'compile' method that produces something appropriately low-level; say BLISS.
> - Update BLISS to be on x86.
> - Take the BLISS-output from the IR, compile for x86.
> - Done.
>
> PS -- Is there any chance for the GEM compilers to be released to open-source, or the documentation/interface(s) released so that the hobbyists could try implementing a direct GEM-to-x86 backend?
Hoff's post is on the money.
In the past, we had close to 30 people working exclusively on GEM. That didn't count the frontend or RTL people. And that was tracking a limited number of hardware variants. Trying to track all the microarchitectures (and their flaws) for Intel and AMD x86 today with the staff VSI has would be impossible. Plus, as Hoff mentioned, LLVM has many targets (over a dozen; want OpenVMS on ARM or System/Z?).
[If somebody wants to write a new LLVM backend, Alpha is so regular that it would be straightforward. Itanium, on the other hand, used to be in LLVM but got dropped years ago due to complexity and lack of support. I saw that somebody was playing with a 68040 backend, but big-endian OpenVMS would be tough. And, as a joke, somebody has a Z-80 backend.]
GEM was designed around a load/store machine architecture. While not impossible, teaching GEM about an "addressing-mode rich" architecture with condition codes would be a large effort. We already don't use many of the Itanium features like advanced/speculative loads since the GEM optimizers don't understand such "unsafe" code motions. And don't get me started on NaTs, where you can write all 64 bits of a register with a 32-bit store to the bottom and a 32-bit store to the top and still have a NaT.
The Macro compilers also need their own interface. Neither GEM IL, LLVM IR, nor even a BLISS-based model lets you describe routines that jump between each other.
And while Hoff mentioned it, I'll say it again: clang and all modern C++ standards. If I hadn't gone with clang/LLVM but enhanced GEM instead, I'd have to track not only all the hardware variants but also all the C++ language and STL changes. No thanks. I'd need 50 people. With a clang-based environment, I can hope that open source code can come to OpenVMS with little pain (let's not get into the mixed-pointer discussion here, please).
BLISS as an IR would not provide much benefit over the LLVM IR or GEM IR. There is much more than the IR for the "code" portion: there is the debug/DWARF environment and all the associated tooling that Hoff mentioned.
Given that Apple and Google are so invested in clang/LLVM for their products, I don't see it going away.
And I don't get your question about somebody else creating a "direct GEM-to-LLVM backend". As Hoff said, that is exactly what we're doing. Are you wanting to see the documentation for the GEM IR, the GEM symbol table, and the other GEM interfaces that a frontend uses (command line processing, file input/output, include file support with knowledge about .TLBs, listing file generation, etc.)? BTW, much of that GEM "shell" code just moves with little effort.
And for VMS-isms, yes, I need to add clang/flang work. Clang is already pretty multi-targetable, with good support for slipping in alternate "driver" code, etc. And flang actually has lots of VAX Fortran features in it, given that VAX Fortran was the industry standard for a long time. No, it doesn't have CDD or /DIAG or /ANAL support, but it has lots of the language syntax already. I'm in regular communication with the flang manager.