[Info-vax] Future comparison of optimized VSI x86 compilers vs Linux compilers

Mon Aug 3 16:07:38 EDT 2020

On 2020-08-03 18:34:06 +0000, onewingedshark at gmail.com said:

> Here's a question: how hard would it be to simply [re]write the 
> code-generator for the GEM-compilers to do VMS x86? Yes, I realize 
> there's a lot of hype around LLVM, and it would be nice to get the 
> optimizations, but this should be weighed against what you have now.

That would mean ignoring, and effectively abandoning, VSI's existing 
commitment to LLVM, along with the very substantial LLVM work already 
completed and still underway at VSI...

To what benefit? Re-creating an actively-maintained, widely-used and 
very functional code generator, along with the rest of the compiler 
infrastructure, is a substantial cost (even if you're starting with a 
sort-of-working but sort-of-stale x86-32 implementation). It also 
involves re-writing the other LLVM-associated tooling past code 
generation and optimization. The benefits to VSI of that substantial 
effort in GEM-based compiler tooling only accrue when the results are 
sufficiently better than LLVM to matter. And the costs of keeping up 
that tooling and keeping it performing competitively are ongoing.

LLVM also gets you an Arm back-end, meaning that some hypothetical 
future port of VSI OpenVMS to Arm AArch64 servers just got somewhat 
easier.

And LLVM gets VSI access to Clang, a current C and C++ compiler 
with a pile of other capabilities. No OpenVMS-isms in Clang or Flang or 
such of course (yet?), but then these and others are also much newer 
compilers than those available on OpenVMS.

And LLVM is modular, meaning hunks of the tooling can be re-used and 
integrated into other packages and tools. The compiler can be embedded 
directly within an IDE, for instance, which gives the IDE much better 
insight into the language syntax, better code completion for the 
developer, and the ability to display coding errors directly within 
the IDE as the source code is entered. This continuous compilation is 
significantly past what the LSEDIT COMPILE/REVIEW mechanism and related 
can offer, and that is one of the closest equivalents available on 
OpenVMS. And there are other benefits.

And there are multiple sorta-different-GEM implementations around, just 
to keep this start-over-again-with-GEM idea that much more 
"interesting". Which one?

> Also, since LLVM is/was Low Level Virtual Machine, is it possible to 
> rewrite the code-generators so that they target said VM directly? 
> (There's apparently little relation to the VM from the IR side of 
> things now, according to wikipedia; and I don't really follow LLVM, so 
> I'm not sure if it's applicable here.)

LLVM is a compiler infrastructure, which includes code generation 
among other features, and which does not include a hypervisor.

Quoth WP: "The LLVM compiler infrastructure project is a set of 
compiler and toolchain technologies, which can be used to develop a 
front end for any programming language and a back end for any 
instruction set architecture. LLVM is designed around a 
language-independent intermediate representation (IR) that serves as a 
portable, high-level assembly language that can be optimized with a 
variety of transformations over multiple passes.
LLVM is written in C++ and is designed for compile-time, link-time, 
run-time, and "idle-time" optimization. Originally implemented for C 
and C++, the language-agnostic design of LLVM has since spawned a wide 
variety of front ends: languages with compilers that use LLVM include 
ActionScript, Ada, C#, Common Lisp, Crystal, CUDA, D, Delphi, Dylan, 
Fortran, Graphical G Programming Language, Halide, Haskell, Java 
bytecode, Julia, Kotlin, Lua, Objective-C, OpenGL Shading Language, 
PostgreSQL's SQL and PLpgSQL, Ruby, Rust, Scala, Swift, and Xojo."

Again, the "virtual machine" here is in reference to the intermediate 
(and portable) representation that is then used to abstract the 
processing across a variety of different back-end code generators. Not 
to a hypervisor.
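
For a rough feel of what that abstraction buys, here is a toy sketch in 
Python. It is not LLVM and not LLVM IR: the Instr type, the two emit 
functions, and the hand-assigned register maps are all invented for 
illustration. One small target-neutral instruction list gets lowered to 
x86-64-style and AArch64-style assembly text by two independent 
back-ends.

```python
from dataclasses import dataclass


# Toy target-neutral instruction. Invented for this sketch; not LLVM IR.
@dataclass(frozen=True)
class Instr:
    op: str    # "add" or "mul" (both commutative in this toy)
    dest: str  # virtual register names such as "v2"
    lhs: str
    rhs: str


X86_OPS = {"add": "add", "mul": "imul"}
A64_OPS = {"add": "add", "mul": "mul"}


def emit_x86_64(program, regmap):
    """Lower to two-operand x86-64-style text (Intel operand order)."""
    lines = []
    for ins in program:
        d, l, r = regmap[ins.dest], regmap[ins.lhs], regmap[ins.rhs]
        if d == r:
            # Both ops commute, so swap operands rather than clobber r.
            l, r = r, l
        if d != l:
            lines.append(f"mov {d}, {l}")
        lines.append(f"{X86_OPS[ins.op]} {d}, {r}")
    return lines


def emit_aarch64(program, regmap):
    """Lower to three-operand AArch64-style text."""
    return [
        f"{A64_OPS[i.op]} {regmap[i.dest]}, {regmap[i.lhs]}, {regmap[i.rhs]}"
        for i in program
    ]


# v2 = v0 + v1; v3 = v2 * v0
PROGRAM = [Instr("add", "v2", "v0", "v1"), Instr("mul", "v3", "v2", "v0")]

# Registers are assigned by hand here; a real back-end allocates them.
X86_REGS = {"v0": "rdi", "v1": "rsi", "v2": "rax", "v3": "rax"}
A64_REGS = {"v0": "x0", "v1": "x1", "v2": "x2", "v3": "x0"}

x86_text = emit_x86_64(PROGRAM, X86_REGS)
a64_text = emit_aarch64(PROGRAM, A64_REGS)
print("\n".join(x86_text))
print("\n".join(a64_text))
```

The front-end side only ever produces the neutral list. Adding another 
target means adding another emit function, and nothing upstream changes.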

> Lastly, what about going with a translation-IR route:
> - Have a GEM backend that produces IR 'objects'.
> - Have these 'objects' with a 'compile' method that produces something 
> appropriately low-level; say BLISS.
> - Update BLISS to be on x86.
> - Take the BLISS-output from the IR, compile for x86.
> - Done.

VSI is using a shim—called the GEM-to-LLVM converter, or some such—to 
glue together the GEM-expecting legacy compiler front-ends with the 
LLVM infrastructure and code generation.
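
The converter's internals aren't described here, so purely as a 
hypothetical sketch of what a shim of this general shape does: the 
Python below takes the nested-tuple output of an imagined legacy 
front-end and flattens it into the linear three-address form that an 
imagined back-end expects. The tuple layout, the convert function, and 
the t0/t1 temporaries are all invented; none of this reflects GEM's 
actual intermediate language or VSI's code.

```python
from itertools import count


def convert(tree):
    """Flatten a nested (op, left, right) tree into three-address tuples.

    Leaves are plain strings naming variables. Returns the instruction
    list plus the name holding the final result. Hypothetical formats,
    not GEM's or LLVM's.
    """
    out = []
    fresh = count()  # source of temporary names t0, t1, ...

    def walk(node):
        if isinstance(node, str):
            return node  # a leaf is already a usable operand
        op, left, right = node
        l = walk(left)
        r = walk(right)
        dest = f"t{next(fresh)}"
        out.append((op, dest, l, r))
        return dest

    result = walk(tree)
    return out, result


# (a + b) * a, as the imagined legacy front-end would hand it over.
legacy_tree = ("mul", ("add", "a", "b"), "a")
instrs, result = convert(legacy_tree)
for ins in instrs:
    print(ins)
```

The point of the shape is that the existing front-ends keep emitting 
what they always emitted, and only the converter has to know about both 
representations.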

Exactly nobody is going to want to use Bliss (⁉️) as an intermediate 
language. That has various issues, not the least of which are the 
adverse effects on source code debugging. Not everybody wants to debug 
Bliss and the whatever-to-Bliss translation and optimization, nor debug 
the Bliss-to-x86-64 optimization for that matter.

Re-implementing the LLVM IR using Bliss will end badly.

> PS -- Is there any chance for the GEM compilers to be released to 
> open-source, or the documentation/interface(s) released so that the 
> hobbyists could try implementing a direct GEM-to-x86 backend?

GEM source code release? No. For neither the first nor the last time 
this will be discussed here, VSI indicates they have not acquired from 
HPE the rights to open-source the HPE source code.  That includes GEM. 
VSI can hypothetically release their own source code, such as changes 
to LLVM. If VSI chooses. But GEM is not happening without permission 
from HPE.

Learning more about the topic?  For x86-64 and Arm and some other 
platforms, LLVM is a good starting point for writing a compiler, too.

https://github.com/banach-space/llvm-tutor
https://github.com/ghaiklor/llvm-kaleidoscope
etc...

-- 
Pure Personal Opinion | HoffmanLabs LLC