[Info-vax] Future comparison of optimized VSI x86 compilers vs Linux compilers

onewingedshark at gmail.com
Wed Aug 5 10:50:04 EDT 2020


On Tuesday, August 4, 2020 at 9:12:03 AM UTC-6, Stephen Hoffman wrote:
> On 2020-08-03 21:25:22 +0000, onewingedshark said:
> 
> > On Monday, August 3, 2020 at 2:07:44 PM UTC-6, Stephen Hoffman wrote:
> >> 
> >> Learning more about the topic?  For x86-64 and Arm and some other 
> >> platforms, LLVM is a good starting point for writing a compiler, too.
> >> 
> >> https://github.com/banach-space/llvm-tutor
> >> https://github.com/ghaiklor/llvm-kaleidoscope
> >> etc...
> > I'm not really interested in learning LLVM, even though I am working on 
> > a compiler; the simple reason being that I'm not exactly interested in 
> > investing in something only to get software-churned into 'deprecated' 
> > in a year's time. [My compiler is Ada-in-Ada, SPARK where possible. The 
> > ideal goal being a provable compiler, which should be useful in 
> > certification.]
> 
> Which is a very different and far more constrained app target than the 
> requirements and expectations that the VSI folks are necessarily 
> contending with.

Is it?
My plan is a modular construction, but with provability, such that you could write a verified backend and get high-reliability certifications [e.g. aviation] fairly easily. For bootstrapping, I'm planning to use a Forth-generating backend, so that all you need to do is write the assembly for about 30 Forth words and *bam*, now you can run on your new host.
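To make the bootstrapping idea concrete, here's a minimal sketch in Ada of the kind of primitive layer involved; the word set and names (Push, Pop, Dup, Swap, Plus) are illustrative stand-ins, not the actual ~30-word set:

```ada
--  Sketch of a minimal Forth-style primitive layer; hypothetical names.
with Ada.Assertions; use Ada.Assertions;

procedure Forth_Core is
   type Cell is new Long_Integer;
   Stack : array (1 .. 64) of Cell;
   Top   : Natural := 0;

   procedure Push (V : Cell) is
   begin
      Top := Top + 1;
      Stack (Top) := V;
   end Push;

   function Pop return Cell is
   begin
      Top := Top - 1;
      return Stack (Top + 1);
   end Pop;

   --  Three of the "thirty or so" primitives the backend would target;
   --  on a new host each becomes a few lines of assembly.
   procedure Dup  is begin Push (Stack (Top)); end Dup;
   procedure Swap is
      A : constant Cell := Pop;
      B : constant Cell := Pop;
   begin
      Push (A);
      Push (B);
   end Swap;
   procedure Plus is begin Push (Pop + Pop); end Plus;
begin
   Push (2); Push (3);
   Dup;   --  stack: 2 3 3
   Plus;  --  stack: 2 6
   Swap;  --  stack: 6 2
   Assert (Pop = 2 and then Pop = 6);
end Forth_Core;
```

Everything above those primitives stays portable; only the primitive layer needs porting.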

> Expand your self-hosting Ada compiler project out to 
> BASIC, Bliss, C, C++, Macro32, COBOL, Pascal, Fortran, and to tooling 
> including LSEDIT and other IDEs, bespoke linker and debugger and object 
> and executable analysis tooling and librarian and Oracle CDD/Repository 
> support, and the rest of the baggage including the common language 
> environment and hybrid/segmented addressing, and none of which needs to 
> be provable past the applicable test suites, and then make the object 
> and executable code compile and work across Intel and AMD subsets and 
> maybe then with an Arm subset or two, and ponder the effects of those 
> decisions and calculations and the scale of the project on your current 
> approach.

[Note: Don't get too hung up on the example languages, as some did over the suggestion of BLISS as an intermediate language; I could just as easily have suggested IEEE 694 or Forth or P-Code, etc.]

I've given some consideration to a multilanguage environment, and I think the approach I'd take would be:
(1) Have a 'close', source-recoverable IR for each particular language, akin to DIANA for Ada (https://apps.dtic.mil/sti/pdfs/ADA128232.pdf), wherein the IR is structured to disallow invalid source. -- These IRs would be amenable to DB storage/manipulation, such that the architecture described here for CI/building could be used: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.26.2533&rep=rep1&type=pdf 
(2) Optionally, have a set of "paradigm" IRs: Functional-IR [Haskell, Erlang, F#], Procedural-IR, Pattern-matching-IR [SNOBOL, Perl, etc.], OOP-IR, Database-IR [PL/SQL, MUMPS, etc.], and so on.
(3) Have a General IR.
(4) Have a Runtime-description language; ideally one with a metaobject-based system so that you could, say, use Ada TASKs on GPUs or CPUs, dynamically chosen at runtime.
(5) The Backend of the compiler would take #3 & #4 to produce the (executable) output.
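As a rough illustration of steps 3-5 (all names hypothetical), a General IR can be as simple as a variant record, with the backend taking both the IR and a target/runtime description:

```ada
--  Sketch of steps 3-5: a tiny, hypothetical General IR plus a
--  runtime/target description, consumed together by the backend.
with Ada.Assertions; use Ada.Assertions;

procedure IR_Sketch is
   type Op is (Load_Const, Add, Ret);
   type Node (Kind : Op := Load_Const) is record
      case Kind is
         when Load_Const => Value : Integer;
         when Add | Ret  => null;
      end case;
   end record;

   type Program is array (Positive range <>) of Node;

   type Target is (CPU, GPU);  --  stand-in for the runtime description (#4)

   --  The backend (#5) takes the General IR (#3) and a target; here it
   --  merely interprets the IR on a stack and ignores the target.
   function Backend (P : Program; T : Target) return Integer is
      pragma Unreferenced (T);
      Stack : array (1 .. 16) of Integer;
      Top   : Natural := 0;
   begin
      for N of P loop
         case N.Kind is
            when Load_Const =>
               Top := Top + 1;
               Stack (Top) := N.Value;
            when Add =>
               Stack (Top - 1) := Stack (Top - 1) + Stack (Top);
               Top := Top - 1;
            when Ret =>
               return Stack (Top);
         end case;
      end loop;
      return 0;
   end Backend;

   P : constant Program :=
     ((Kind => Load_Const, Value => 2),
      (Kind => Load_Const, Value => 3),
      (Kind => Add),
      (Kind => Ret));
begin
   Assert (Backend (P, CPU) = 5);  --  2 + 3
end IR_Sketch;
```

A real backend would generate code rather than interpret, and the target description would drive instruction selection; the shape of the interfaces is the point here.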

What would be nice about having the IRs database-amenable is that you get *FAR* better tooling and control: you no longer search-and-replace "min" with "minimum" and end up with references to a "minimumute counter" in your comments, because you directly manipulate the object's name.
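A tiny sketch of the difference (hypothetical types, using the standard Ada containers): the name lives in exactly one place, keyed by entity ID, so a rename cannot touch an unrelated "minute_counter":

```ada
--  Sketch: in a DB-style store, "min" is an entity with an ID; renaming
--  updates one row, and every reference (pretty-printed on demand)
--  picks up the new name.  Names and structure here are illustrative.
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Assertions; use Ada.Assertions;

procedure Rename_Sketch is
   type Entity_Id is range 1 .. 1_000;
   function Hash (Id : Entity_Id) return Ada.Containers.Hash_Type is
     (Ada.Containers.Hash_Type (Id));

   package Name_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => Entity_Id,
      Element_Type    => String,
      Hash            => Hash,
      Equivalent_Keys => "=");

   Names  : Name_Maps.Map;
   Min_Fn : constant Entity_Id := 1;   --  the function "min"
   Minute : constant Entity_Id := 2;   --  an unrelated "minute_counter"
begin
   Names.Insert (Min_Fn, "min");
   Names.Insert (Minute, "minute_counter");

   --  The rename touches exactly one entity; "minute_counter" is safe.
   Names.Replace (Min_Fn, "minimum");

   Assert (Names.Element (Min_Fn) = "minimum");
   Assert (Names.Element (Minute) = "minute_counter");
end Rename_Sketch;
```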

Now, if VHDL were included in that lattice, you could have the backend take a description of the hardware and use that for optimization and code-generation.

So, yeah, I have thought about this problem.

> And compiling to C or assembler or Bliss or whatever else gets ugly for 
> the end-user developers.

I honestly believe that C and Unix have set the industry back decades.
I find it a bit puzzling that IEEE694 [Standard for Microprocessor Assembly Language] never took off.
IMO, the only reason an intermediate of C/BLISS/ASM is so messy for end-users is the text- and file-based nature of tools like Make and autotools; the correct method for handling things would be more structured. To borrow generics from Ada for an over-simplified project:

Generic
   Build_For : Architecture;  -- An enumeration of architectures the system can handle.
   With Procedure Main;       -- The project's entry point.
Package PROJECTNAME is
   Procedure Executable renames Main;
End PROJECTNAME;

And then your build-tool would "fill in the holes" of the parameters when "instantiating" the project, pulling from defaults or asking what should be used... and now you don't have the giant mess you get with ./configure and forgetting to set a parameter, or worse, something like an extra/missing directory separator in a parameter screwing up the build.
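Here's a self-contained sketch of that idea (all names hypothetical): the "project" is a generic, the build-tool's job is the instantiation, and a missing or malformed parameter is a hard compile-time error rather than a silent misconfiguration:

```ada
--  Self-contained sketch of a generic-based "build description";
--  all names are illustrative.
with Ada.Assertions; use Ada.Assertions;

procedure Build_Sketch is
   type Architecture is (X86_64, AArch64, IA64);

   generic
      Build_For : Architecture;  --  supplied by the build tool
   package Project is
      function Target return Architecture is (Build_For);
   end Project;

   --  "Instantiating" the project: forgetting Build_For, or passing a
   --  value outside the enumeration, is rejected at compile time.
   package My_Build is new Project (Build_For => X86_64);
begin
   Assert (My_Build.Target = X86_64);
end Build_Sketch;
```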

> I'd still have a look at LLVM. Even if you should discount the whole 
> project, there are ideas to be had.

LOL -- Maybe I should, though I've looked at a *lot* of stuff over the years, as you can probably tell, and it honestly seems like the best ideas are discarded for a "simple" half-solution that "works"... and then you're stuck with far more [and more complex] work because now you're constrained by the half-/non-solution. (Text-based diff is a good example here; it's not exactly uncommon to get a diff of "the whole file changed" because some guy's editor was set to spaces instead of tabs [or vice versa]... and the modern 'solution'? Have everybody use a standard editor+configuration, and/or "coding styles". The CiteSeerX link above shows how a DB-based system could avoid this, along with providing better version control [no "the commit broke the build" BS], simply by virtue of its architecture.)
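To illustrate the tabs-vs-spaces point with an over-simplified sketch (a real system would compare token streams or stored IR, not crudely strip whitespace): two sources differing only in indentation normalize to the same stored form, so a structural store records no change at all:

```ada
--  Sketch: whitespace-only edits vanish under normalization, so a
--  structural/DB store would see no diff.  Deliberately naive: real
--  tokenization must respect string literals, identifiers, etc.
with Ada.Assertions; use Ada.Assertions;

procedure Diff_Sketch is
   function Normalize (Source : String) return String is
      Result : String (Source'Range);
      Last   : Natural := Result'First - 1;
   begin
      for C of Source loop
         if C /= ' ' and C /= ASCII.HT then
            Last := Last + 1;
            Result (Last) := C;
         end if;
      end loop;
      return Result (Result'First .. Last);
   end Normalize;

   --  Same code, indented with spaces in A and a tab in B.
   A : constant String := "if X>0 then" & ASCII.LF & "   Y:=1;";
   B : constant String := "if X>0 then" & ASCII.LF & ASCII.HT & "Y:=1;";
begin
   Assert (Normalize (A) = Normalize (B));
end Diff_Sketch;
```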


