[Info-vax] Time to turn DECUServe into a mixed VMScluster?

John Reagan xyzzy1959 at gmail.com
Wed May 31 14:00:11 EDT 2023


On Wednesday, May 31, 2023 at 1:23:02 PM UTC-4, Simon Clubley wrote:
> On 2023-05-31, abrsvc <dansabr... at yahoo.com> wrote: 
> > 
> > Given that it has been stated that the code is not optimized yet, I care nothing about any type of performance data at the moment other than a works/doesn't work type of test. The performance effort will come later. 
> >
> You should Dan, because there are performance issues and then there 
> are performance issues. 
> 
> It would be nice to get an indication from VSI whether the issues are 
> related to poor compiler code (which should be easily fixable hopefully) 
> or whether it is due to some architecture or design limitation, which is 
> not so easily fixed. 
> 
> Given the problem as described, it could be either one of those two, 
> or a combination of both. I am having a hard time however seeing how poor 
> compiler code by itself could be having such a dramatic impact on this 
> type of kernel code.
> Simon. 
> 
> -- 
> Simon Clubley, clubley at remove_me.eisner.decus.org-Earth.UFP 
> Walking destinations on a map are further away than they appear.

I briefly skimmed the data (there is a lot of it), and I'm not an expert in this
area of the system.

Compiler optimizers rarely eliminate call frames (other than doing inline
expansion but that doesn't make the work go away).  Call/return overhead
on x86 is very fast (think of JSB/RSB on VAX or JSR/RET on Alpha).

Other than the shuffling of PTEs that everybody is focusing on, I'll point out
that x86 does not provide a hardware 'probe' instruction.  If you want to use
'probe' to confirm access to any address, the OS (any OS, not just OpenVMS)
has to crawl around in the page tables to find the final PTE to check the access
flags.  With 4K pages, those page tables can be 4 or even 5 levels deep.
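
To make the cost of a software 'probe' concrete, here is a minimal sketch of a
4-level walk over a toy page table.  It is an illustration only, not OpenVMS or
any real kernel's code: the names (table_t, pool, map_page, probe), the flag
values and the entry encoding are all assumptions made up for this example,
loosely modeled on x86-64 4K paging with 9 index bits per level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEVELS      4            /* 4-level paging; 5-level adds one more step */
#define INDEX_BITS  9            /* 512 entries per table */
#define ENTRIES     (1u << INDEX_BITS)
#define PAGE_SHIFT  12           /* 4K pages */

/* Hypothetical flag bits, chosen to resemble x86 PTE bits. */
#define PTE_PRESENT 0x1u
#define PTE_WRITE   0x2u
#define PTE_USER    0x4u

typedef struct { uint64_t entry[ENTRIES]; } table_t;

/* Toy "physical memory" holding the tables; pool[0] is the root table. */
static table_t pool[8];
static size_t  pool_used = 1;

/* Index into the table at a given level (level 3 is the root). */
static unsigned index_at(uint64_t va, int level)
{
    return (unsigned)((va >> (PAGE_SHIFT + INDEX_BITS * level)) & (ENTRIES - 1));
}

/* Map one page, allocating intermediate tables from the pool as needed. */
static void map_page(uint64_t va, uint64_t flags)
{
    table_t *t = &pool[0];
    for (int level = LEVELS - 1; level > 0; level--) {
        uint64_t *e = &t->entry[index_at(va, level)];
        if (!(*e & PTE_PRESENT)) {
            assert(pool_used < sizeof pool / sizeof pool[0]);
            *e = ((uint64_t)pool_used++ << PAGE_SHIFT)
                 | PTE_PRESENT | PTE_WRITE | PTE_USER;
        }
        t = &pool[*e >> PAGE_SHIFT];
    }
    t->entry[index_at(va, 0)] = flags | PTE_PRESENT;
}

/* Software probe: follow one pointer per level and check the flags at each.
   *steps counts how many entries had to be read. */
static bool probe(uint64_t va, uint64_t required, int *steps)
{
    const table_t *t = &pool[0];
    uint64_t need = required | PTE_PRESENT;
    for (int level = LEVELS - 1; ; level--) {
        uint64_t e = t->entry[index_at(va, level)];
        (*steps)++;
        if ((e & need) != need)
            return false;        /* not mapped, or access not allowed */
        if (level == 0)
            return true;         /* final PTE grants the access */
        t = &pool[e >> PAGE_SHIFT];
    }
}
```

Every successful probe in this sketch reads four entries, one per level, before
it can answer.  A larger page size would end the walk at a higher level, which
is the depth reduction mentioned in the next paragraph.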

We are very much aware that 'probe' can be a pain with deep page tables.
Using large hardware page sizes can reduce the depth of the page table structures
but might require code inspection.  Reducing or eliminating probes can help as well.
The old theory was to probe always to avoid an exception rarely.  For x86 perhaps
it might be better to skip probes but be ready to handle the rare exceptions?  
An optimizer might "tighten" up the probe code but at the end of the day, there are
lots of pointers to follow and protection masks to extract and compare.  
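
The "probe always" versus "handle the rare exception" trade-off can be put as
simple arithmetic.  The sketch below is a back-of-the-envelope cost model only;
the struct, the function names and any cycle counts fed into it are
hypothetical and are not measurements from OpenVMS or any other system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-event costs in cycles; real values must be measured. */
typedef struct {
    uint64_t probe_cost;   /* one software probe (page table walk + checks) */
    uint64_t fault_cost;   /* taking and recovering from one access fault   */
} costs_t;

/* Probe-always: every access pays for a probe, whether it was needed or not. */
static uint64_t cost_probe_always(costs_t c, uint64_t accesses)
{
    return accesses * c.probe_cost;
}

/* Optimistic: good accesses pay nothing extra, only bad ones pay for a fault. */
static uint64_t cost_optimistic(costs_t c, uint64_t faults)
{
    return faults * c.fault_cost;
}

/* True when skipping probes is cheaper for this mix of accesses and faults. */
static bool optimistic_wins(costs_t c, uint64_t accesses, uint64_t faults)
{
    assert(faults <= accesses);
    return cost_optimistic(c, faults) < cost_probe_always(c, accesses);
}
```

Under this model, skipping probes pays off whenever the fraction of bad
accesses is below probe_cost / fault_cost.  For example, with made-up costs of
200 and 5000 cycles, that threshold is 4% of accesses.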

Another addition to the system (and related to that exe$random_harvest_direct seen
on some of the stack frames) is harvesting entropy at some interval.


