[Info-vax] Distributed Applications, Hashgraph, Automation
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Sun Feb 18 15:29:06 EST 2018
On 2018-02-18 19:36:04 +0000, Jim Johnson said:
> On Sunday, February 18, 2018 at 11:09:27 AM UTC-8, Stephen Hoffman wrote:
>> On 2018-02-18 17:59:21 +0000, Jim Johnson said:
>>
>>> Briefly, what I've seen about the cloud has many aspects, only a few
>>> align with outsourcing. Not sure how much to go into that.
>>>
>>> I was following the discussion on shared-everything vs. shared-nothing
>>> structures. I've used both. The RMS cache management was pretty
>>> expensive to run when I last looked at it. It was especially bad for
>>> high write, high collision rate files. This drove a different approach
>>> with the TP server in DECdtm. It is structurally a shared nothing
>>> service on top of a shared everything access substrate. It uses an
>>> unconventional leader election, in that the 'home' system always is the
>>> leader if it is alive, and the other systems elect one of themselves as
>>> the leader if it isn't.
>>>
>>> (This was done via a pattern of use with the DLM, and I agree with
>>> Steve that either documenting the known patterns or encapsulating them
>>> for easier consumption could be useful)
>>>
>>> This produced much better write performance, and good recovery
>>> availability times.
>>
>> That experience is typical. I've ended up splitting more than a few
>> apps similarly. Either at the volume level, or within the app. While
>> SSDs have helped substantially with I/O performance, the coordination
>> involved with distributed shared writes ends up limited by how fast you
>> can fling lock requests around. The byte-addressable non-volatile
>> storage that's coming on-line right now will only increase the
>> coordination load and the likelihood that sharding will be considered
>> or required, if you really want to use that memory at speed.
> Yup. It wasn't just the cost of flinging the lock requests around.
> The old RMS code, at least, propagated writes via the disk, rather than
> forking the I/O and sending a copy memory-to-memory. From what I
> remember, we looked at doing that, but that was complex to get right and
> didn't happen while I was involved. Maybe it has happened since.
That implementation hasn't changed.
Somewhat related to this, writing (shadowing) data from server memory
to remote server memory is empirically faster than writing to local
HDD, which can make memory-to-memory shadowing the better choice.
Outboard SSD is faster than HDD, but not fast enough to close that
gap. Add in non-volatile byte-addressable storage and this all gets
very interesting, even for the folks with apps that require
non-volatile writes.
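To make that concrete, here's a minimal sketch in Python rather than
anything OpenVMS-specific: the writer pushes each block to a peer's
memory over the network and, in parallel, writes it to local disk. This
is not RMS or host-based shadowing internals; the wire format (4-byte
length prefix, 1-byte ack) and all of the names are invented for the
sketch.

    # Illustrative only: a write is "replicated" once a copy sits in a
    # peer's RAM, while the local disk write proceeds in parallel.
    import os
    import socket
    import struct
    import threading
    import time

    def recv_exact(conn, n):
        """Read exactly n bytes, or b'' if the peer went away."""
        data = b""
        while len(data) < n:
            chunk = conn.recv(n - len(data))
            if not chunk:
                return b""
            data += chunk
        return data

    def peer_server(host="127.0.0.1", port=5001):
        """Toy peer: hold received blocks in memory, acknowledge each one."""
        blocks = []
        with socket.create_server((host, port)) as srv:
            conn, _ = srv.accept()
            with conn:
                while True:
                    header = recv_exact(conn, 4)
                    if not header:
                        break
                    (length,) = struct.unpack("!I", header)
                    blocks.append(recv_exact(conn, length))  # copy now in peer RAM
                    conn.sendall(b"\x01")                    # acknowledge receipt

    def replicated_write(path, block, peer=("127.0.0.1", 5001)):
        """Fork the write: local disk I/O plus a memory-to-memory copy."""
        def disk_write():
            with open(path, "ab") as f:
                f.write(block)
                f.flush()
                os.fsync(f.fileno())        # the slow, non-volatile half

        t = threading.Thread(target=disk_write)
        t.start()
        with socket.create_connection(peer) as s:
            s.sendall(struct.pack("!I", len(block)) + block)
            assert recv_exact(s, 1) == b"\x01"  # a copy now lives in peer memory
        t.join()                                # and the local write has landed

    if __name__ == "__main__":
        threading.Thread(target=peer_server, daemon=True).start()
        time.sleep(0.2)                         # give the toy peer time to listen
        replicated_write("demo.dat", b"some record data")

On a LAN the acknowledgment from the peer typically comes back well
before the local fsync completes, which is the whole point of the
memory-to-memory path.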
So many of the existing operating system and app designs are predicated
on the existing I/O performance hierarchy, too.
Related discussions of OpenVMS I/O performance from David Mathog from a
number of years ago:
https://groups.google.com/d/msg/comp.os.vms/4FZHjDQ1R4A/DO5xV-z-XGEJ
ftp://saf.bio.caltech.edu/pub/software/benchmarks/mybenchmark.zip
Shared write is hard to get right, harder still to scale up, and it's
inherently not competitive with the performance of unshared and
undistributed writes.
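One common workaround, sketched very loosely below in Python: partition
the keyspace so each record has exactly one owning node, and route
writes to that owner instead of taking a cluster-wide lock per update.
The node names and the hashing choice here are arbitrary; the point is
only that an unshared writer needs no distributed lock traffic on its
hot path.

    # Loose sketch of sharded (shared-nothing) writes: every key has one
    # owner, so the owner updates it with purely local locking -- no
    # cluster-wide lock request per write.  Names are illustrative.
    import hashlib
    import threading

    NODES = ["nodea", "nodeb", "nodec"]

    def owner_of(key: str) -> str:
        """Deterministically map a key to exactly one owning node."""
        digest = hashlib.sha1(key.encode()).digest()
        return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

    class ShardOwner:
        """One per node: serializes writes for the keys this node owns."""
        def __init__(self, name):
            self.name = name
            self.store = {}
            self.lock = threading.Lock()   # local lock, not a DLM request

        def write(self, key, value):
            with self.lock:
                self.store[key] = value

    owners = {name: ShardOwner(name) for name in NODES}

    def sharded_write(key, value):
        # Routing replaces distributed write coordination: the only remote
        # cost is shipping the request to the owner, not negotiating a
        # shared lock for the record.
        owners[owner_of(key)].write(key, value)

    if __name__ == "__main__":
        for i in range(6):
            key = f"record-{i}"
            sharded_write(key, f"value-{i}")
            print(key, "->", owner_of(key))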
In a number of the OpenVMS-related proposals I've encountered,
clustering often ends up getting nixed on price. Folks don't see it as
enough to warrant the expense and the effort of adopting clustering.
Folks that are clustered and that are looking for yet higher
performance end up adding their own workarounds for the OpenVMS and
clustering limits, if swapping in faster I/O hardware isn't enough. No
pun intended.
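And since Jim brought up the 'home node first' election in the TP
server, here is a very rough approximation of that rule, just to show
its shape. The real thing falls out of a pattern of DLM lock requests
and blocking ASTs, not a helper class; the class and the node names
below are mine.

    # Approximation of the election rule described above: the designated
    # home node leads whenever it is alive; otherwise the surviving nodes
    # pick one of themselves deterministically.
    class HomeFirstElection:
        def __init__(self, home, members):
            self.home = home
            self.members = set(members)

        def leader(self, alive):
            """Return the current leader given the set of nodes known alive."""
            alive = self.members & set(alive)
            if self.home in alive:
                return self.home         # home node always wins while alive
            if not alive:
                return None              # nobody left to lead
            return min(alive)            # any deterministic tie-break works

    if __name__ == "__main__":
        election = HomeFirstElection("nodea", ["nodea", "nodeb", "nodec"])
        print(election.leader({"nodea", "nodeb", "nodec"}))   # nodea
        print(election.leader({"nodeb", "nodec"}))            # nodeb, home down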
--
Pure Personal Opinion | HoffmanLabs LLC