[Info-vax] Distributed Applications, Hashgraph, Automation
Jim Johnson
jjohnson4250 at comcast.net
Sun Feb 18 16:23:40 EST 2018
On Sunday, February 18, 2018 at 12:00:06 PM UTC-8, Kerry Main wrote:
> > -----Original Message-----
> > From: Info-vax [mailto:info-vax-bounces at rbnsn.com] On Behalf Of Jim
> > Johnson via Info-vax
> > Sent: February 18, 2018 12:59 PM
> > To: info-vax at rbnsn.com
> > Cc: Jim Johnson <jjohnson4250 at comcast.net>
> > Subject: Re: [Info-vax] Distributed Applications, Hashgraph, Automation
> >
>
> [snip...]
>
> >
> > Briefly, what I've seen of the cloud has many aspects, and only a few align
> > with outsourcing. Not sure how much to go into that.
> >
>
> Imho, just as there are many different types of public cloud, there are many different types of outsourcing, but the basics are the same.
>
> Some outsourcers provide a GUI-based service catalogue with workflows supporting automation and approvals, etc.
>
> While many like to think the GUI-based "point-and-click" creation of some VMs is cool or new technology, this is really only a GUI-based service catalogue, which has been part of ITSM for decades.
>
> In reality, there are a number of third-party commercial add-ons to VMware that will do this exact thing for internal private clouds, aka internal shared services, aka the "IT Utility".
>
> >
> > I was following the discussion on shared-everything vs. shared-nothing
> > structures. I've used both. The RMS cache management was pretty
> > expensive to run when I last looked at it. It was especially bad for high
> > write, high collision rate files. This drove a different approach with the TP
> > server in DECdtm. It is structurally a shared nothing service on top of a
> > shared everything access substrate. It uses an unconventional leader
> > election: the 'home' system is always the leader if it is alive, and
> > the other systems elect one of themselves as the leader if it isn't.
> >
> > (This was done via a pattern of use with the DLM, and I agree with Steve
> > that either documenting the known patterns or encapsulating them for
> > easier consumption could be useful)
> >
> > This produced much better write performance, and good recovery
> > availability times. It allowed RDB to assume it had something like a
> > cluster-wide TM without cross node overheads in the normal case. At
> > the time I thought this was a good hybrid between the two models.
> > There are aspects of it that I still think hold up. If you have an efficient
> > shared writable volume, this is, I think, still a good design. But I'm very aware
> > that the aspects that matter can also be achieved with either remote
> > storage servers (especially with rdma) or with direct replication (e.g. as
> > you'd find with a Paxos log).
> >
> > I think, but am not sure, that the Audit Server also used a leader election
> > based shared nothing service on top of a shared volume. Fwiw.
> >
> >
> > Let me add a caveat on the above. I've been away from VMS for >15
> > years. All my data on VMS is very old, and is likely very out of date.
> >
> > Jim.
>
> Jim - I don't think we ever met while you were at DEC, but I have had numerous "solving the problems of the world" brew sessions with J Apps and M Keyes, who speak very highly of you as one of the industry leaders in file system/TP design, so your feedback is more than welcome here.
>
> 😊
>
> For those not familiar with Jim's past work, check this out:
> <http://www.hpl.hp.com/hpjournal/dtj/vol8num2/vol8num2art1.pdf>
>
> Btw, from what I understand, the new file system (VAFS?) VSI is working on right now is being designed to address some of the issues you mentioned.
>
> You may find this interesting:
> <http://www.hp-connect.se/SIG/New_File_System_VMS_Boot%20Camp_2016.pdf>
>
> Regards,
>
> Kerry Main
> Kerry dot main at starkgaming dot com
Kerry, thanks much!
I don't think we met, but I recognize your name. And John & Mick were always way too kind.
The VAFS looks interesting, and I'm glad to see that Andy is associated with it.
LFS structures were pretty nascent at the time of Spiralog, and there were things that we definitely got wrong. Spiralog was a shot at leadership in the FS space. I don't want to take credit for that. I arrived late - the very cool ideas behind it predated me, and that team deserves the credit for pushing the state of the art as far as they did.
There are two problems around file access in clusters: being able to store the data reliably at scale, and being able to access the data efficiently across the cluster at scale. Just reading the slides, VAFS looks to help with the first, which is certainly a precondition to much of anything else. What Steve and I were discussing was the second - that if you have a write-mostly (and 'mostly' can be surprisingly small) workload that can be partitioned, driving that workload as, effectively, shared nothing with an HA store is better. That is mostly above the file system itself.
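As a concrete (if simplified) illustration of the "home system" leader election described earlier in the thread, here is a minimal Python sketch of just the decision rule. The node names and the lowest-name tie-break are my own assumptions for illustration; the actual TP server drove this through DLM lock queuing rather than anything like this function:

```python
def choose_leader(home, members, is_alive):
    """Pick a leader per the pattern described above."""
    # The designated 'home' node leads whenever it is alive.
    if is_alive(home):
        return home
    # Otherwise the surviving members elect one of themselves;
    # here the tie-break is deterministic: lowest node name wins.
    live = sorted(m for m in members if m != home and is_alive(m))
    if not live:
        raise RuntimeError("no live members to lead")
    return live[0]

# Example: with the home node up, it leads; with it down, B takes over.
members = ["HOME", "B", "C"]
print(choose_leader("HOME", members, lambda n: True))         # HOME
print(choose_leader("HOME", members, lambda n: n != "HOME"))  # B
```

The point of the asymmetry is that recovery is cheap in the common case: as long as the home system is up, no election traffic is needed at all, and failover work only happens on the rare home-node failure.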
Fwiw, I spent the last 7 years (until I retired last month) working on Azure infrastructure. It gave me a perspective on the cloud, biased by being part of a cloud vendor. So, this is just my personal view of the cloud - one that I know is partial.
The comparison to outsourcing misses two aspects that are probably most front and center to me. First, it misses the dynamism of resource acquisition and release. An outsourcer, or any 'private cloud' (an inside-the-company cloud), is not going to be able to provide quick peak-and-valley workloads at equally low cost. The public clouds can. That has led to workloads that are inherently transient. They spin up hundreds to thousands of VMs for a short period, use them, and then give them back. You need a lot of incoming workloads to amortize that effectively.
This also pushes on deployment and configuration agility. If you're expecting to use 1000 VMs for an hour, but you first have to spend an hour deploying to them, you're not going to be happy. So that drives deployment times to small numbers of minutes.
This is where batch has gone, from what I can see. Whether it is Azure Batch, Hadoop, or something else.
But aren't all my workloads basal (i.e., they must always be there)? Maybe today. I've watched a lot of basal workloads turn into transient workloads as people have understood that there's value in doing so. It wasn't that they had to be basal, just that it was easier to express them that way when there was no real transient resource capability. There are indeed basal workloads, but they're typically a smaller subset than people first expect.
The second aspect also has to do with agility. Again, my understanding is from thinking about software providers. Every vendor is in a repeated contest with their competitors. This means that speed of getting from requirement to product in front of the potential customer matters -- i.e. the length of their relative release cycles. A shorter release cycle matters - it lets you get ahead of your competition, showing features that are more relevant and appearing more responsive (note that this doesn't say that the engineers aren't working on about the same cool features in both places, only that the potential customer is not seeing them for the company with the longer release cycle).
And one aspect of the cycle time is the cost of release. The higher the cost, the more work that has to go into the release for it to be justified. For a traditional ('box') product, this is rarely shorter than 6 months.
These are just things that have been true.
The cloud disrupted this in a big way. The delivery mechanism and the structure of most services (including the incorporation of devops) have driven this cycle time down to as little as minutes for higher-level features, and at worst a few months for deep technical changes. That means that a cloud service competing with a box product is always ahead, and often way ahead, in responding to changing customer requirements.
Note that this is a lot more than just changing the delivery channel. That is part of it. But it also requires care in the engineering processes, the service architecture, monitoring and telemetry, and the inclusion of devops. For that last one, I had a continuous love-hate relationship with devops - I loved the insight it gave me into my customers and how my service actually worked, and hated the 2AM calls. 😃
This is incomplete - it is just two top level thoughts that I had when I was reading. I honestly don't know how much of this is relevant to VMS. I'm just sharing my thoughts. YMMV. Again, just my personal opinions.
Jim.