[Info-vax] Distributed Applications, Hashgraph, Automation

Jim Johnson jjohnson4250 at comcast.net
Thu Feb 22 02:01:50 EST 2018


On Wednesday, February 21, 2018 at 7:20:05 PM UTC-8, Kerry Main wrote:
> > -----Original Message-----
> > From: Info-vax [mailto:info-vax-bounces at rbnsn.com] On Behalf Of Jim
> > Johnson via Info-vax
> > Sent: February 21, 2018 5:20 PM
> > To: info-vax at rbnsn.com
> > Cc: Jim Johnson <jjohnson4250 at comcast.net>
> > Subject: Re: [Info-vax] Distributed Applications, Hashgraph, Automation
> > 
> > On Wednesday, February 21, 2018 at 1:21:18 PM UTC-8, DaveFroble
> > wrote:
> > > Stephen Hoffman wrote:
> > > > On 2018-02-18 21:23:40 +0000, Jim Johnson said:
> > > >
> > > >
> > > >> Fwiw, I spent the last 7 years (until I retired last month) working in
> > > >> the Azure infrastructure.  It gave me a perspective on the cloud,
> > > >> biased by being part of a cloud vendor.  So, this is just my personal
> > > >> view on the cloud - one that I know is partial.
> > > >>
> > > >> The comparison to outsourcing is missing two aspects that are
> > probably
> > > >> most front and center to me.  First, it misses the dynamism about
> > > >> resource acquisition and release.  An outsourcer, or any 'private
> > > >> cloud' (inside company cloud) is not going to be able to provide the
> > > >> ability to have quick peak and valley workloads with equally low
> > > >> cost.  The public clouds can.  That has led to workloads that are
> > > >> inherently transient.  They spin up 100's to 1000's of VMs for a short
> > > >> period, use them, and then give them back.  You need a lot of
> > incoming
> > > >> workloads to amortize that effectively.
> > > >>
> > > >> This also pushes on deployment and configuration agility.  If you're
> > > >> expecting to use 1000 VMs for an hour, but you first have to spend
> > an
> > > >> hour deploying to them, you're not going to be happy.  So that
> > drives
> > > >> deployment times to small numbers of minutes.
> 
> [snip...]
> 
> Jim - let's put this in perspective.
> 
> How many Customers require a bump in 1,000 VM servers for only an hour? 
> 
> What about the associated storage, load balancing, network loads, and security (firewall) services associated with spinning up 1,000 VMs?
> 
> Backups of data? AV scanning of data?
> 
> What about the complexity of data partitioning, App-data routing and data replication (assuming DR is required) across so many VM's for so short a period?
> 
> > 
> > I've tried to be cautious in what I say about this, at least partly because I
> > don't know what sort of use current VMS systems have.  I do not want to
> > presume relevance here.
> > 
> > Because of that, let's instead look at what the enablers and drivers are.
> > You'll have to decide if they'd be at all relevant to you.
> > 
> > First, if an application is only a scale-up application, then this discussion
> > isn't relevant.  To get bigger you need a bigger machine, not more
> > machines.  To get smaller, you need a smaller machine, not fewer
> > machines.
> > 
> 
> Agree. Fwiw, this is the issue HPE's HP-UX has and why HPE used to push big Superdomes for scaling up.
> 
> > Second, if it is a scale-out application, then you'll over-provision enough
> > to carry you through any acquisition delays.  If it takes a quarter to get a
> > new batch of machines, then you'll plan to have enough until then.
> > Drops in usage that are smaller than your acquisition delay just don't
> > count in your planning, as you can't respond in that time.
> > 
> 
> Server acquisition times and costs are now a fraction of what they used to be. 
> 
> Vendors also know how to take care of their big Customers. If you are a major Customer of any server vendor, they will ensure they have supplies available locally to ensure they can ship very quickly. 
> 
> > Third, if your cost structure is such that there is no economic benefit to
> > giving back machines that you're not fully using, then you'll not add the
> > complexity to do so.
> > 
> 
> Capacity on demand (CoD) solutions are nothing new. Even OpenVMS had CoD back when the big Alpha Wildfires were shipping over 20 years ago.
> <http://h41379.www4.hpe.com/doc/731final/documentation/pdf/ovms_es47_gs1280_nf_rn.pdf>
> Reference sect 1.18. 
> 
> With OpenVMS X86-64, perhaps this CoD might be resurrected with the new KVM virtualization capabilities being planned for OpenVMS V9.*?
> 
> > So, in a traditional world where you're running on physical machines that
> > you've purchased and installed, there's a lot of bias to running the
> > applications as if they had fixed loads.  You might get some variation
> > between a few applications, such as you'd find between open and after-
> > hours application runs, but overall there was little ROI in trying to closely
> > track load.
> > 
> > Now, let's imagine (and I'm pulling these numbers out of the air for the
> > purposes of discussion) that I make a few changes.  I move to a VM-
> > based workload in the cloud with a more modern configuration
> > management system -- such that I'm not more than, say, 10m from
> > having a new VM with a new instance of my application online and
> > running, and not more than, say, 1m from removing an instance of my
> > application.  Furthermore, I'm now charged by the instance-minute for
> > the resources I consume, so dumping VMs that are not being used
> > heavily enough provides immediate payback.
> > 
> > A lot of interactive workloads suddenly become very interesting,
> > especially those with diurnal patterns in a given region.  Or with monthly
> > or quarterly or yearly peaks.  Or recurring, but non-continuous, analysis
> > workloads.
> > 
> > The management of these can be simplified with autoscaling services
> > that use real time monitoring and rule bases to automatically shut down
> > or create instances based on the current demand.
> > 
> > Fwiw,
> > Jim.
> 
> Perhaps I am a dinosaur, but in the good ole days this was called capacity planning combined with CoD. 
> 
> Having stated this, I would far prefer VSI focus on solving issues for traditional enterprise Customers and not that razor thin upper stratosphere layer of Customers that need 1,000 servers for only an hour.
> 
> 😊
> 
> 
> Regards,
> 
> Kerry Main
> Kerry dot main at starkgaming dot com

Let me start by reiterating that I do not have data on what current VMS customers are doing, so I'm not willing to make a claim about how many would need any particular feature.  My reason for replying here was to clarify some apparent misunderstandings about what goes on in today's public clouds.

For the questions around the ancillary complexity, there are roughly two answers.  First, for applications of any scale, configuration and deployment are fully automated.  That makes propagating, for example, firewall rules very straightforward.  I've honestly not encountered this as an issue.
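As a concrete (and entirely hypothetical) sketch of what that automation buys you: once the rule set lives in a template, an instance's configuration is just a function of its identity, so deploying 3 VMs and deploying 1,000 VMs are the same loop.

```python
# Hypothetical sketch: firewall rules as a versioned template, stamped
# out identically for every instance at deployment time.
FIREWALL_TEMPLATE = [
    {"port": 443, "proto": "tcp", "action": "allow"},
    {"port": 22,  "proto": "tcp", "action": "allow", "source": "10.0.0.0/8"},
]

def rules_for_instance(instance_id: str) -> list[dict]:
    """Every instance gets an identical rule set, tagged with its identity."""
    return [dict(rule, instance=instance_id) for rule in FIREWALL_TEMPLATE]

# Scaling from 3 to 1,000 VMs is the same loop with a longer range.
fleet = [rules_for_instance(f"vm-{n:04d}") for n in range(3)]
print(len(fleet), len(fleet[0]))  # 3 instances, 2 rules each
```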

Second, around data storage: most applications have externalized their data, so when the number of VMs scales out there is little overhead on the data management (under the covers the storage provider may be doing a lot of work, but it is not unique to the scale-out operation).

It may or may not be interesting, but https://azure.microsoft.com/en-us/features/autoscale/ gives a start for using the Azure autoscaling service.  It is far from the only such service, just the one I've seen more up close than the others.
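For a rough idea of what such a rule base looks like, here is a minimal threshold rule in Python.  The thresholds and the doubling/halving policy are invented for illustration; they are not Azure's actual behavior or API.

```python
# Illustrative autoscaling rule: scale out on sustained high CPU,
# scale in when mostly idle, clamped to a configured instance range.
def desired_instances(current: int, avg_cpu: float,
                      min_n: int = 2, max_n: int = 1000) -> int:
    """Return the target instance count for the next evaluation cycle."""
    if avg_cpu > 0.70:
        target = current * 2           # double on sustained high load
    elif avg_cpu < 0.25:
        target = max(current // 2, 1)  # halve when mostly idle
    else:
        target = current               # in band: leave the fleet alone
    return max(min_n, min(target, max_n))

print(desired_instances(10, 0.85))  # scale out: 20
print(desired_instances(10, 0.10))  # scale in: 5
print(desired_instances(10, 0.50))  # hold: 10
```

A real service evaluates something like this on a timer against monitoring data, with cooldown periods to avoid flapping.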

For whether or not short term workloads exist in any volume (as opposed to scaling up and down a long running workload), a place to start is to look to batch systems, such as https://azure.microsoft.com/en-us/services/batch/.  Again, this is far from the only one...

For server acquisition, faster is certainly better, in that you can delay purchasing for longer.  But can you also return hardware to the vendor at the same rate, with sufficient refund, and do both regularly?
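To make the economics concrete, here is a back-of-envelope comparison with invented numbers: with per-minute billing you pay for the area under the load curve, while owned hardware has to cover the peak around the clock.

```python
# Invented figures for illustration only.
RATE = 0.002  # $ per instance-minute (assumed)

def cloud_cost(load_by_hour: list[int]) -> float:
    """Pay only for the instances actually running each hour."""
    return sum(n * 60 * RATE for n in load_by_hour)

def owned_cost(load_by_hour: list[int]) -> float:
    """Owned capacity must cover the daily peak for all 24 hours."""
    return max(load_by_hour) * len(load_by_hour) * 60 * RATE

# A diurnal pattern: quiet nights, a sharp daytime peak.
day = [5] * 8 + [50] * 4 + [200] * 4 + [50] * 4 + [5] * 4
print(round(cloud_cost(day)), round(owned_cost(day)))  # 151 576
```

The gap widens as the peak gets spikier, which is exactly the workload shape the earlier 1,000-VMs-for-an-hour example describes.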

Which then brings in the comparison to CoD.  Yes, there is some overlap, and I could even imagine ways that CoD could be hooked up as a trigger for a scaling operation.  But it does not address the full lifecycle: the arrival of resources that you'll be charged for having, the use of those resources, and the return of those resources and the termination of charges for them.

I would expect CoD to cover acquiring resources you already own for use by a workload, using it, and then returning it to your pool of available resources.  That's certainly helpful to triggering the more extended lifecycle above, but it, in and of itself, is not the same thing.

Again, I'm not suggesting what VSI's priorities should be.  I'd expect they have a much better handle on that than I ever will.

Fwiw, I'd assume that many of the basic items they could do to help VMS run as a scalable workload would be common with other requirements - such as improvements in configuration management complexity and time - especially around the addition and removal of cluster nodes.

HTH,
Jim.
