[Info-vax] HP Integrity rx2800 i4 (2.53GHz/32.0MB) :: PAKs won't load

Mon Feb 29 09:28:42 EST 2016

> -----Original Message-----
> From: Info-vax [mailto:info-vax-bounces at info-vax.com] On Behalf Of
> lists--- via Info-vax
> Sent: 29-Feb-16 1:24 AM
> To: info-vax at info-vax.com
> Cc: lists at openmailbox.org
> Subject: Re: [New Info-vax] HP Integrity rx2800 i4 (2.53GHz/32.0MB) ::
> PAKs won't load
> 
> On Sun, 28 Feb 2016 14:20:06 +0000
> Kerry Main via Info-vax <info-vax at rbnsn.com> wrote:
> 
> > Then, add in the "what if we lose a site and we cannot lose any data"
> > scenario to the cluster and the complexity goes through the roof - "split
> > brain" and many of the issues which ALL OS platforms need to address.
> 
> Does that mean "can't lose access to any data" or "can't lose data?"
> 

Can't lose any data, which is often expressed as RPO=0 (RPO = recovery
point objective).

Apologies for not making this clearer, but I was referring to a multi-site 
OpenVMS cluster that uses host based shadowing (sync writes) to ensure 
that data is consistent across both sites. If a write happens at one site, 
then it either completes at both sites or neither site. If there is an abort, 
then the App must take whatever is the correct course of action.
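
As a rough illustration of the "both sites or neither" rule, here is a
minimal Python sketch. It is a toy model only, not how HBVS is actually
implemented: the Site class, the sync_write function and the rollback
approach are all hypothetical names and choices made up for this example.

```python
_MISSING = object()  # sentinel: block did not exist before the write


class Site:
    """Hypothetical stand-in for the storage at one site."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.blocks = {}

    def write(self, block, data):
        if not self.up:
            raise IOError(f"{self.name} unreachable")
        self.blocks[block] = data


def sync_write(sites, block, data):
    """Apply the write at every site or at none.

    Returns True if the write completed everywhere, False if it was
    aborted (in which case every site keeps its previous contents).
    """
    done = []  # (site, previous value) for each site already written
    try:
        for site in sites:
            old = site.blocks.get(block, _MISSING)
            site.write(block, data)
            done.append((site, old))
    except IOError:
        # Undo the partial write so no site is ahead of another.
        for site, old in done:
            if old is _MISSING:
                site.blocks.pop(block, None)
            else:
                site.blocks[block] = old
        return False
    return True
```

The point of the sketch is the return value: the App only ever sees
"written at both sites" (True) or "written at neither" (False), and on
False it is up to the App to retry or take some other action.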

A very bad scenario in any multi-site cluster is a split cluster, where a
write completes at one site but not the other, yet the App thinks both
sites were updated.

> If the former, how many setups can really provide that in real time? I
> know
> a system that can do it in near-real time and it will sync up but if you
> lose a system your data on that system is going to be inaccessible. As far
> as "can't lose any data" goes, that sounds like a feature of any OS that
> claims to be enterprise ready. Until an airplane hits the data center.
> 

Again, in a multi-site cluster, the airplane hit would have an impact, but
assuming a properly designed App environment, after a short cluster
transition, the App would continue (with no data lost) at the other site.

> > Tough problem on any platform - OpenVMS's current state may not be
> > the best, but does anyone have a better solution which also ensures
> > data stays in sync between volumes on different systems at different
> > sites? (please do not state HW mirroring)
> 
> Why not? Even software mirroring has been proven worthy in
> production for a
> long time. By that I mean smart/consistent mirroring as opposed to
> dumb
> mirroring though. Maybe that's what you meant by "HW mirroring."
> 

Again, multi-site cluster using host based shadowing (HBVS).

> > All of us here on this newsgroup understand the frustrations of the
> past
> > 2 decades of DEC/Compaq/HP decisions. With each acquisition,
> OpenVMS
> > became a smaller fish in a much larger pond.
> 
> Well, almost all ;-) I did feel bad when HP messed up their calculator
> division though and they have nobody to blame for that but themselves.
> 
> > Having stated this, to the point raised in the last reply, one needs to
> > understand that multi-site clustering with no data loss in a DR scenario
> > is a really tough nut to crack - on ANY OS platform.
> 
> I'm not sure no data loss in a DR scenario is possible on any platform I
> know of. At first I thought you were talking about a hardware failure. If
> you're talking about airplane hits the building and you can avoid *any*
> data loss that has to be a huge competitive advantage for VMS over
> pretty
> much anything out there. And it ought to be on every single piece of
> marketing collateral for VMS including war stories and links to white
> papers.
> 

You're hired :-)

Yes, OpenVMS multi-site clusters do offer this capability and have proven
it in many DR scenarios (like 9/11). Yes, the complexity is quite a bit
higher to set up and manage, so higher skills are required. But for sites
that require this additional insurance, avoiding a couple of hours of
downtime would likely more than pay for the entire second site
infrastructure.

> But if it's only a matter of temporary unavailability, or short term delay
> then I know another OS that can provide that through third party tools
> and I
> guess others could also.
> 

All OS's can provide active-passive type DR solutions (even Windows,
albeit coyote ugly). The issue with active-passive solutions is that one
has to assume that SOME data WILL be lost, i.e. RPO is not equal to 0.
This is because active-passive solutions use some form of SW or HW based
replication which is only sync'd every X or XX minutes. If a significant
event happens, the App thinks a write is complete, but it has not yet
been committed at the remote site. When the App is restarted at the
remote site, those transactions that were caught in that replication
buffer window are lost. [Remember that only writes are propagated
in these replication buffers]
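
To make the replication buffer window concrete, here is a minimal Python
sketch of an active-passive pair. It is a toy model under stated
assumptions, not any vendor's replication product: the class
AsyncReplicatedPair and its methods write, replicate and fail_over are
hypothetical names invented for this example.

```python
class AsyncReplicatedPair:
    """Toy active-passive pair with periodic (asynchronous) replication."""

    def __init__(self):
        self.primary = []  # transactions stored at the active site
        self.remote = []   # transactions stored at the passive site
        self.pending = []  # replication buffer, not yet shipped

    def write(self, txn):
        # The App gets its acknowledgement as soon as the primary has
        # the write; the remote site does not have it yet.
        self.primary.append(txn)
        self.pending.append(txn)
        return True

    def replicate(self):
        # Runs every X minutes: ship the buffered writes to the remote.
        self.remote.extend(self.pending)
        self.pending.clear()

    def fail_over(self):
        # The primary site is gone. Anything still in the buffer was
        # acknowledged to the App but never reached the remote site.
        lost = list(self.pending)
        self.pending.clear()
        self.primary.clear()
        return lost


pair = AsyncReplicatedPair()
pair.write("t1")
pair.write("t2")
pair.replicate()          # t1 and t2 are now safe at the remote site
pair.write("t3")          # acknowledged, but still in the buffer
lost = pair.fail_over()   # site lost before the next replication run
```

In this run, lost holds the one transaction written after the last
replication pass, and pair.remote holds only the first two. That gap is
the non-zero RPO.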

If transaction updates are not critical, then it's not a big deal. If the 
transactions are measured in large $'s (banks measure some in 
millions of $'s), then it really is a big deal.

As Keith Parris emphasizes in his DR/DT presentations, there is a huge
difference between a disaster recovery (DR) solution and a disaster
tolerant (DT) solution. With DR, the business is down for some period 
of time and steps are taken to get the business back on line. With DT,
the business can continue to run with no data loss.

I like to compare DR vs DT to insurance - the more you need, the
more it costs.  If you never use it, then it is a big expense. If it does
save the company major $'s and/or spares it a loss of business / public
credibility, then the additional insurance was a very cheap expense.

Regards,

Kerry Main
Kerry dot main at starkgaming dot com