[Info-vax] HP Integrity rx2800 i4 (2.53GHz/32.0MB) :: PAKs won't load
Kerry Main
kerry.main at backtothefutureit.com
Sun Feb 28 09:20:06 EST 2016
> -----Original Message-----
> From: Info-vax [mailto:info-vax-bounces at info-vax.com] On Behalf Of G
> Cornelius via Info-vax
> Sent: 28-Feb-16 12:38 AM
> To: info-vax at info-vax.com
> Cc: G Cornelius <gcornelius at charter.net>
> Subject: Re: [New Info-vax] HP Integrity rx2800 i4 (2.53GHz/32.0MB) ::
> PAKs won't load
>
> On 02/16/2016 09:39 PM, Stephen Hoffman wrote:
> > Ayup. That being part of the utterly absurd implementation of
> > clustering through as many as 20 shared configuration files and
> > variously more, and largely a result of file accretion and not
> > of design, and of compatibility and not of simplicity nor
> > maintainability, and certainly not of ease of use.
>
> Someone finally admitting this.
>
Let's be clear - when one does clustering on ANY platform, the complexity
goes up significantly - simply because you have to consider many more
"what-if" scenarios.
Then, add in the "what if we lose a site and we cannot lose any data"
scenario and the complexity goes through the roof - "split brain" and
many other issues which ALL OS platforms need to address.
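To make the split-brain point concrete, here is roughly how the quorum
arithmetic works on OpenVMS (the vote counts below are just an example):
the cluster computes QUORUM as (EXPECTED_VOTES + 2) / 2 using integer
division, and any partition that cannot muster that many votes halts
rather than risk diverging from the other partition.

$ ! Example: two sites with 2 votes each plus a 1-vote quorum node
$ EXPECTED_VOTES = 5
$ QUORUM = (EXPECTED_VOTES + 2) / 2    ! DCL integer division -> 3
$ SHOW SYMBOL QUORUM
$ ! A site that can still reach the quorum node has 2 + 1 = 3 votes
$ ! and continues; the isolated site has only 2 and halts - so no
$ ! split brain, at the cost of needing that third location.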
Ask any HP-UX cluster person about scripting gone wild - many of the
cluster functions that OpenVMS handles internally are instead the
responsibility of the assigned installation tech, who has to write the
appropriate scripting.
> And it is such a simple problem compared to volume shadowing or to
> clustering itself.
Multi-site clustering is NOT a simple problem for any platform to address
unless the business is OK with losing some amount of data during a
significant event, i.e. an active-passive design with some lag between
data syncs.
Check out Keith Parris's many past presentations on disaster-tolerant
(DT) multi-site clusters.
>
> Clustering was wonderfully transparent until we went to multi-site
> clusters and what was effectively a single point of failure: the
> shared shadow set with all the configuration files, the one for which
> the initial mount would be problematical because you did not know if
> you had the other site(s) at all, and could not be sure _your_ site
> did not have the stale shadow member.
>
Tough problem on any platform - OpenVMS's current state may not be
the best, but does anyone have a better solution that also ensures
data stays in sync between volumes on different systems at different
sites? (Please do not say HW mirroring.)
> > It works. When it works. But it's a train-wreck to deal with.
> > And it's a mess to extend and to update, and a mess for the system
> > manager to upgrade.
>
Again, clusters - especially multi-site clusters - do require a more
highly skilled person to maintain, but again, that is not unique to
OpenVMS.
> Tell me about it. For me it's been job security in that I'm the only
> one who understands the SYLOGICALS.COM that mounts the common disk,
> a script that has gotten ever smarter over the years, first trying
> to mount the virtual unit without shadow members specified in hopes
> it is already mounted somewhere, retrying a few times by killing the
> mount subprocess and restarting it if it is hung while allowing
> OPCOM and other interventions from other nodes or even AMDS,
> and ultimately dropping into an interactive subprocess to let the
> operator select from various choices including going into DCL to
> resolve the problem, typically a $ MOUNT/CLUSTER/CONFIRM specifying
> all shadow members.
>
> Earlier versions knew which node was in what data center and would
> try to wait until maybe two of three data centers were present before
> trying the mount.
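For anyone who has not had to write one of these, a heavily simplified
sketch of that kind of mount loop might look like the following (the
device names, volume label, and retry counts are invented, and the real
script described above does much more, including killing hung mount
subprocesses and coordinating with OPCOM):

$ ! Sketch only: try the virtual unit bare first, in case another node
$ ! already has the shadow set mounted and membership established
$ RETRY = 0
$ MOUNT_LOOP:
$ MOUNT/SYSTEM/NOASSIST DSA100: COMMON COMMON$DISK
$ IF $STATUS THEN GOTO DONE
$ RETRY = RETRY + 1
$ IF RETRY .LE. 5
$ THEN
$     WAIT 00:00:30
$     GOTO MOUNT_LOOP
$ ENDIF
$ ! Last resort: hand it to the operator with all members specified
$ MOUNT/CLUSTER/CONFIRM DSA100: /SHADOW=($1$DGA101:, $1$DGA201:) COMMON
$ DONE: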
>
> Ugly, and if I had specs for IO$_PACKACK, or trusted my little demo
> program that verified I could use physical I/O operations to read
> individual members' shadow control blocks, I could have made it
> almost transparent for the case of all members visible but not
> mounted. [Of course there's never a kernel mode hacker around when
> you need one!]
>
> Guess it works. Cluster has been up 18 years since last cluster
> reboot, with a few planned "walking dead" intervals when converting
> to SAN storage, or from one flavor of SAN storage to another.
>
18 years of cluster uptime with all of those major SAN changes?
Yeah, that is a really "bad" record :-)
> Cluster is now just performing a utility function or two plus providing
> access to the tape database and, theoretically, the tape library, given
> that the last production application has gone away, with maintenance
> dropped effective this month. Yep, I'm now system manager without
> portfolio.
>
Well, it sounds like this was another victim of the DEC/Compaq/HP lack
of attention to its existing base and ISVs, poor marketing ... fill in
the blanks. If it's any consolation, the same happened to HP-UX and
NonStop as well ... OpenVMS just happened to be the first one to bear
the heat from DEC/Compaq/HP.
How do you think HP-UX customers feel right now, with their only future
being Integrity servers while HPE marketing focuses only on x86-64
servers?
While it certainly will not be easy and there is a lot of ground to make
up for past mistakes, the good news going forward is that there is now
one company (VSI) with a sole focus on readying next generation versions
of OpenVMS.
> > But then I'm being polite.
>
> Don't sugar coat it, Hoff, tell it like it is!
>
All of us here on this newsgroup understand the frustrations of the past
2 decades of DEC/Compaq/HP decisions. With each acquisition, OpenVMS
became a smaller fish in a much larger pond.
Add in the 40+ acquisitions of smaller companies (43 when I left in 2012),
and you can see why over time OpenVMS (HP-UX/NonStop as well) received
less and less attention from Senior management.
Anyway, water under the bridge ... one does not design the future by
constantly looking through the rear view mirror.
I agree with Hoff that simplifying OpenVMS management functions like
clustering, the multiple configuration data repositories, security
upgrades, etc. should be considered for future next-gen versions of
OpenVMS.
Having stated this, to the point raised in the last reply, one needs to
understand that multi-site clustering with no data loss in a DR scenario is
a really tough nut to crack - on ANY OS platform.
Regards,
Kerry Main
Kerry dot main at starkgaming dot com