[Info-vax] Beyond Open Source

Sun May 10 18:26:19 EDT 2015

On 2015-05-10 20:42:19 +0000, johnwallace4 at yahoo.co.uk said:

> On Sunday, 10 May 2015 18:48:53 UTC+1, Stephen Hoffman  wrote:
>> On 2015-05-10 17:28:50 +0000, johnwallace4 at yahoo.co.uk said:
> [huge snippage for brevity]
>> --
>> Pure Personal Opinion | HoffmanLabs LLC
> 
> And how are people supposed to be identifying "the relevant patches", 
> given the limited information provided with them?

The goal is to get folks out of doing that.   It's to automate that.  
Heroic measures, as you've described earlier, aren't how you want a 
server to be managed.

Some patches are known and the selection process is easy: any 
mandatory install, and any optional install where the criteria are met, 
such as the particular prerequisite software or device being present.   
For other patches, there'd be a trigger such as a specific failure.  
But again, if you're going to incur a failure or a crash, that's a 
patch you'd probably want to install.
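
As a rough sketch of that selection logic in Python, and assuming a 
patch catalog that records a mandatory flag, the prerequisites, and the 
failures each patch addresses.  The Patch fields and every name below 
are made up for illustration; this is not how PCSI or any VSI tooling 
describes kits.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    # All fields are hypothetical; a real catalog would look different.
    name: str
    mandatory: bool = False
    requires: set = field(default_factory=set)  # software/devices that must be present
    fixes: set = field(default_factory=set)     # failure signatures this patch addresses


def select_patches(patches, installed, seen_failures):
    """Return the names of the patches this server should get, in catalog order."""
    chosen = []
    for p in patches:
        if p.mandatory:
            chosen.append(p.name)       # mandatory installs always apply
            continue
        if not p.requires <= installed:
            continue                    # prerequisite software or device is absent
        if p.fixes and not (p.fixes & seen_failures):
            continue                    # failure-triggered patch, trigger not seen
        chosen.append(p.name)
    return chosen
```

Called with the installed products and the failure signatures seen on 
the box, this returns the list to stage.  The point is that the server 
works this out from data it already has, and nobody reads release 
notes to do it.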

In years past, the patches acquired a poor reputation because they 
introduced errors.  Recent UPDATE patches have been solid.  If that 
misbehavior arises anew, then there's another and bigger issue lurking 
here, and end-user patch management and patch distribution are not 
going to address it.

> Or are they supposed to take it on trust that everything out of Redmond 
> or  Cupertino (?) is inherently safe and trustworthy?

Cryptographic checksums.   As for patches, poor patch quality spooks 
folks, and delays or defers patches.  Better patch quality means 
quicker roll-outs.  Hopefully the end-users or the partners have a test 
environment, and the tools to match.  As your experience with that 
customer shows, not everybody does, though.
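
For the checksum part, a minimal Python sketch, assuming the vendor 
publishes a SHA-256 hex digest alongside each kit (the choice of 
SHA-256 and the function names are my assumptions, not anything VSI 
has announced):

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hash a kit file in chunks so a large kit need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def kit_matches(path, published_hex):
    """Compare the local digest against the published one."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

A matching digest only shows the kit wasn't altered relative to the 
published value.  The published value itself has to arrive over a 
channel you trust, which is where signatures and certificates come in.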

> "we should then design the environment to upload and scan the crashes 
> autonomously, and that we can and should lead the end-users toward the 
> proper outcome for the issues they're encountering."
> 
> Agreed (though we probably mean the end users IT managers ?).

There are an increasing number of end-users managing servers, and that 
trend will not change.   Some folks have the skills, some outsource the 
management or depend on their software supplier, and other folks just 
leave the box to run.

> "It's increasingly common for  applications to avoid user-visible crash 
> logs, but to collect and encrypt and upload that data for analysis."
> 
> Agreed again, subject to a few security caveats.

Opt-in, and using the local private key to sign the upload and the 
vendor public key to encrypt it.

> So what would it take (other than some presentation layer stuff :)) to 
> have VMS combine process dump, system dump (live or post mortem), error 
> logs, etc and maybe even stuff from DECamds and friends, and email it 
> off to the authorised service provider.

A daemon to manage the processing, possibly with the assistance of 
Apache ZooKeeper, current crypto and per-server certificates, probably 
an XML or JSON library to structure the data, a variety of data 
collectors, both for gathering the current patch data and as an 
alternative to or replacement for the last-chance dump handler, CLUE 
CRASH for processing dumps, and some other giblets.  Updates to PCSI 
and/or a replacement installer, too.    There's much more to do on the 
server (VSI) end of the connection, as that'll involve processing all 
that data in an automated fashion, as well as determining the 
entitlements or however VSI decides to offer patches.
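
To make the "structure the data" piece concrete, here's a small Python 
sketch of what a collector might hand to the upload daemon, assuming 
JSON.  The field names and the schema number are invented for 
illustration; nothing here reflects actual CLUE output or a real VSI 
format.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_report(node, os_version, patches, crash_summary):
    """Bundle collector output into one JSON payload plus its SHA-256 digest."""
    body = {
        "schema": 1,                             # hypothetical format version
        "node": node,
        "os_version": os_version,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "installed_patches": sorted(patches),    # from the patch-data collector
        "crash": crash_summary,                  # from the dump-processing collector
    }
    # Sorted keys and compact separators give a stable byte stream to hash or sign.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()
```

The digest travels with the payload so the receiving end can detect 
truncated or damaged uploads before it tries to parse anything.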

Since it's VSI software, the uploads will go there for processing by 
default.   It'd be nice to have a linkable framework that allows 
application crashes to be sent elsewhere, though that's obviously 
possible now.

> Maybe the first line service provider needn't even be HP or VSI?

Via VSI partner, most likely.

> This concept has a distinct 1990s deja vu about it... can't give it a 
> name though.

A whole lot of this dates back to then, though better instrumentation 
has become easier and more common.

> It's not going to be top of VSI's priority list, but the infrastructure 
> for gathering the relevant information is already there, architected 
> in, generally available, and frequently used constructively if people 
> can be bothered doing more than "have you tried rebooting it?".

The goal is to avoid the "have you tried rebooting it?" — entirely.

In general....

What is typical at many OpenVMS sites is worse than what DEC had back 
in the 1990s, and what OpenVMS has now is vastly behind what's typical 
and current.

How does VSI keep folks from _this_ era from getting into even deeper 
sneakers than that customer?

How does VSI avoid reading way too many crashdumps?

Having an on-site staff for OpenVMS isn't as common as it once was.  
Dealing with this isn't done with email-based notifications, not unless 
the end-user or the partner wants email notifications.   Data uploads 
are not via email.   Dealing with this is best not done with manual 
crashdump scanning.  Increasingly, it's not done by displaying crashes 
to end-users, either.

How this best goes forward is with none of what was.  It's automated 
tools and direct connections, probably via HTTPS/443.  Directly 
uploading crash data and automated scanning for known patterns.  It's 
automatically-staged downloads, and push-button patch installs.
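
The "automated scanning for known patterns" piece could start out as 
simple as the Python sketch below, assuming each uploaded crash has 
already been reduced to a one-line text summary.  The signatures and 
patch names are placeholders, not real bugchecks or real kits.

```python
import re

# Hypothetical table mapping crash signatures to the patch that fixes each one.
KNOWN_SIGNATURES = [
    (re.compile(r"EXAMPLE_BUGCHECK_A\b.*\bEXAMPLE_DRIVER\b"), "EXAMPLE-PATCH-1"),
    (re.compile(r"EXAMPLE_BUGCHECK_B\b"), "EXAMPLE-PATCH-2"),
]


def match_crash(summary_text):
    """Return the patches whose known signature appears in a crash summary."""
    return [patch for pattern, patch in KNOWN_SIGNATURES
            if pattern.search(summary_text)]
```

An empty result means the crash is new and goes to a human.  Everything 
else gets routed straight to a staged download, which is how the number 
of crashdumps somebody has to read stays manageable.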

It's with time-to-patch being much, much faster than OpenVMS ever was 
before, too.  Vulnerabilities and particularly security vulnerabilities 
will only be exploited more quickly, after all.

It's with configuration and crash and application failure information 
uploaded to VSI and potentially then shared with VSI partners, and 
where VSI or the partners then deliver the support to the end-user 
installs.  Or yes, the end-users that are providing self-maintenance 
can then test the patches and then press the patch button on the 
production servers.

In years past, Canasta/CCAT 
<http://www.decus.de/slides/sy2003/09_04/2k01.pdf> and the old DEC 
proactive services offerings were part of this, but that was not as 
integrated and it wasn't as automated as it should have been.   If 
anything, opt-in collecting of crashes from everybody — support 
contracts or not — makes sense for a variety of reasons.  It gets VSI a 
whole lot of useful data.

-- 
Pure Personal Opinion | HoffmanLabs LLC