[Info-vax] Health monitoring disk members of HW RAID controllers?

Wed Aug 10 17:27:47 EDT 2011

On 7/20/2011 10:56 AM, Rod wrote:
> I have been previously using OpenVMS Volume Shadowing (software RAID
> mirroring) with OpenVMS Alpha and OpenVMS/I64.  It was possible to see
> when spindle members of the shadow set were degrading and proactively
> replace disks before it became an emergency repair. (SHO ERR, DECevent
> utility for ERRLOG.SYS content).
>
> I'm now moving towards deploying hardware RAID solutions for the
> enhanced performance they deliver on OpenVMS/Alpha and OpenVMS/I64
> systems.

Another option might be to use the host-based HP RAID Software for 
OpenVMS to provide the performance boost you are looking for. This 
allows RAID 0 arrays (stripesets) to be formed from disks and RAID 0+1 
arrays (stripesets of shadowsets) to be formed from shadowsets, with all 
the control and visibility at the host level that you've come to 
appreciate with Host-Based Volume Shadowing (HBVS).
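To make the striping layout concrete, here is a minimal Python sketch of how a stripeset can map a logical block to one of its members, and how RAID 0+1 then fans a write out to every disk of the chosen shadowset. This is a generic illustration under stated assumptions, not the actual on-disk layout of the HP RAID Software: the chunk size, the disk names, and both function names are hypothetical.

```python
from typing import List, Tuple


def stripe_map(lbn: int, n_members: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block number on a stripeset (RAID 0) to
    (member index, block number on that member).

    Consecutive chunks of `chunk_blocks` blocks rotate across the members,
    which is what lets large transfers keep several spindles busy at once.
    """
    if n_members < 1 or chunk_blocks < 1:
        raise ValueError("need at least one member and a positive chunk size")
    chunk, offset = divmod(lbn, chunk_blocks)
    member = chunk % n_members          # which member holds this chunk
    member_chunk = chunk // n_members   # which chunk on that member
    return member, member_chunk * chunk_blocks + offset


def raid01_write_targets(lbn: int, shadowsets: List[List[str]],
                         chunk_blocks: int) -> List[Tuple[str, int]]:
    """RAID 0+1 (a stripeset of shadowsets): the stripe layer picks one
    shadowset, and that shadowset writes the block to every one of its disks.
    Disk names are hypothetical labels."""
    member, member_lbn = stripe_map(lbn, len(shadowsets), chunk_blocks)
    return [(disk, member_lbn) for disk in shadowsets[member]]
```

For example, with three two-disk shadowsets and an assumed 128-block chunk, block 130 falls in the second chunk, so it is written at block 2 of both disks in the second shadowset.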

As another note on comparative performance, with hardware RAID your 
performance is limited to the performance of a single controller, 
whereas with HBVS and host-based RAID (HBR) you can shadow and stripe across multiple 
controllers for higher performance than any single controller can 
provide. I also consider such an individual controller to be a potential 
single point of failure. And because the two controllers in a 
dual-redundant pair are intimately connected and have to coordinate with 
each other, for the highest availability configurations I consider even 
such a dual-redundant controller to be a potential single point of 
failure. With HBVS you can shadow across controllers (or controller pairs) 
to avoid any such potential single points of failure.

Also evaluate how the controllers implement the level of RAID you 
choose. For example, for simplicity of implementation, many backplane HW 
RAID controllers implement mirroring (RAID 1) by designating one disk as 
the master for the mirrorset, with all reads coming from the master 
member (unless and until it fails). With HBVS, reads can be spread 
across all of the members in parallel for greater throughput. HBVS also 
knows (or can be told) about multi-site 
configurations, and it will send a read to the closest (lowest-latency) 
member for a given node.
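The difference between these read policies can be sketched as a toy model in Python. It is only an illustration of the three ideas described above (master-only reads, spreading reads across members, and picking the lowest-latency member); it is not the actual algorithm used by HBVS or by any particular controller, and the `Member` type, member names, and latency figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Member:
    """One member disk of a mirror set (names and latencies are hypothetical)."""
    name: str
    latency_ms: float  # assumed measured read latency from this node
    healthy: bool = True


def _healthy(members: List[Member]) -> List[Member]:
    alive = [m for m in members if m.healthy]
    if not alive:
        raise RuntimeError("no healthy members left in the mirror set")
    return alive


def master_only(members: List[Member]) -> Member:
    """Simple backplane RAID 1 style: every read goes to the first healthy
    member (the 'master'); the others serve reads only after it fails."""
    return _healthy(members)[0]


def round_robin(members: List[Member]) -> Iterator[Member]:
    """Spread successive reads across all healthy members so several reads
    can be in flight on different spindles at once.
    Member health is re-checked once per full pass."""
    while True:
        for m in _healthy(members):
            yield m


def lowest_latency(members: List[Member]) -> Member:
    """Site-aware choice: send the read to the healthy member with the
    lowest latency from this node (e.g. the disk at the local site)."""
    return min(_healthy(members), key=lambda m: m.latency_ms)
```

With two local members and one remote member, `master_only` leaves the second local disk idle for reads, `round_robin` keeps all three busy, and `lowest_latency` keeps reads on the local site. A real implementation would combine these ideas (spread reads among the nearby members only).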

> OpenVMS on IA64 and (some) Alphas supports hardware RAID controllers
> which package RAID arrays and present them through the OS as single,
> homogeneous logical volumes.
>
> As near as I can tell, this hardware RAID logical volume packaging
> means there is no way to monitor the state of health of the drive
> spindle members that comprise the RAID array using simple means like
> SHO ERR.

Individual HW RAID controller products tend to supply internal error 
logs, counters, and similar data that track errors down at the 
controller level; these are often visible using tools that work at 
that level.
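Once per-member error counts can be read from such a controller-level tool, the proactive-replacement approach used with HBVS boils down to watching for counters that keep climbing between checks. A minimal Python sketch, assuming the counts have already been collected into one dictionary per snapshot (how to extract them is controller-specific and not shown; the member names, the `rising_members` function, and the threshold are hypothetical):

```python
from typing import Dict, List


def rising_members(prev: Dict[str, int], curr: Dict[str, int],
                   threshold: int = 1) -> List[str]:
    """Return the members whose error counter grew by at least `threshold`
    between two snapshots.

    Each snapshot maps a (hypothetical) member name to a cumulative error
    count, however it was collected from the controller's own tools.
    """
    flagged = []
    for name, count in curr.items():
        before = prev.get(name, 0)  # a newly seen member starts from zero
        if count >= before:
            delta = count - before
        else:
            # Counter went backwards: assume it was reset (e.g. after a
            # controller restart) and treat the current value as new errors.
            delta = count
        if delta >= threshold:
            flagged.append(name)
    return sorted(flagged)
```

Run periodically (for example from a batch job that saves the previous snapshot), a non-empty result is the early warning that, with HBVS, SHO ERR used to provide.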

> I have also looked at the kind of displays available from the OpenVMS
> RAID controller admin utilities (SYS$SYSTEM:MSA$UTIL for the DS15,
> RX2600 SA640x, SYS$SYSTEM:SAS$UTIL for the RX2660 embedded 8 port SAS
> HBA).
>
> Those utilities don't seem to offer any drive spindle-level detail
> displays that would permit advance alert monitoring for degrading
> spindles.
>
> Can someone offer a suggestion of how an OpenVMS system administrator
> could perform such proactive monitoring with "available" tools that
> execute under OpenVMS?  I would like to avoid services/offerings that
> require close co-ordination with HP service. (I have a large installed
> base of remote customer sites that I administer that span many
> different HP service areas or are maintained by a 3rd party vendor).
> I would also like to avoid offline/firmware-based display capabilities
> as they are difficult to access on 24/7 operated remote site nodes.

As you note, these devices are designed to hide the details and 
complexity of the RAID arrays from the host (and to hide, as far as 
possible, any errors and recovery actions as well), so you have to get 
visibility into what's going on down at the controller level itself. 
This may involve tools such as the HP Storage System Scripting Utility 
(SSSU) for the EVA.