[Info-vax] Health monitoring disk members of HW RAID controllers?
Keith Parris
keithparris_deletethis at yahoo.com
Wed Aug 10 17:27:47 EDT 2011
On 7/20/2011 10:56 AM, Rod wrote:
> I have been previously using OpenVMS Volume Shadowing (software RAID
> mirroring) with OpenVMS Alpha and OpenVMS/I64. It was possible to see
> when spindle members of the shadow set were degrading and proactively
> replace disks before it became an emergency repair. (SHO ERR, DECevent
> utility for ERRLOG.SYS content).
>
> I'm now moving towards deploying hardware RAID solutions for the
> enhanced performance they deliver on OpenVMS/Alpha and OpenVMS/I64
> systems.
Another option might be to use the host-based HP RAID Software for
OpenVMS to provide the performance boost you are looking for. This
allows RAID 0 arrays (stripesets) to be formed from disks and RAID 0+1
arrays (stripesets of shadowsets) to be formed from shadowsets, with all
the control and visibility at the host level that you've come to
appreciate with HBVS.
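
To make the stripeset idea concrete, here is a minimal sketch of how a RAID 0 array can map a logical block number onto its members. The round-robin chunk layout, the function name locate_block and the chunk size are assumptions chosen for illustration; this is not the documented on-disk layout of the HP RAID Software for OpenVMS.

```python
def locate_block(lbn, members, chunk_blocks):
    """Map a logical block number on a stripeset to (member index, block on member).

    Assumes a plain round-robin layout: chunk 0 on member 0, chunk 1 on
    member 1, and so on, wrapping around after the last member.
    """
    if lbn < 0 or members < 1 or chunk_blocks < 1:
        raise ValueError("lbn must be >= 0; members and chunk_blocks must be >= 1")
    chunk = lbn // chunk_blocks          # which chunk of the logical volume
    offset = lbn % chunk_blocks          # position inside that chunk
    member = chunk % members             # member that holds the chunk
    member_chunk = chunk // members      # how many chunks precede it on that member
    return member, member_chunk * chunk_blocks + offset
```

Because consecutive chunks land on different members, a large sequential transfer is spread over all of them. In a 0+1 array each "member" in this sketch would itself be a shadowset rather than a single disk.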

As another note on comparative performance, with hardware RAID your
performance is limited to the performance of a single controller,
whereas with HBVS and host-based RAID (HBR) you can shadow and stripe
across multiple controllers for higher performance than any single
controller can
provide. I also consider such an individual controller to be a potential
single point of failure. And because the two controllers in a
dual-redundant pair are intimately connected and have to coordinate with
each other, for the highest availability configurations I consider even
such a dual-redundant controller to be a potential single point of
failure. With HBVS you can shadow across controllers [or pairs] to avoid
any such potential single points of failure.

Also evaluate how the controllers implement the level of RAID you
choose. For example, for simplicity of implementation, many backplane HW
RAID controllers implement mirroring (RAID 1) by designating one disk as
the master for the mirrorset, and all reads come from the master member
(unless and until it fails). With HBVS you can send reads
simultaneously to each of the members in parallel for greater
throughput. HBVS also knows (or can be told) about multi-site
configurations, and it will send a read to the closest (lowest-latency)
member for a given node.
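
The difference between the two read policies can be sketched as below. This is only an illustration: the round-robin spreading is a stand-in for whatever selection logic HBVS really uses, and the function names, device names and latency figures are all hypothetical.

```python
from collections import Counter
from itertools import cycle

def master_only_reads(members, n_reads):
    """Master-style mirroring: every read is served by the first member."""
    return Counter({members[0]: n_reads})

def spread_reads(members, n_reads):
    """Spread reads over all members (simple round-robin, an assumption)."""
    rr = cycle(members)
    return Counter(next(rr) for _ in range(n_reads))

def closest_member(latency_ms):
    """Pick the member with the lowest latency as seen from this node."""
    return min(latency_ms, key=latency_ms.get)

# Hypothetical two-site shadowset as seen from a node at the first site.
members = ["$1$DGA101", "$1$DGA201"]
print(master_only_reads(members, 10))   # all 10 reads hit one disk
print(spread_reads(members, 10))        # 5 reads per disk
print(closest_member({"$1$DGA101": 0.4, "$1$DGA201": 5.2}))
```

With the master-only policy the second spindle contributes nothing to read throughput until the master fails, which is the point to check when evaluating a controller.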
> OpenVMS on IA64 and (some) Alphas supports hardware RAID controllers
> which package and offer RAID arrays as single homogeneous logical
> volumes through the OS.
>
> As near as I can tell, this hardware RAID logical volume packaging
> means there is no way to monitor the state of health of the drive
> spindle members that comprise the RAID array using simple means like
> SHO ERR.
Individual HW RAID controller products tend to keep internal error logs
and counters that track errors at the controller level, and these are
often visible using tools at that level.
> I have also looked at the kind of displays available from the OpenVMS
> RAID controller admin utilities (SYS$SYSTEM:MSA$UTIL for the DS15,
> RX2600 SA640x, SYS$SYSTEM:SAS$UTIL for the RX2660 embedded 8 port SAS
> HBA).
>
> Those utilities don't seem to offer any drive spindle-level detail
> displays that would permit advance alert monitoring for degrading
> spindles.
>
> Can someone offer a suggestion of how an OpenVMS system administrator
> could perform such proactive monitoring with "available" tools that
> execute under OpenVMS? I would like to avoid services/offerings that
> require close co-ordination with HP service. (I have a large installed
> base of remote customer sites that I administer that span many
> different HP service areas or are maintained by a 3rd party vendor).
> I would also like to avoid offline/firmware-based display capabilities
> as they are difficult to access on 24/7 operated remote site nodes.
As you note, these devices are designed to hide the details and
complexity of the RAID arrays from the host (and hide as much as
possible any errors and recovery actions as well), so you have to get
visibility of what's going on down at the controller level itself. For
the EVA, for example, this may involve the HP Storage System Scripting
Utility (SSSU).
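
Once per-drive counters can be pulled from the controller by whatever tool applies, the proactive part is just a threshold check run on a schedule. A minimal sketch follows, assuming the counters have already been exported to a simple comma-separated text form. That format, the column names, the thresholds and the sample data are all made up for illustration; none of them is the output of MSA$UTIL, SAS$UTIL or SSSU.

```python
import csv
import io

def flag_degrading(report_text, soft_limit=10, hard_limit=1):
    """Return the drives whose error counters reach either threshold.

    report_text is a hypothetical CSV export with the columns
    disk, soft_errors, hard_errors (not a real utility's output format).
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(report_text)):
        soft = int(row["soft_errors"])
        hard = int(row["hard_errors"])
        if soft >= soft_limit or hard >= hard_limit:
            flagged.append(row["disk"])
    return flagged

# Made-up sample data, for illustration only.
SAMPLE = """disk,soft_errors,hard_errors
bay1,0,0
bay2,14,0
bay3,2,1
"""

print(flag_degrading(SAMPLE))
```

Comparing each run against the previous one, rather than against fixed limits, would also catch a drive whose counts are climbing steadily from a low base.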