[Info-vax] Main cabinet fault indicator on an MSA1000
Rich Jordan
jordan at ccs4vms.com
Wed Aug 22 13:39:36 EDT 2018
On Wednesday, August 22, 2018 at 10:39:27 AM UTC-5, Stephen Hoffman wrote:
> On 2018-08-21 20:26:58 +0000, Rich Jordan said:
>
> > The box is running fine, there are no alarm events showing on either
> > MSA controller, and all other indicators on the controllers, blowers,
> > SAN switches, and interface/enviro boards are green/OK/normal, but the
> > little triangle fault light above the power switch is lit. I've run
> > through the CLI checking temperatures, the status of all the configs, etc.,
> > looking for any errors and cannot find one.
> > ...but I haven't found a way to determine what that fault is or clear
> > it, or if it clears some time after the fault condition is corrected.
> > Short of possibly a full restart of the SAN, is there a way to
> > determine the nature of the cabinet fault when all the components are
> > green, or clear it short of a power cycle restart?
>
> The StorageWorks MSA1000 isn't the brightest LED in the storage chandelier.
>
> The LED will obviously light to indicate that a fault has arisen, and
> it'll stay lit while you've not reviewed all of the faults. Once you have
> reviewed the faults, the LED will usually only light when you're
> rummaging through that part of the error-review menu. Based on what you're
> reporting, this all implies that one or more faults still lurk.
>
> To clear the errors, the CLI is... useless. You'll have to use the
> "game-pad" buttons to the right of the controller display.
>
> You can press the left and right buttons simultaneously when you're in
> the error-review menu, and the specific selected error will be cleared.
> I'd expect this can be used to clear the fault LED. If it doesn't,
> there's probably something (still) wrong with the controller. This
> much is documented.
>
> You can also supposedly clear all of the errors by using the
> middle area within the four-button "game-pad" to trigger the POST
> power-on self-test. Scroll to the last entry in the error-review menu,
> press the middle button, and slog through the self-test checks and
> prompts; one of the self-test options available here is to clear
> all memory. Press the up button to select YES and the right button for
> OK. All errors should be cleared. I've not seen this path
> documented, have not tried it, and have never tried this on a
> running controller. YMMV, etc.
>
> Related:
> https://community.hpe.com/t5/EVA-Storage/Fault-LED-on-MSA1000/td-p/3134790
> http://h10032.www1.hp.com/ctg/Manual/c00600804.pdf
> http://h20628.www2.hp.com/km-ext/kmcsdirect/emr_na-c00800928-1.pdf
>
>
> --
> Pure Personal Opinion | HoffmanLabs LLC
Hoff
Thanks for replying. It's the chassis fault light, not the one on each of the MSA controllers. Both controllers are showing good status after the onsite person went through and deleted all the alerts from our testing.
We did dump the eventlog via the CLI from both controllers before the onsite person cleared all the alerts. Assuming the eventlogs are the same thing the controllers display via the buttons, there should be no more faults logged (though I'll check again today to make sure there are no new ones).
I will pass along the possibility of doing a POST via the buttons at the end of their day. The thing is, that is still the controllers doing a POST, so I don't know whether it will have any impact on the chassis fault indicator, i.e. whether it also POSTs the SAN switches, EMU, SCSI interfaces, etc. If there are any docs on that indicator beyond "it lights up amber when a fault is detected in one or more subsystems", I haven't been able to find them. I won't be surprised if it does take a full power cycle to make it go away.
Hopefully we don't lose another piece of kit when we cycle it...