[Info-vax] Need help tracking down a MUTEX state
Ade
adrian.birkett at blueyonder.co.uk
Thu Aug 20 10:58:32 EDT 2009
"Bob Koehler" <koehler at eisner.nospam.encompasserve.org> wrote in message
news:XWHHrlWF+KL5 at eisner.encompasserve.org...
> In article <a0Wim.359429$jW1.333637 at newsfe22.ams2>, "Ade"
> <adrian.birkett at blueyonder.co.uk> writes:
>> Hi,
>>
>> We have a VAX 6000-610 running VMS 6.1. On occasion the system hangs so I've
>> had the operators crash the system as per instructions found elsewhere.
>>
>> The dump of the system seems fine except for one process in MUTEX state
>> prio=16, mutex count=0 and PCB$L_EFWM=JIB of this process. On the quota
>> front the only 'anomaly' is that the BUFIO byte count/limit is at 192/65216.
>> None of the other usual suspect quotas are diminished.
>>
>> A show proc/chan shows a busy channel to the system disk but no associated
>> file id.
>
> It is possible to scan through the RMS control blocks and find out what
> file is connected to that channel. It's been a long time since I've
> done this and I wouldn't try it myself without the listings.
>
> Having a MUTEX on the JIB could be directly related to the low BUFIO.
> Generally when I see a process low on BUFIO it's doing I/O to a
> mailbox, custom hardware, or network interface that is full or not
> responding. Generally disks don't do that unless they're dying.
>
> The whole thing makes me wonder if you have the audit log on the
> system disk and are running out of space. VMS' default setup is to
> stop processing until it can get space to continue audits. You
> can change that via SYSGEN parameters, but you really should migrate
> that critical security data offline on a regular basis.
>
> And it's not clear, is the whole system hanging or just one critical
> process or one critical job (process tree)?
>
> SDA will look up the file name for you when used on a live system,
> so you could try running a background job to periodically dump
> that data for the process in question, and look at the last results
> next time you have to force a crash.
>
All,
Thanks for the pointers.
This is the only process on the box that's in a mutex state and the whole
system is hanging. Ironically, this process is a command procedure which
transfers accounting information to a separate location on a different disk.
It consists of an in-house written program which dumps process information
into a user accounting record, then uses accounting/bin/out commands to move
data elsewhere.
Wouldn't it be good if there was a command like
$ dir/id=(n,n,n) disk
...or is there??
I have an example from the web about using the routine which decrements the
byte count value; it seems to imply that the requested size is held in R1
prior to the call. Is this correct? If so, SDA reports the value 1432 there,
which would indeed take the byte count negative (only 192 remains). I suppose
I'm used to seeing zero values in the counts rather than small values.
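To spell out the arithmetic, here is a rough sketch of the check as I understand it. It is a simplified model, not the actual executive routine: the function name is made up for illustration, and I'm assuming the request is simply compared against the remaining byte count. The figures 192 and 1432 are the ones from the dump.

```python
# Simplified, hypothetical model of a buffered-I/O byte count check.
# This is NOT VMS code; it only illustrates the arithmetic discussed above.

def would_wait(bytes_remaining: int, bytes_requested: int) -> bool:
    """Return True if the request exceeds the remaining byte count.

    Assumption: a request larger than what remains cannot be charged
    against the quota, so the process would have to wait.
    """
    return bytes_requested > bytes_remaining


remaining = 192    # BUFIO byte count seen in the dump (limit 65216)
requested = 1432   # value SDA reports in R1, assumed to be the request size

print(would_wait(remaining, requested))   # request does not fit
print(remaining - requested)              # what the count would become
```

If that model is right, a small non-zero count can stall a process just as well as a zero one, as long as the next request is larger than what is left.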
I would be interested in seeing how to track a channel to a file whose
window address is 00000000. If you find a link please let me know.
Thanks for your help.
Ade