[Info-vax] hung program location
    VAXman- at SendSpamHere.ORG 
       
    Tue Feb 19 14:20:18 EST 2013
    
    
  
In article <fd1c789e-3d0f-439a-a9e1-fe6ea1d0ba62 at x13g2000vby.googlegroups.com>, Tom Adams <w.tom.adams at gmail.com> writes:
>On Feb 18, 6:32 pm, koeh... at eisner.nospam.encompasserve.org (Bob
>Koehler) wrote:
>> In article <f0482c25-2864-43f1-9590-96dba35d3... at ia3g2000vbb.googlegroups.com>, Tom Adams <w.tom.ad... at gmail.com> writes:
>>
>> > I can see the PC and SP where a program is hung using SHOW PROC/CONT.
>>
>> > How can I relate that to a link map to figure out where the process is
>> > hung in the program that it is running?
>>
>> > The process is running an installed shared image
>>
>> > This is VMS 7.3-2 on an alpha
>>
>> > Thanks.
>>
>>    If you posted the value, it would already give us a hint at how to
>>    help you.
>>
>>    The link map from building the program will contain something near
>>    the PC if it is part of your code.  If the address is in system
>>    space, then the OS link map will contain it, but you probably don't
>>    have a copy.  There are other ways to find it.
>>
>>    If the address is in a library, the library boundaries could show
>>    up on the link map.  Depends on whether you did a full map, IIRC.
>>
>>    Most likely, the PC is in system space, since system routines
>>    actually implement program waits.  System space values would most
>>    likely be between FFFFFFFF80000000 and FFFFFFFFFFFFFFFF, and
>>    certainly above 8000000000000000.
>>
>>    If the program state is COM, or is changing, then some of the time
>>    the PC may be pointing to your code.  Look for an address below
>>    000000007FFFFFFF.
>>
>>    VMS programs are broken into program sections, and you want to look
>>    for your PC in the code program section.  Look for $CODE$ on an
>>    Alpha link map.
>>
>>    Most likely the exact value won't be there.  If it's your code,
>>    there will be a slightly smaller value, and the difference is the
>>    offset into the routine that starts at that smaller value.
>>
>>    If you compile with a listing file and include the machine code,
>>    you can find that offset in your code and relate it to a line of
>>    code.
>>
>>    If you did not generate compiler listings and full maps when you
>>    built the code, you can generate them now, if you have exactly the
>>    same source, are using the same versions of compilers, and use
>>    exactly the same qualifiers that you did when the program was built,
>>    except adding the listing, machine code, map, and full qualifiers.
>
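The "slightly smaller value" lookup described above can be sketched in a few lines of Python. This is only an illustration: the routine names and addresses in EXAMPLE_MAP are made up, not taken from any real link map, and locate() is a hypothetical helper.

```python
from bisect import bisect_right

def locate(pc, symbols):
    """Return (name, offset) for the symbol with the largest address <= pc.

    symbols is a list of (address, name) pairs as read off a link map.
    Returns None when pc lies below every symbol in the list.
    """
    ordered = sorted(symbols)
    addresses = [addr for addr, _ in ordered]
    i = bisect_right(addresses, pc)
    if i == 0:
        return None
    base, name = ordered[i - 1]
    return name, pc - base

# Hypothetical map entries, for illustration only.
EXAMPLE_MAP = [
    (0x30000, "MAIN"),
    (0x30120, "READ_INPUT"),
    (0x30400, "WAIT_LOOP"),
]

if __name__ == "__main__":
    name, offset = locate(0x30158, EXAMPLE_MAP)
    print(f"{name}+{offset:X}")  # READ_INPUT+38
```

The offset it reports is the number to look up in the machine-code listing for that routine.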
>The PC is 80141918 (as shown by SHOW PROC/CONT).  The process
>is in HIB when it's at that address.
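By the ranges Bob quoted, that PC is in system space, assuming the eight-digit value is the low longword of a sign-extended 64-bit address. A minimal Python sketch of the check (the function names are illustrative, and the boundaries are simply the ones quoted above):

```python
def sign_extend_32(value):
    """Sign-extend a 32-bit value into a 64-bit unsigned representation."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def region(pc):
    """Classify a PC using the address ranges quoted earlier in the thread."""
    # Assumption: a value that fits in 32 bits is a truncated display of a
    # sign-extended 64-bit address.
    addr = sign_extend_32(pc) if pc <= 0xFFFFFFFF else pc
    if 0xFFFFFFFF80000000 <= addr <= 0xFFFFFFFFFFFFFFFF:
        return "system space"
    if addr <= 0x000000007FFFFFFF:
        return "process space"
    return "other"

if __name__ == "__main__":
    print(f"{sign_extend_32(0x80141918):016X}")  # FFFFFFFF80141918
    print(region(0x80141918))                    # system space
```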
$ ANALYZE/SYSTEM
SDA> READ/EXEC
SDA> EXAMINE/INSTRUCTION 80141918 -10;20
I'll bet it looks something like this:
SYS$HIBER_C+000B4:      BIS             R31,R31,R25
SYS$HIBER_C+000B8:      LDQ             R27,#X0048(R13)
SYS$HIBER_C+000BC:      LDQ_U           R31,(SP)
SYS$HIBER_C+000C0:      JSR             R26,(R26)
SYS$HIBER_C+000C4:      LDA             R16,#XF04E(R0)
SYS$HIBER_C+000C8:      BEQ             R16,#XFFFFF9
SYS$HIBER_C+000CC:      BR              R31,#XFFFFF0
EXE$HIBER_INT_C:        LDA             SP,#XFFA0(SP)
EXE$HIBER_INT_C+00004:  STQ             R27,(SP)
The 'JSR R26,(R26)' jumps to the inner handler, EXE$HIBER_INT.
The 'LDA R16,#XF04E(R0)' basically adds -SS$_WAIT_CALLERS_MODE to
the return value in R0 and stores the result in R16.  If R0 is
SS$_WAIT_CALLERS_MODE, R16 is 0, the BEQ is taken, and the code
loops back to wait.
This is not your problem.  You need to figure out WHY you are in a
HIBernate state.  Did the code explicitly invoke SYS$HIBER, or did
something else cause SYS$HIBER to be invoked?
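The LDA arithmetic can be checked by hand. LDA adds a sign-extended 16-bit displacement to the base register, and #XF04E sign-extends to -4018, so the displacement implies a status value of 0xFB2 (4018). That this is the value of SS$_WAIT_CALLERS_MODE follows from the reading above and is worth confirming against $SSDEF. A rough Python model, with illustrative function names:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def lda(base, displacement):
    """Model Alpha LDA: base plus sign-extended 16-bit displacement, mod 2**64."""
    return (base + sign_extend_16(displacement)) & MASK64

DISP = 0xF04E
IMPLIED_STATUS = -sign_extend_16(DISP)  # 0xFB2, i.e. 4018

if __name__ == "__main__":
    print(hex(IMPLIED_STATUS))        # 0xfb2
    print(lda(IMPLIED_STATUS, DISP))  # 0, so BEQ R16 branches
    print(lda(1, DISP) == 0)          # False: SS$_NORMAL (1) does not branch
```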
>The code is well controlled in CMS so it's easy to produce link maps.
>
>I restarted the hung processes.  This hanging is a rare event that I
>don't know how to reproduce.  But the process does pause at that PC
>in HIB during a normal operation mode.
Then it's normal for it to HIBernate.  Perhaps the problem, then, is
an error in the program's wake (SYS$WAKE or SYS$SCHDWK) processing.
-- 
VAXman- A Bored Certified VMS Kernel Mode Hacker    VAXman(at)TMESIS(dot)ORG
Well I speak to machines with the voice of humanity.
    
    