[Info-vax] Home-grown application process dumps

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Mon Jan 5 12:01:01 EST 2015


On 2015-01-05 16:23:40 +0000, RGB said:

> Hi all and happy new year.
> 
> We are currently running a home-grown application, which uses its own 
> "database" on an Itanium rx2800 i2 cluster of 2 nodes running VMS v8.4. 
>  All ECO's are up to date on the cluster.  This home-grown application 
> has various modules which do specific tasks.  Two of the modules have 
> been crashing/dumping as of late and the developers, who wrote the 
> code, claim it's a bug in VMS whereas I believe that the cause of the 
> process dumps are coding issues.  I'm going to output a couple of the 
> process dumps here with the hope that someone could give me their 
> opinion on what might be causing these processes to dump like this.  
> That's my hope anyway!

Any claim of "it's a bug in VMS" is unfortunately immediately suspect 
without some supporting evidence and/or a reproducer.  While it might 
well be a VMS bug, any programmers involved here should be working to 
isolate the error, and to create a reproducer.   I've learned to 
do that with misbehavior in my own code, and creating a reproducer 
often enough leads to the discovery of the bug somewhere in my 
own code.  If this is a VMS bug, the reproducer is something you can 
hand to HP support, too; that usually makes getting a response and a 
fix from HP support that much faster, as they're not wading through 
your code.

The relevant bits in that dump appear to be the following:

%BAS-F-MEMMANVIO, Memory management violation

and

-BAS-I-USEPC_PSL, at user PC=84236620, PSL=0000001B
-SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual 
address=00000000002D0002, PC=FFFFFFFF84236620, PS=0000001B

and

%SYSTEM-F-OPCCUS, opcode reserved to customer fault at 
PC=FFFFFFFF848DBB20, PS=0000001B

On no particular evidence beyond that 00000000002D0002 virtual address 
value looking rather bogus, I'd be looking for some BASIC code 
somewhere that makes a call that passes a string descriptor by value, 
and not by reference.   The low word of that 002D0002 value, 0002, is 
what the length field of a two-byte string descriptor would hold, 
though the upper two bytes (2D and 00) don't match the dtype and class 
of a dynamic text string, which would make the longword 020E0002:

#define DSC$K_DTYPE_T 14  /* Character-coded text. A single 8-bit character */
#define DSC$K_CLASS_D 2   /* Dynamic String Descriptor */
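
As a minimal sketch of how the first longword of a descriptor breaks 
down, assuming the usual little-endian layout (16-bit length, then 8-bit 
dtype, then 8-bit class): the struct and function names here are made up 
for illustration and are not from any VMS header.  Unpacking 002D0002 
this way gives length 0002, dtype 2D and class 00, where a two-byte 
DSC$K_DTYPE_T, DSC$K_CLASS_D descriptor would pack to 020E0002, so it's 
worth comparing against the actual descriptors in the dump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Mirror the two constants quoted above; names shortened for portability. */
#define DTYPE_T 14   /* DSC$K_DTYPE_T */
#define CLASS_D 2    /* DSC$K_CLASS_D */

/* Hypothetical holder for the three fields packed into the first longword. */
struct dsc_head {
    uint16_t length;  /* bits 0..15  */
    uint8_t  dtype;   /* bits 16..23 */
    uint8_t  dclass;  /* bits 24..31 */
};

/* Split a 32-bit value into the length, dtype and class fields. */
static struct dsc_head decode_dsc_head(uint32_t lw)
{
    struct dsc_head h;
    h.length = (uint16_t)(lw & 0xFFFFu);
    h.dtype  = (uint8_t)((lw >> 16) & 0xFFu);
    h.dclass = (uint8_t)((lw >> 24) & 0xFFu);
    return h;
}

/* Pack the three fields back into one 32-bit value. */
static uint32_t encode_dsc_head(uint16_t length, uint8_t dtype, uint8_t dclass)
{
    return (uint32_t)length | ((uint32_t)dtype << 16) | ((uint32_t)dclass << 24);
}

/* Render the decoded fields as text into a caller-supplied buffer. */
static void format_dsc_head(uint32_t lw, char *out, size_t n)
{
    assert(out != NULL && n > 0);
    struct dsc_head h = decode_dsc_head(lw);
    snprintf(out, n, "len=%u dtype=%u class=%u",
             (unsigned)h.length, (unsigned)h.dtype, (unsigned)h.dclass);
}
```

Running any suspicious-looking address from an ACCVIO through 
decode_dsc_head() is a quick way to see whether it could be descriptor 
contents rather than a pointer.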

Look at what is located at virtual address FFFFFFFF84236620, and at 
what subroutine calls lead up to the execution of the code there, too; 
that's system-space code, and quite possibly the code that's behind a 
system service call.

If that's not it, I'd start looking for a memory heap corruption, as 
those can blow out all over the place, and with all sorts of odd 
errors.  With BASIC, that's usually some system service call or similar 
that writes more than the size of the string that's been presented to 
the call; system services generally don't re-size or extend 
dynamic string descriptors, the calls just keep writing however much 
they've been asked to write.  Ten pounds of bytes into a five-pound 
string buffer makes for a corrupt heap, after all.
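
One way to catch that sort of overrun close to where it happens is to 
pad a buffer with known guard bytes and check them after each suspect 
call.  This is a rough portable-C sketch of the idea, not BASIC and not 
anything VMS provides; guarded_buf, careless_fill and the guard values 
are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8      /* number of canary bytes after the buffer */
#define GUARD_BYTE 0xA5   /* arbitrary, easy-to-spot fill value      */

/* Hypothetical guarded buffer: cap usable bytes, then GUARD_LEN canaries. */
struct guarded_buf {
    unsigned char *data;
    size_t cap;
};

/* Allocate cap usable bytes plus the guard area; returns 0 on success. */
static int guarded_alloc(struct guarded_buf *b, size_t cap)
{
    assert(b != NULL);
    b->data = malloc(cap + GUARD_LEN);
    if (b->data == NULL)
        return -1;
    b->cap = cap;
    memset(b->data, 0, cap);
    memset(b->data + cap, GUARD_BYTE, GUARD_LEN);
    return 0;
}

/* Returns 1 if the canaries are intact, 0 if something wrote past cap. */
static int guarded_intact(const struct guarded_buf *b)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (b->data[b->cap + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

static void guarded_free(struct guarded_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->cap = 0;
}

/* Stand-in for a callee that writes n bytes without checking the size. */
static void careless_fill(unsigned char *dst, size_t n)
{
    memset(dst, 'X', n);
}
```

Filling a five-byte guarded buffer with five bytes leaves the guard 
intact, and filling it with ten trips the check.  The guard only 
catches writes that land within GUARD_LEN bytes of the end; anything 
longer still tramples whatever follows.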

It's usually easier to trace these sorts of bugs when the application 
code contains its own integrated logging and tracing support, and 
contains its own signal handler and dump support.  Looking at a process 
dump is slightly tedious and rather further downstream from the error 
and the application code, after all.  Having the ability to trigger the 
debugger via SS$_DEBUG and generate some specific output, and maybe a 
call to the traceback routine, can be helpful, too.
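
For the integrated tracing, even something as small as an in-memory 
ring of recent events, dumped from the application's own handler, can 
narrow things down.  The following is a bare-bones sketch in portable 
C; trace_ring and its functions are hypothetical names, the slot sizes 
are arbitrary, and wiring it into a condition handler is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TRACE_SLOTS 8    /* how many recent records are retained */
#define TRACE_TEXT  64   /* maximum record length, including NUL */

struct trace_ring {
    char text[TRACE_SLOTS][TRACE_TEXT];
    unsigned long next;  /* total number of records ever added */
};

/* Record a message, overwriting the oldest slot once the ring is full. */
static void trace_add(struct trace_ring *r, const char *msg)
{
    assert(r != NULL && msg != NULL);
    char *slot = r->text[r->next % TRACE_SLOTS];
    snprintf(slot, TRACE_TEXT, "%s", msg);  /* truncates long messages */
    r->next++;
}

/* Number of records currently retained. */
static size_t trace_count(const struct trace_ring *r)
{
    return r->next < TRACE_SLOTS ? (size_t)r->next : (size_t)TRACE_SLOTS;
}

/* Fetch record i, where 0 is the oldest retained; NULL if out of range. */
static const char *trace_get(const struct trace_ring *r, size_t i)
{
    size_t n = trace_count(r);
    if (i >= n)
        return NULL;
    unsigned long first = r->next - (unsigned long)n;
    return r->text[(first + i) % TRACE_SLOTS];
}

/* Write the retained records, oldest first, to the given stream. */
static void trace_dump(const struct trace_ring *r, FILE *out)
{
    size_t n = trace_count(r);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "trace[%lu]: %s\n", (unsigned long)i, trace_get(r, i));
}
```

Calling trace_add() before and after each external call, and 
trace_dump() from whatever handler catches the failure, shows the last 
few steps before things went wrong without needing the process dump.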

Related:
<http://labs.hoffmanlabs.com/node/803>
<http://labs.hoffmanlabs.com/node/848>
<http://labs.hoffmanlabs.com/node/800>
<http://labs.hoffmanlabs.com/node/800#comment-2049>




-- 
Pure Personal Opinion | HoffmanLabs LLC



