[Info-vax] Home-grown application process dumps
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Mon Jan 5 12:01:01 EST 2015
On 2015-01-05 16:23:40 +0000, RGB said:
> Hi all and happy new year.
>
> We are currently running a home-grown application, which uses its own
> "database" on an Itanium rx2800 i2 cluster of 2 nodes running VMS v8.4.
> All ECO's are up to date on the cluster. This home-grown application
> has various modules which do specific tasks. Two of the modules have
> been crashing/dumping as of late and the developers, who wrote the
> code, claim it's a bug in VMS whereas I believe that the cause of the
> process dumps are coding issues. I'm going to output a couple of the
> process dumps here with the hope that someone could give me their
> opinion on what might be causing these processes to dump like this.
> That's my hope anyway!
Any claim of "it's a bug in VMS" is unfortunately immediately suspect
without some supporting evidence and/or a reproducer. While it might
well be a VMS bug, any programmers involved here should be working to
isolate the error and to create a reproducer. I've learned to do that
with misbehavior in my own code, and creating a reproducer can and
sometimes does lead to the discovery of the bug somewhere in my own
code. If this is a VMS bug, the reproducer is something you can hand
to HP support, too. That usually makes getting a response and a fix
from HP support that much faster, as they're not wading through your
code.
The relevant bits in that dump appear to be the following:
%BAS-F-MEMMANVIO, Memory management violation
and
-BAS-I-USEPC_PSL, at user PC=84236620, PSL=0000001B
-SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual
address=00000000002D0002, PC=FFFFFFFF84236620, PS=0000001B
and
%SYSTEM-F-OPCCUS, opcode reserved to customer fault at
PC=FFFFFFFF848DBB20, PS=0000001B
On no particular evidence beyond that 00000000002D0002 virtual address
value looking rather bogus, I'd be looking for some BASIC code
somewhere that makes a call that passes a string descriptor by value,
and not by reference. That particular 002D0002 value is the first
longword of a two-byte dynamic text string descriptor, after all:
#define DSC$K_DTYPE_T 14 /* Character-coded text. A single 8-bit character */
#define DSC$K_CLASS_D 2 /* Dynamic String Descriptor */
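One quick way to test a guess like this is to split the low longword of the
suspect address into the descriptor header fields and compare them with the
descrip.h constants. What follows is only a minimal sketch in portable C,
assuming the usual header layout (16-bit length, then 8-bit data type, then
8-bit class). The struct and function names are hypothetical, and the "$" is
dropped from the constant names so the code builds with any C compiler.

```c
#include <stdint.h>
#include <stdio.h>

/* Values quoted from descrip.h; '$' removed from the names for portability. */
#define DSC_K_DTYPE_T 14
#define DSC_K_CLASS_D 2

/* Hypothetical helper type: the three fields packed into the first
   longword of a descriptor (assumed layout, little-endian). */
struct dsc_head {
    uint16_t length;  /* bits 0..15  */
    uint8_t dtype;    /* bits 16..23 */
    uint8_t dclass;   /* bits 24..31 */
};

/* Split a 32-bit value into the three descriptor header fields. */
struct dsc_head decode_dsc_head(uint32_t longword)
{
    struct dsc_head h;
    h.length = (uint16_t)(longword & 0xFFFFu);
    h.dtype = (uint8_t)((longword >> 16) & 0xFFu);
    h.dclass = (uint8_t)((longword >> 24) & 0xFFu);
    return h;
}

/* Pack the three fields back into a longword; the inverse of the decode. */
uint32_t encode_dsc_head(uint16_t length, uint8_t dtype, uint8_t dclass)
{
    return (uint32_t)length | ((uint32_t)dtype << 16) | ((uint32_t)dclass << 24);
}

/* Print the decoded fields of a suspect value for comparison by eye. */
void print_dsc_head(uint32_t longword)
{
    struct dsc_head h = decode_dsc_head(longword);
    printf("%08lX: length=%u dtype=%u class=%u\n",
           (unsigned long)longword, (unsigned)h.length,
           (unsigned)h.dtype, (unsigned)h.dclass);
}
```

Calling print_dsc_head(0x002D0002) shows a length field of 2. Compare the
type and class bytes it reports against the constants above, and against
encode_dsc_head(2, DSC_K_DTYPE_T, DSC_K_CLASS_D), before treating the
match as conclusive.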
Also look at what is located at virtual address FFFFFFFF84236620, and
at what subroutine calls lead up to the execution of that code. That's
system-space code, and quite possibly the code behind a system service
call.
If that's not it, I'd start looking for a memory heap corruption, as
those can blow out all over the place, and with all sorts of odd
errors. With BASIC, that's usually some system service call or similar
that exceeds the size of the string that's been presented to the system
service call — system services generally don't re-size or extend
dynamic string descriptors, the calls just keep writing however much
they've been asked to write. Ten pounds of bytes into a five-pound
string buffer makes for a corrupt heap, after all.
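The usual defense is to have whatever fills the buffer honor the buffer's
size and report how much it actually wrote. A minimal sketch of that idea in
portable C follows. It is not VMS code: the fixed_buf struct and
bounded_fill routine are hypothetical stand-ins for a fixed-length string
and the routine that writes into it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-length string buffer handed to a
   routine that fills it; this is not a real VMS descriptor. */
struct fixed_buf {
    char *data;       /* start of the caller's storage */
    size_t capacity;  /* number of bytes the caller actually owns */
};

/* Copy at most buf->capacity bytes of src into buf. Stores the number of
   bytes actually written in *written, and returns 1 if the source had to
   be truncated or 0 if it fit. */
int bounded_fill(struct fixed_buf *buf, const char *src, size_t src_len,
                 size_t *written)
{
    size_t n = src_len < buf->capacity ? src_len : buf->capacity;

    memcpy(buf->data, src, n);
    *written = n;
    return src_len > buf->capacity;
}
```

A caller that checks the return value and the written length learns about
the short buffer right at the call, rather than from an unrelated failure
somewhere later in the run.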
It's usually easier to trace these sorts of bugs when the application
code contains its own integrated logging and tracing support, along
with its own signal handler and dump support. Looking at a process
dump is slightly tedious and rather further downstream from the error
and the application code, after all. Having the ability to trigger the
debugger via SS$_DEBUG and generate some specific output, and maybe a
call to the traceback routine, can be helpful, too.
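As one illustration of integrated tracing, here is a minimal sketch of an
in-memory trace ring in portable C. It keeps the last few events so the
application's own error path can print them. All of the names and sizes
here are hypothetical, and a real handler on VMS would tie this into the
condition-handling and traceback facilities rather than plain stdio.

```c
#include <stdio.h>
#include <string.h>

#define TRACE_SLOTS 4   /* hypothetical ring size */
#define TRACE_TEXT 48   /* hypothetical max message length, including NUL */

static char trace_ring[TRACE_SLOTS][TRACE_TEXT];
static unsigned long trace_count;   /* total events recorded so far */

/* Record one event, overwriting the oldest slot once the ring is full. */
void trace_event(const char *text)
{
    char *slot = trace_ring[trace_count % TRACE_SLOTS];

    strncpy(slot, text, TRACE_TEXT - 1);
    slot[TRACE_TEXT - 1] = '\0';
    trace_count++;
}

/* Return the i-th oldest retained event (0 = oldest), or NULL when i is
   past the newest retained event. */
const char *trace_get(unsigned long i)
{
    unsigned long kept = trace_count < TRACE_SLOTS ? trace_count : TRACE_SLOTS;

    if (i >= kept)
        return NULL;
    return trace_ring[(trace_count - kept + i) % TRACE_SLOTS];
}

/* Write the retained events, oldest first; meant to be called from the
   application's own error path. */
void trace_dump(FILE *out)
{
    const char *text;
    unsigned long i;

    for (i = 0; (text = trace_get(i)) != NULL; i++)
        fprintf(out, "trace[%lu]: %s\n", i, text);
}
```

Sprinkling trace_event() calls ahead of each external call, and calling
trace_dump(stderr) from the error path, shows the last few steps the code
took before the failure without having to dig through a process dump.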
Related:
<http://labs.hoffmanlabs.com/node/803>
<http://labs.hoffmanlabs.com/node/848>
<http://labs.hoffmanlabs.com/node/800>
<http://labs.hoffmanlabs.com/node/800#comment-2049>
--
Pure Personal Opinion | HoffmanLabs LLC