[Info-vax] Home-grown application process dumps
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed Jan 7 15:22:20 EST 2015
On 2015-01-07 19:47:06 +0000, RGB said:
> The problem here is that we are not getting anything "normal"...
Normal? You're not getting what you're expecting and not getting what
you desire, certainly. If I were unfamiliar with VMS heap corruptions,
with the toxic effects of buffer overruns, and with the assorted
misbehaviors of asynchronous errors, then what happens in these cases
would be very puzzling. Yes, the BASIC environment probably won't
help with these cases, as this initially appears to be some sort of
corruption in the heap or the stack, or maybe a synchronization bug.
In all of these cases, the application crash can arise long after the
error that triggers it. In some cases I've worked, the crash came days
after the trigger.
As for the code that's blowing up, if I pass a bad argument into a
system service call, I'll see the error arise in the system service
code, but the bug is in my own code. Heap and stack corruptions and
synchronization bugs can show up all over the place. A heap overwrite
may not become visible until some other and entirely unrelated
operation encounters the corruption, for instance.
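One way to shorten the distance between a heap overwrite and the point
where it gets noticed is to wrap allocations in guard words and check
them at known points. What follows is a minimal sketch in C, not BASIC
and not VMS-specific. The names guarded_alloc, guarded_check and
guarded_free and the GUARD_WORD value are hypothetical and chosen for
illustration. It assumes the header size keeps the user buffer suitably
aligned, and it does not check for size overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu   /* arbitrary pattern, an assumption */

/* Header stored immediately ahead of each user buffer. */
typedef struct {
    size_t   size;    /* bytes the caller asked for */
    uint32_t guard;   /* leading guard word */
} guard_header;

/* Allocate size bytes with a guard word on each side. */
void *guarded_alloc(size_t size)
{
    guard_header *h = malloc(sizeof *h + size + sizeof(uint32_t));
    if (h == NULL)
        return NULL;
    h->size = size;
    h->guard = GUARD_WORD;

    unsigned char *user = (unsigned char *)(h + 1);
    uint32_t tail = GUARD_WORD;
    memcpy(user + size, &tail, sizeof tail);   /* trailing guard word */
    return user;
}

/* Return 1 if both guard words are intact, 0 if either was overwritten. */
int guarded_check(const void *p)
{
    assert(p != NULL);
    const guard_header *h = (const guard_header *)p - 1;
    uint32_t tail;
    memcpy(&tail, (const unsigned char *)p + h->size, sizeof tail);
    return h->guard == GUARD_WORD && tail == GUARD_WORD;
}

/* Release a buffer obtained from guarded_alloc. */
void guarded_free(void *p)
{
    if (p != NULL)
        free((guard_header *)p - 1);
}
```

Calling guarded_check after each routine that writes into a buffer
narrows down which write did the damage, rather than waiting for some
unrelated operation to trip over it later.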
As for working through this case, I'd do what I'd stated earlier: I'd
review the code for latent coding errors, particularly in buffer
handling and synchronization, and (particularly if it were my code,
and I were stuck) I'd get somebody else to have a look at the code for
anything I've missed. I'd particularly examine RTL calls and system
services for buffer overruns and heap corruptions, and I'd look for
AST-level problems and for synchronization errors, and for other forms
of vermin. I'd also instrument the code, and I'd generally recommend
that integrated logging be implemented for any non-trivial
application. I'd instrument any complex application, even in the
absence of errors.
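As a rough idea of what integrated logging can look like, here is a
minimal sketch of an in-memory trace ring that keeps the most recent
events and can be written out when something goes wrong. It is in C
rather than BASIC. The names trace_log, trace_dump and TRACE and the
sizes are hypothetical. As written it is not safe to call from AST
level or from multiple threads without added synchronization.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TRACE_SLOTS 64   /* number of recent events retained */
#define TRACE_LEN   96   /* maximum bytes per event, including NUL */

static char trace_ring[TRACE_SLOTS][TRACE_LEN];
static unsigned long trace_count;   /* total events logged so far */

/* Format one event into the next slot, overwriting the oldest entry. */
void trace_log(const char *file, int line, const char *fmt, ...)
{
    assert(fmt != NULL);
    char *slot = trace_ring[trace_count % TRACE_SLOTS];
    slot[0] = '\0';

    /* Prefix: sequence number and source location. */
    int n = snprintf(slot, TRACE_LEN, "%lu %s:%d ", trace_count, file, line);
    if (n > 0 && n < TRACE_LEN) {
        va_list ap;
        va_start(ap, fmt);
        /* Long messages are truncated to fit the slot. */
        vsnprintf(slot + n, (size_t)(TRACE_LEN - n), fmt, ap);
        va_end(ap);
    }
    trace_count++;
}

/* Convenience wrapper that records the call site automatically. */
#define TRACE(...) trace_log(__FILE__, __LINE__, __VA_ARGS__)

/* Write the retained events, oldest first, to the given stream. */
void trace_dump(FILE *out)
{
    unsigned long first =
        trace_count > TRACE_SLOTS ? trace_count - TRACE_SLOTS : 0;
    for (unsigned long i = first; i < trace_count; i++)
        fprintf(out, "%s\n", trace_ring[i % TRACE_SLOTS]);
}
```

A call such as TRACE("record length=%d", len) ahead of each buffer
operation, with trace_dump(stderr) invoked from the error path, leaves
a record of what the application was doing shortly before the failure.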
I'd also work toward a reproducer for the error, or at least very
specific details and logged data from what's transpiring here. You
already know what you're getting from the generic process dump
mechanism is not sufficient to locate the trigger, after all.
Instrumenting the code can get me closer to the error, and can help me
gather values. This assumes I don't yet know enough about the trigger
to program the debugger to get to where I can view the error.
Yes, it's possible that there's a VMS error lurking here. But when
presented with these cases, I've learned to always suspect bugs in my
own code first, as VMS has seen a whole lot more use than has my code.
When presented with these sorts of weird cases, handing a specific
error or a reproducer to HP support also means they're not recreating
your environment and walking through all your BASIC code to create one.
I've found a number of these bugs in my code by trying to create that
reproducer, too.
If this has been an ongoing issue, then consider bringing in some
help: somebody to look at the code and at the application environment.
Or if you're sure it's a VMS bug, contact HP support, of course.
--
Pure Personal Opinion | HoffmanLabs LLC