[Info-vax] Home-grown application process dumps

RGB 11brvo at gmail.com
Mon Jan 5 12:08:19 EST 2015


On Monday, January 5, 2015 12:01:29 PM UTC-5, Stephen Hoffman wrote:
> On 2015-01-05 16:23:40 +0000, RGB said:
> 
> > Hi all and happy new year.
> > 
> > We are currently running a home-grown application, which uses its own 
> > "database" on an Itanium rx2800 i2 cluster of 2 nodes running VMS v8.4. 
> > All ECOs are up to date on the cluster.  This home-grown application 
> > has various modules which do specific tasks.  Two of the modules have 
> > been crashing/dumping as of late and the developers, who wrote the 
> > code, claim it's a bug in VMS whereas I believe that the process 
> > dumps are caused by coding issues.  I'm going to output a couple of the 
> > process dumps here with the hope that someone could give me their 
> > opinion on what might be causing these processes to dump like this.  
> > That's my hope anyway!
> 
> Any claim of "it's a bug in VMS" is unfortunately immediately suspect, 
> without some supporting evidence and/or a reproducer.  While it might 
> well be a VMS bug, any programmers involved here should be working to 
> isolate the error, and to create a reproducer.  I've learned to do 
> that with misbehavior in my own code, and creating a reproducer often 
> leads to the discovery of the bug somewhere in my own code.  If this 
> is a VMS bug, the reproducer is something you can 
> hand to HP support, too -- that usually makes getting a response and a 
> fix from HP Support all that much faster, as they're not wading through 
> your code.
> 
> The relevant bits in that dump appear to be the following:
> 
> %BAS-F-MEMMANVIO, Memory management violation
> 
> and
> 
> -BAS-I-USEPC_PSL, at user PC=84236620, PSL=0000001B
> -SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=00000000002D0002, PC=FFFFFFFF84236620, PS=0000001B
> 
> and
> 
> %SYSTEM-F-OPCCUS, opcode reserved to customer fault at PC=FFFFFFFF848DBB20, PS=0000001B
> 
> On no particular evidence beyond that 00000000002D0002 virtual address 
> value looking rather bogus, I'd be looking for some BASIC code 
> somewhere that makes a call that passes a string descriptor by value, 
> and not by reference.   That particular 002D0002 value is the first 
> longword of a two-byte dynamic text string descriptor, after all:
> 
> #define DSC$K_DTYPE_T 14                /* Character-coded text. A single 8-bit character */
> #define DSC$K_CLASS_D 2                 /* Dynamic String Descriptor */
> 
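One way to test the descriptor-by-value theory is to split the faulting virtual address into the three fields that make up the first longword of a string descriptor, and compare them with the DSC$K_ constants quoted above.  What follows is a minimal sketch in portable C, not VMS code: it assumes the usual little-endian layout (16-bit length, then an 8-bit dtype, then an 8-bit class), and the struct and function names are made up for illustration rather than taken from the VMS headers.

```c
#include <stdint.h>

/* Values copied from the DSC$K_ definitions quoted above. */
#define DTYPE_T 14  /* DSC$K_DTYPE_T: character-coded text */
#define CLASS_D 2   /* DSC$K_CLASS_D: dynamic string descriptor */

/* Hypothetical holder for the first longword of a descriptor.
   Assumed layout: bits 0-15 length, bits 16-23 dtype, bits 24-31 class. */
struct desc_header {
    uint16_t length;
    uint8_t  dtype;
    uint8_t  dclass;
};

/* Split a 32-bit value into the three candidate descriptor fields. */
static struct desc_header decode_desc_header(uint32_t lw)
{
    struct desc_header h;
    h.length = (uint16_t)(lw & 0xFFFFu);
    h.dtype  = (uint8_t)((lw >> 16) & 0xFFu);
    h.dclass = (uint8_t)((lw >> 24) & 0xFFu);
    return h;
}

/* Build the longword that a descriptor with these fields would start with. */
static uint32_t pack_desc_header(uint16_t length, uint8_t dtype, uint8_t dclass)
{
    return (uint32_t)length | ((uint32_t)dtype << 16) | ((uint32_t)dclass << 24);
}

/* Nonzero when the value has the dtype and class of a dynamic text descriptor. */
static int looks_like_dynamic_text(uint32_t lw)
{
    struct desc_header h = decode_desc_header(lw);
    return h.dtype == DTYPE_T && h.dclass == CLASS_D;
}
```

Feeding the low 32 bits of the faulting address (0x002D0002 here) to decode_desc_header() gives the candidate length, dtype, and class.  If they do not line up with the descriptor the code was expected to pass, the heap corruption theory deserves more weight.
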
> Look at what is located at virtual address FFFFFFFF84236620, and at 
> what subroutine calls lead up to the execution of that code, too -- 
> that's some system space code that's involved, and quite possibly the 
> code that's behind a system service call.
> 
> If that's not it, I'd start looking for a memory heap corruption, as 
> those can blow out all over the place, and with all sorts of odd 
> errors.  With BASIC, that's usually some system service call or similar 
> that exceeds the size of the string that's been presented to the system 
> service call -- system services generally don't re-size or extend 
> dynamic string descriptors, the calls just keep writing however much 
> they've been asked to write.  Ten pounds of bytes into a five-pound 
> string buffer makes for a corrupt heap, after all.
> 
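The overflow described above can be guarded against by checking the capacity of the caller's buffer before writing, and by reporting how much was actually written.  The sketch below is portable C under that assumption; fixed_buf and bounded_write are hypothetical names for illustration, not VMS routines or descriptor types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-length string buffer:
   a capacity and a pointer to caller-owned storage. */
struct fixed_buf {
    size_t capacity;
    char  *data;
};

/* Copy at most buf->capacity bytes from src into the buffer.
   Returns the number of bytes written.  When truncated is not NULL,
   *truncated is set to 1 if src did not fit and to 0 otherwise. */
static size_t bounded_write(struct fixed_buf *buf, const char *src,
                            size_t src_len, int *truncated)
{
    size_t n = src_len < buf->capacity ? src_len : buf->capacity;

    memcpy(buf->data, src, n);
    if (truncated != NULL) {
        *truncated = (src_len > buf->capacity);
    }
    return n;
}
```

The caller then works with the returned length instead of assuming the buffer was filled, and can log or signal the truncation rather than letting the extra bytes land in whatever follows the buffer on the heap.
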
> It's usually easier to trace these sorts of bugs when the application 
> code contains its own integrated logging and tracing support, and 
> contains its own signal handler and dump support.  Looking at a process 
> dump is slightly tedious and rather further downstream from the error 
> and the application code, after all.  Having the ability to trigger the 
> debugger via SS$_DEBUG and generate some specific output, and maybe a 
> call to the traceback routine, can be helpful, too.
> 
> Related:
> <http://labs.hoffmanlabs.com/node/803>
> <http://labs.hoffmanlabs.com/node/848>
> <http://labs.hoffmanlabs.com/node/800>
> <http://labs.hoffmanlabs.com/node/800#comment-2049>
> 
> 
> 
> 
> -- 
> Pure Personal Opinion | HoffmanLabs LLC

Hi Steve,

Happy new year to you.  Thanks for your synopsis.  What I find interesting about the above is that these "bugs" can NOT be reproduced in our test/development/QA environments.  Said environments run on exactly the same hardware and configuration, i.e., rx2800 i2 with 32GB RAM and VMS v8.4.  These processes dump ONLY in production but, then again, the modules are more heavily utilized in production than in the aforementioned test/dev environments.


