[Info-vax] How to deal with FPG process state :-(

Sat Jan 31 02:41:20 EST 2009

Found that my Alpha had become unresponsive. From another node, though,
I was able to do SHOW SYS and found a number of processes in the
never-before-seen state of FPG.

Console had the terrible "page file fragmented attempting to continue"
message.

I tried to STOP/ID the offending IMAP processes, to no avail. I then
tried to STOP/ID other non-critical processes, but that just put them
into the FPG state (except for a couple that stayed in COMO).

Question: on an unresponsive system where only services that don't
require any memory work still respond (SHOW SYSTEM worked, and pinging
the node worked), is there anything that can be done to recover the
system without the big fat HALT command from the >>> prompt?

Is this a sign that one of my SYSGEN parameters needs to be adjusted
(FREELIM etc.?) to ensure there is always enough memory to be able to
kill a process, or is this a hopeless case when you have runaway software
like TCPIP Services which starts creating processes left and right
that each go nuts with memory and page file?

> 0200136 TCPIP$MOUNTD_1  LEF     10     2042   0 00:00:00.67       953     25  N
> 20200137 TCPIP$NTP_1     FPG     10  8787983   0 00:00:54.06      1849     38  N
> 20200138 TCPIP$POP_1     HIB     10    12120   0 00:00:03.41      2974     25  N
> 2020013B SYSLOGD_1       FPG      6   726954   0 00:04:04.32      2243     35  N
> 2020013D WWW server 80   FPG      6 11734083   0 01:02:05.22      4818     39  N
> 20205D47 TCPIP$IMAP_370  HIB     10   929515   0 00:01:54.73     82116     90  N
> 20205F53 DECW$TE_5F53    FPG      6    22937   0 00:00:05.41      1424     28
> 20205F54 _FTA28:         LEFO     4       --  swapped  out  --             32
> 2020035A SYMBIONT_86     FPG      6     4805   0 00:00:08.61      2473     39
> 202049A0 TCPIP$TFT_BG199 LEF     10    14139   0 00:00:01.47       581     29  N
> 20205BA3 DNFS1ACP        FPG     10      231   0 00:00:00.03       187     25
> 20205FA8 _FTA29:         LEFO     6       --  swapped  out  --             30
> 202060AC TCPIP$IMAP_371  FPG     10    33933   0 00:00:12.78     74376   3594  N
> 202060AD TCPIP$IMAP_372  FPG     10    21073   0 00:00:06.79     46001   3190  N
> 202060AE TCPIP$IMAP_373  FPG     10    15720   0 00:00:04.92     32795   3308  N
> 202060AF TCPIP$IMAP_374  FPG     10     9531   0 00:00:03.38     20218   3243  N
> 202060B0 TCPIP$IMAP_375  FPG     10     5719   0 00:00:02.10     13923   3194  N
> 20205AB1 TCPIP$IMAP_376  FPG     10     4015   0 00:00:01.52      5249    849  N
> 20205AB2 TCPIP$IMAP_377  FPG     10     3049   0 00:00:01.21      3407   1401  N
> 202060B3 TCPIP$IMAP_378  FPG     10     1798   0 00:00:00.56      1860    233  N
> 202042CE WWW_SERVE_1     LEF      6       94   0 00:00:00.04       493     25  S
> 202047D4 SYSTEM          COMO    11       --  swapped  out  --             31
> 20201BDE SERVER_0005     FPG      6    15077   0 00:00:09.67     22941     22  N
> 20201ADF SERVER_0004     FPG      6     1334   0 00:00:00.67      2963     22  N
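
For anyone who wants to tally the states in a listing like the one above
(for example after capturing SHOW SYSTEM output from another node), here is
a minimal Python sketch. The function name count_states and the regular
expression are my own assumptions, not anything VMS provides; it assumes
the state is an all-caps token followed by a numeric priority, and that
process names may contain spaces. The sample lines are copied from the
listing above.

```python
import re
from collections import Counter

# Assumed layout: optional "> " quote prefix, hex PID, process name
# (may contain spaces), all-caps state, numeric priority.
LINE_RE = re.compile(r"^(?:>\s*)?([0-9A-F]+)\s+(.+?)\s+([A-Z]+)\s+(\d+)\s")

def count_states(listing):
    """Tally scheduling states from SHOW SYSTEM-style lines."""
    counts = Counter()
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group(3)] += 1  # group 3 is the state column
    return counts

# A few lines copied from the listing in this post.
SAMPLE = """\
> 20200137 TCPIP$NTP_1     FPG     10  8787983   0 00:00:54.06      1849     38  N
> 20200138 TCPIP$POP_1     HIB     10    12120   0 00:00:03.41      2974     25  N
> 2020013D WWW server 80   FPG      6 11734083   0 01:02:05.22      4818     39  N
> 20205F54 _FTA28:         LEFO     4       --  swapped  out  --             32
> 202047D4 SYSTEM          COMO    11       --  swapped  out  --             31
"""

if __name__ == "__main__":
    for state, n in count_states(SAMPLE).most_common():
        print(state, n)
```

Lines that do not match the assumed layout are simply skipped, so a
header or a wrapped line will not break the count.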

Does anyone know if the limit for those IMAP processes would be
controlled by one of these UAF parameters:
	 /maxjobs ?
	 /maxacctjobs ?
	 /maxdetach ?

(Or which parameter would be the best way to prevent IMAP from creating
new processes before the old ones have gone away?)

Since these processes each went nuts with memory and page file, I have
to assume that they were not subprocesses of a single job, so the
/PRCLM parameter (subprocesses) wouldn't seem to be the limiting factor.