[Info-vax] TCPIP RWAST

Tue Mar 9 13:14:31 EST 2021

On 03/09/2021 09:05 AM, Stephen Hoffman wrote:
> On 2021-03-09 16:29:43 +0000, Jeffrey H. Coffield said:
>
>> Okay, no responses to my earlier post about TCP/IP issues.
>>
>> Anyone have an idea what can cause a process to go into RWAST on a
>> TCP/IP socket?
>>
>> I believe I have eliminated sb_max, tcp_sendspace and
>> tcp_recvspace as the culprit. Quotas look okay and non-paged pool is
>> okay.
>>
>> It seems to happen about once a week, and a power off/on is the only
>> way to clear it, as the system shutdown hangs trying to stop the batch
>> job that is in the RWAST state.
>>
>> Does anyone know how to show TCP/IP connections that are pending?
>>
>> Any clues or suggestions will be appreciated.
>
> You've a resource leak or insufficient quotas, and you're using the
> classic "enterprise app solution" of restarting the app. The classic
> "enterprise app run-time extension" is used to increase the process AST
> limits, of course. Can-kicking, as it's also known.
>
> Somewhere in this app, you're leaking ASTs directly or via associated
> I/O requests or other such. Or the app is getting too busy for its quota
> settings, due to transient spikes in its activity, and getting tangled
> when further operation is paused pending sufficient quota.
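Here is a toy model of that failure mode, sketched in Python purely for illustration. The names AstQuota, try_acquire, release and handle_requests are hypothetical and are not an OpenVMS API. Each outstanding asynchronous request holds one unit of a fixed limit (standing in for ASTLM) until its completion is handled. A single error path that forgets to release its unit is enough to exhaust the limit eventually. That is roughly the point where a real process would stall waiting for quota instead of getting an error.

```python
class AstQuota:
    """Toy stand-in for a per-process AST limit: every outstanding
    asynchronous request holds one unit until its completion is handled."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def try_acquire(self):
        if self.in_use >= self.limit:
            return False  # quota exhausted: a real process would wait here
        self.in_use += 1
        return True

    def release(self):
        if self.in_use == 0:
            raise RuntimeError("release without a matching acquire")
        self.in_use -= 1


def handle_requests(quota, outcomes):
    """Issue one request per outcome (True = success, False = error).
    The error path has a deliberate bug: it never releases its unit."""
    for ok in outcomes:
        if not quota.try_acquire():
            return False  # stalled: no quota left for the next request
        if ok:
            quota.release()  # normal completion path frees the unit
        # error path: unit leaked (missing release)
    return True


healthy = AstQuota(limit=5)
print(handle_requests(healthy, [True] * 100), healthy.in_use)   # True 0
leaky = AstQuota(limit=5)
print(handle_requests(leaky, [True, False] * 10), leaky.in_use)  # False 5
```

Note that the leaky run looks fine for the first several requests and only stalls once enough error paths have accumulated. This matches an intermittent, roughly weekly hang better than an immediate failure would.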
>
> Could be failing to clean up sockets or such here, or some other AST
> activity unrelated to networking, or some threshold of quota-permissible
> activity has been reached. Maybe a TCP/IP Services bug, too.
>
> I've also seen app wedges in IP networks and in DECnet networks with
> apps using connection-oriented communications, where a remote
> receiving app gets wedged, gets paused within a debugging session,
> or otherwise isn't draining its pending network traffic queue with
> sufficient expedience. That'll wedge the whole app network if no
> mechanisms to prevent a back-pressure-induced systemic wedge are
> implemented.
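The back-pressure effect can be reproduced with ordinary sockets. The sketch below is Python on a POSIX-style system. It uses a local socket.socketpair() rather than TCP, on the assumption that a stalled stream peer behaves the same way: its buffers fill, and then the writer stalls. The helper name fill_until_wedged is hypothetical. Because the receiving end never reads, a non-blocking send raises BlockingIOError once the socket buffers are full. A blocking sender would simply hang at that point.

```python
import socket


def fill_until_wedged(chunk_size=4096):
    """Write to a peer that never reads and return how many bytes were
    accepted before a blocking writer would have stalled."""
    sender, receiver = socket.socketpair()
    try:
        sender.setblocking(False)  # raise instead of hanging when buffers fill
        chunk = b"x" * chunk_size
        accepted = 0
        while True:
            try:
                accepted += sender.send(chunk)
            except BlockingIOError:
                # Socket buffers are full because the peer never drains them.
                # A blocking sender would now wait indefinitely at this call.
                return accepted
    finally:
        sender.close()
        receiver.close()


print(fill_until_wedged() > 0)  # True
```

One option is to set a send timeout, for example sock.settimeout(5.0) before sendall(), which then raises a timeout error. Another is to bound the outbound queue. Either turns a silent wedge into an error the app can log and act on.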
>
> Lacking a tool such as Xcode Instruments on macOS, you're here left to
> monitor the app's outstanding AST counts over time with SDA or SHOW
> PROCESS or app-embedded logging, and to try to isolate which of the many
> app activities might be involved or might be leaking.
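Once samples of outstanding ASTs are being logged, a crude trend check can flag a leak before the limit is reached. This is a hypothetical heuristic, not something from the thread. It fits a least-squares slope to (seconds, ASTs in use) samples and reports a suspected leak when usage is rising and has passed a high-water fraction of the limit. On OpenVMS the samples might come from SHOW PROCESS/QUOTAS or from $GETJPI with the JPI$_ASTLM and JPI$_ASTCNT items. The latter reports the remaining quota, so in-use would be the limit minus that value. The thresholds below are arbitrary assumptions.

```python
def ast_leak_suspected(samples, limit, high_water=0.8, min_slope=0.0):
    """Return True when AST usage is trending upward and is already high.

    samples    -- list of (seconds, asts_in_use) pairs in time order
    limit      -- the process AST limit (ASTLM)
    high_water -- fraction of the limit treated as "high" (assumed value)
    min_slope  -- growth in ASTs per second that counts as rising (assumed)
    """
    if len(samples) < 2:
        return False
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    # Least-squares slope of usage over time.
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        return False  # all samples share one timestamp: no trend to measure
    slope = num / den
    return slope > min_slope and samples[-1][1] >= high_water * limit


steady = [(0, 1), (60, 1), (120, 1)]
rising = [(0, 100), (60, 250), (120, 420)]
print(ast_leak_suspected(steady, 500))  # False
print(ast_leak_suspected(rising, 500))  # True
```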
>
> The usual longer-term fix can involve better instrumenting the code, its
> quota usage and its network traffic activity, or potentially switching
> from a reliable transport to an unreliable transport, or a combination.
> Selection of an unreliable transport can be particularly advantageous if
> the historical data being transmitted is less useful than is the current
> data.
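Here is an application-level analogue of preferring current data over historical data, sketched in Python. It is a bounded outbound queue that discards its oldest entries when the consumer falls behind, instead of blocking the producer. LatestWinsQueue is a hypothetical name. The sketch only illustrates the trade-off and does not replace an unreliable transport such as UDP, which likewise drops data rather than pushing back on the sender.

```python
from collections import deque


class LatestWinsQueue:
    """Bounded outbound queue: when full, the oldest entry is discarded
    so the producer never blocks on a slow consumer."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # how many stale entries were evicted

    def put(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # append() below evicts the oldest entry
        self._items.append(item)

    def drain(self):
        """Hand everything queued to the consumer, oldest first."""
        items = list(self._items)
        self._items.clear()
        return items


q = LatestWinsQueue(capacity=3)
for reading in range(10):
    q.put(reading)
print(q.drain(), q.dropped)  # [7, 8, 9] 7
```

The dropped counter matters in practice. Silently discarding data is only acceptable when the app can still report how much was lost.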
>
> Also patch your unspecified TCP/IP Services version to V5.7 ECO5F or
> ECO5G. There's seemingly some confusion around which is current there,
> as VSI seems to have ECO5F available and HPE had a saveset known as
> ECO5G. And there have been occasional bugs.

TCPIP V5.7-13ECO5F

@show_quota

Process Quota Information:
              Quota     Used    (pct.)  MAX_Used since  9-MAR-2021 06:35:10.14
ASTLM          500       1       0%       1
FILLM         4096       1       0%       1
DIOLM          512       0       0%       0
BIOLM          512       1       0%       1
BYTLM       998272     192       0%     192
ENQLM         4000       0       0%       0
TQLM           400       0       0%       0
PGFLQUOTA  2000000    9424       0%    9424  2147483647  VIRTUALPAGECNT

Working Set Information:
                      Max_size
WSEXTENT   1761280                          1761280  PQL_DWSEXTENT
WSQUOTA     156688                           156686  PQL_DWSQUOTA
WSDEFAULT    78352                            78343  PQL_DWSDEFAUL
WSSIZE       78352   78352                  1761280  WSMAX
PAGES         7152
FAULTS       1297

SDA> show proc/chan

Process index: 0047   Name: BATCH_830         Extended PID: 00036447
--------------------------------------------------------------------

                             Process active channels
                             -----------------------

Channel    CCB     Window     Status    Device/file accessed
-------    ---     ------     ------    --------------------
   0010  7FF26000  00000000              RX28C$DKA1:
   0020  7FF26020  8C302640              RX28C$DKA1:[IMPACT.EXE]RMS_ORDER_UPLOAD.EXE;20
   0040  7FF26060  885E4280              RX28C$DKA0:[VMS$COMMON.SYSLIB]DPML$SHR.EXE;1 (section file)
   0050  7FF26080  885ECAC0              RX28C$DKA0:[VMS$COMMON.SYSEXE]DCL.EXE;1 (section file)
   00C0  7FF26160  885E3000              RX28C$DKA0:[VMS$COMMON.SYSLIB]LIBOTS2.EXE;1 (section file)
   0100  7FF261E0  00000000  Busy        BG37540:
   0110  7FF26200  885EF9C0              RX28C$DKA0:[VMS$COMMON.SYSMSG]DECC$MSG.EXE;1 (section file)

   Total number of open channels : 7.

I have more data if anyone wants to see it.

This never happened in over 20 years of VAX > Alpha > Itanium until they
moved to a new location, so I expect it to be some network setting
somewhere. Most of the old local non-VMS services are now on Amazon.

When it happens, no new TCP/IP connections of any sort can be made.
I have a few minutes to debug before I have to reboot, so I can probe
deeper. I am currently trying to

Jeff