[Info-vax] TCPIP RWAST
Jeffrey H. Coffield
jeffrey at digitalsynergyinc.com
Tue Mar 9 13:14:31 EST 2021
On 03/09/2021 09:05 AM, Stephen Hoffman wrote:
> On 2021-03-09 16:29:43 +0000, Jeffrey H. Coffield said:
>
>> Okay, no responses to my earlier post about TCP/IP issues.
>>
>> Anyone have an idea what can cause a process to go into RWAST on a
>> TCP/IP socket?
>>
>> I believe I have eliminated the sb_max, tcp_sendspace and
>> tcp_recvspace as the culprit. Quotas look okay and non-paged pool is
>> okay.
>>
>> It seems to happen about once a week and a power off/on is the only
>> way to clear it as the system shutdown hangs trying to stop the batch
>> job that is in the RWAST state.
>>
>> Does anyone know how to show TCP/IP connections that are pending?
>>
>> Any clues or suggestions will be appreciated.
>
> You've a resource leak or insufficient quotas, and you're using the
> classic "enterprise app solution" of restarting the app. The classic
> "enterprise app run-time extension" is used to increase the process AST
> limits, of course. Can-kicking, as it's also known.
>
> Somewhere in this app, you're leaking ASTs directly or via associated
> I/O requests or other such. Or the app is getting too busy for its quota
> settings, due to transient spikes in its activity, and getting tangled
> when further operation is paused pending sufficient quota.
>
> Could be failing to clean up sockets or such here, or some other AST
> activity unrelated to networking, or some threshold of quota-permissible
> activity has been reached. Maybe a TCP/IP Services bug, too.
>
> I've also seen app wedges in IP networks and in DECnet networks with
> apps using connection-oriented communications, where a remote
> receiving app gets wedged, or gets paused within a debugging session,
> or otherwise is not draining its pending network traffic queue
> quickly enough. That'll wedge the whole app network, if no
> mechanisms to prevent a back-pressure-induced systemic wedge are
> implemented.
>
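
A minimal sketch of one such mechanism, in Python rather than anything
VMS-specific: make the send non-blocking, so that a peer that has stopped
draining its queue shows up as a refused write instead of a hung sender.
The names try_send and fill_until_backpressure are hypothetical, and the
socketpair only stands in for the remote app. A real application might
buffer, time out, or disconnect rather than simply stop.

```python
import socket

def try_send(sock, payload):
    """Send without blocking; return bytes accepted (0 if the buffer is full)."""
    try:
        return sock.send(payload)
    except BlockingIOError:
        # The peer is not draining its queue; report it instead of waiting.
        return 0

def fill_until_backpressure(sock, payload, max_attempts=100000):
    """Send repeatedly until the kernel buffer refuses more data.

    Returns (bytes_queued, stalled).
    """
    queued = 0
    for _ in range(max_attempts):
        accepted = try_send(sock, payload)
        if accepted == 0:
            return queued, True
        queued += accepted
    return queued, False

if __name__ == "__main__":
    # The receiver never reads, which imitates a wedged remote app.
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    queued, stalled = fill_until_backpressure(sender, b"x" * 4096)
    print(f"queued {queued} bytes, stalled={stalled}")
    sender.close()
    receiver.close()
```

With a blocking socket the same loop would simply stop inside send() once
the buffers filled, which is the wedge described above.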
> Lacking a tool such as Xcode Instruments on macOS, you're here left to
> monitor the app's outstanding AST counts over time with SDA or SHOW
> PROCESS or app-embedded logging, and to try to isolate which of the many
> app activities might be involved or might be leaking.
>
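
A rough sketch of the "monitor over time" idea, in Python: given periodic
samples of the remaining AST count, convert them to ASTs in use and flag a
pattern that only ever grows. How the samples get collected is left out;
on OpenVMS they could come from SHOW PROCESS, SDA, or app-embedded
logging as described above. The function names and the sample values are
made up for illustration; only the ASTLM of 500 comes from the quota
listing in this post.

```python
def ast_in_use(astlm, ast_remaining):
    """ASTs currently outstanding, given the limit and the remaining count."""
    return astlm - ast_remaining

def looks_like_leak(in_use_samples, min_samples=4):
    """True if usage never falls between samples and ends higher than it began."""
    if len(in_use_samples) < min_samples:
        return False
    pairs = zip(in_use_samples, in_use_samples[1:])
    never_falls = all(later >= earlier for earlier, later in pairs)
    return never_falls and in_use_samples[-1] > in_use_samples[0]

if __name__ == "__main__":
    ASTLM = 500
    # Hypothetical remaining-AST readings taken at regular intervals.
    leaking = [ast_in_use(ASTLM, r) for r in (499, 490, 470, 430)]
    spiky = [ast_in_use(ASTLM, r) for r in (499, 460, 498, 465)]
    print(looks_like_leak(leaking), looks_like_leak(spiky))
```

A steady climb points at a leak; usage that spikes and recovers points
more toward quotas that are too small for transient load.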
> The usual longer-term fix can involve better instrumenting the code and its
> quota usage and network traffic activity, or potentially switching from
> a reliable transport to an unreliable transport, or a combination.
> Selection of an unreliable transport can be particularly advantageous if
> the historical data being transmitted is less useful than is the current
> data.
>
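
To illustrate why an unreliable transport can suit "current data wins"
traffic, here is a small Python sketch of the receiving side: each
datagram carries a sequence number, and anything older than what has
already been seen is dropped. The LatestOnly class and the sample
readings are hypothetical, and the actual datagram I/O is omitted.

```python
class LatestOnly:
    """Keep only the newest reading; tolerate lost, duplicated, or late datagrams."""

    def __init__(self):
        self.last_seq = -1
        self.value = None

    def accept(self, seq, value):
        """Return True if this datagram is newer than anything seen so far."""
        if seq <= self.last_seq:
            return False  # stale or duplicate: drop it
        self.last_seq = seq
        self.value = value
        return True

if __name__ == "__main__":
    rx = LatestOnly()
    # Datagram 2 is lost and datagram 1 arrives late; the values are made up.
    for seq, value in [(0, "10.0"), (3, "10.4"), (1, "10.1"), (4, "10.5")]:
        rx.accept(seq, value)
    print(rx.last_seq, rx.value)
```

Nothing here ever waits on the sender or on a retransmission, so a slow
or missing peer cannot build up a backlog.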
> Also patch your unspecified TCP/IP Services version to V5.7 ECO5F or
> ECO5G. There's seemingly some confusion around which is current there,
> as VSI seems to have ECO5F available and HPE had a saveset known as
> ECO5G. And there have been occasional bugs.
>
>
>
TCPIP V5.7-13ECO5F
@show_quota

Process Quota Information:
              Quota      Used  (pct.)  MAX_Used since  9-MAR-2021 06:35:10.14
ASTLM           500         1     0%          1
FILLM          4096         1     0%          1
DIOLM           512         0     0%          0
BIOLM           512         1     0%          1
BYTLM        998272       192     0%        192
ENQLM          4000         0     0%          0
TQLM            400         0     0%          0
PGFLQUOTA   2000000      9424     0%       9424   2147483647 VIRTUALPAGECNT

Working Set Information:
                                 Max_size
WSEXTENT    1761280   1761280   PQL_DWSEXTENT
WSQUOTA      156688    156686   PQL_DWSQUOTA
WSDEFAULT     78352     78343   PQL_DWSDEFAUL
WSSIZE        78352     78352   1761280 WSMAX
PAGES          7152
FAULTS         1297

SDA> show proc/chan
Process index: 0047   Name: BATCH_830   Extended PID: 00036447
--------------------------------------------------------------------
Process active channels
-----------------------
Channel  CCB       Window    Status  Device/file accessed
-------  ---       ------    ------  --------------------
 0010    7FF26000  00000000          RX28C$DKA1:
 0020    7FF26020  8C302640          RX28C$DKA1:[IMPACT.EXE]RMS_ORDER_UPLOAD.EXE;20
 0040    7FF26060  885E4280          RX28C$DKA0:[VMS$COMMON.SYSLIB]DPML$SHR.EXE;1 (section file)
 0050    7FF26080  885ECAC0          RX28C$DKA0:[VMS$COMMON.SYSEXE]DCL.EXE;1 (section file)
 00C0    7FF26160  885E3000          RX28C$DKA0:[VMS$COMMON.SYSLIB]LIBOTS2.EXE;1 (section file)
 0100    7FF261E0  00000000  Busy    BG37540:
 0110    7FF26200  885EF9C0          RX28C$DKA0:[VMS$COMMON.SYSMSG]DECC$MSG.EXE;1 (section file)
Total number of open channels : 7.
I have more data if someone wants to see.
This never happened in over 20 years of VAX>Alpha>Itanium until they
moved to a new location, so I expect it to be a network setting
somewhere. Most of the old local non-VMS services are now on Amazon.
When it happens, no new TCP/IP connections of any sort can be made.
I have a few minutes to debug before I reboot so I can probe deeper. I
am currently trying to
Jeff