[Info-vax] TCPIP RWAST

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Tue Mar 9 12:05:40 EST 2021


On 2021-03-09 16:29:43 +0000, Jeffrey H. Coffield said:

> Okay, no responses to my earlier post about TCP/IP issues.
> 
> Anyone have an idea what can cause a process to go into RWAST on a 
> TCP/IP socket?
> 
> I believe I have eliminated the sb_max, tcp_sendspace and tcp_recvspace 
> as the culprit. Quotas look okay and non-paged pool is okay.
> 
> It seems to happen about once a week and a power off/on is the only way 
> to clear it as the system shutdown hangs trying to stop the batch job 
> that is in the RWAST state.
> 
> Does anyone know how to show TCP/IP connections that are pending?
> 
> Any clues or suggestions will be appreciated.

You've a resource leak or insufficient quotas, and you're using the 
classic "enterprise app solution" of restarting the app. The classic 
"enterprise app run-time extension" is used to increase the process AST 
limits, of course. Can-kicking, as it's also known.

Somewhere in this app, you're leaking ASTs directly or via associated 
I/O requests or other such. Or the app is getting too busy for its 
quota settings, due to transient spikes in its activity, and getting 
tangled when further operation is paused pending sufficient quota.

Could be failing to clean up sockets or such here, or some other AST 
activity unrelated to networking, or some threshold of 
quota-permissible activity has been reached. Maybe a TCP/IP Services 
bug, too.

I've also seen app wedges in IP networks and in DECnet networks with 
apps using connection-oriented communications, where a remote 
receiving app gets wedged, or gets paused within a debugging 
session, or otherwise isn't draining its pending network traffic queue 
with sufficient expedience. That'll wedge the whole app network, if no 
mechanisms to prevent a back-pressure-induced systemic wedge are 
implemented.
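
As a rough illustration of one such mechanism, here's a minimal Python 
sketch of a sender that gives up when the peer stops draining, rather 
than blocking forever. The function name and the timeout value are 
made up for this example, and nothing here is specific to OpenVMS or 
to TCP/IP Services; it only shows the general shape of the idea.

```python
import socket


def send_or_give_up(sock, payload, timeout_s=5.0):
    """Try to send payload; return False if the peer is not draining in time.

    Hypothetical helper for illustration. The timeout is an assumed value
    and would need tuning for a real app.
    """
    sock.settimeout(timeout_s)
    try:
        sock.sendall(payload)
        return True
    except socket.timeout:
        # The peer (or the path to it) is not accepting data fast enough.
        # After a timed-out sendall() an unknown amount has been sent, so
        # the byte stream is no longer trustworthy. The caller should
        # close the connection and reconnect, not retry on this socket.
        return False
```

The caller then decides what to do with a False return: drop the 
connection, log it, shed load. The point is that the decision gets 
made, and the sender doesn't sit there accumulating pending work.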

Lacking a tool such as Xcode Instruments on macOS, you're here left to 
monitor the app's outstanding AST counts over time with SDA or SHOW 
PROCESS or app-embedded logging, and to try to isolate which of the 
many app activities might be involved or might be leaking.
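
Once you have those samples, however you collect them, a trend is 
easy enough to compute. A minimal Python sketch follows, assuming 
you've logged (elapsed seconds, ASTs in use) pairs, with "in use" 
taken as the AST limit minus the remaining AST count. The function 
names and the sample format are assumptions for this example, not 
anything OpenVMS provides.

```python
def ast_leak_trend(samples):
    """Least-squares slope of ASTs in use, in ASTs per second.

    samples: list of (elapsed_seconds, asts_in_use) pairs, a hypothetical
    format; asts_in_use is assumed to be limit minus remaining count.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def seconds_until_exhausted(samples, ast_limit):
    """Estimate seconds until the AST limit is reached, or None if no growth."""
    slope = ast_leak_trend(samples)
    if slope <= 0:
        return None  # flat or shrinking usage: no leak visible in this data
    _, in_use_now = samples[-1]
    return (ast_limit - in_use_now) / slope
```

A steady positive slope over days points at a leak. A flat line with 
occasional spikes up to the limit points more toward transient load 
exceeding the quota.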

The usual longer-term fix can involve better instrumenting the code and its 
quota usage and network traffic activity, or potentially switching from 
a reliable transport to an unreliable transport, or a combination. 
Selection of an unreliable transport can be particularly advantageous 
if the historical data being transmitted is less useful than is the 
current data.
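
For the current-data-wins case, the usual shape is a sequence number 
in each datagram, and a receiver that discards anything older than 
what it already has. A minimal Python sketch, with the 8-byte header 
layout and all of the names here being assumptions for illustration:

```python
import struct

# Assumed wire format: 8-byte big-endian sequence number, then the payload.
_HEADER = struct.Struct("!Q")


def encode_sample(seq, payload):
    """Prefix payload (bytes) with its sequence number."""
    return _HEADER.pack(seq) + payload


class LatestOnly:
    """Receiver state that keeps only the newest sample seen so far."""

    def __init__(self):
        self.seq = -1
        self.payload = None

    def accept(self, datagram):
        """Return True if datagram is newer than what we hold, else False."""
        if len(datagram) < _HEADER.size:
            return False  # runt datagram, ignore it
        (seq,) = _HEADER.unpack_from(datagram)
        if seq <= self.seq:
            return False  # stale or duplicate, drop it
        self.seq = seq
        self.payload = datagram[_HEADER.size:]
        return True
```

The sender would hand encode_sample() output to sendto() on a 
datagram socket and never wait on the receiver. Lost or reordered 
datagrams just get superseded by the next one, so there's no queue 
to back up.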

Also patch your unspecified TCP/IP Services version to V5.7 ECO5F or 
ECO5G. There's seemingly some confusion around which is current there, 
as VSI seems to have ECO5F available and HPE had a saveset known as 
ECO5G. And there have been occasional bugs.



-- 
Pure Personal Opinion | HoffmanLabs LLC 



