[Info-vax] TCPIP RWAST
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Tue Mar 9 12:05:40 EST 2021
On 2021-03-09 16:29:43 +0000, Jeffrey H. Coffield said:
> Okay, no responses to my earlier post about TCP/IP issues.
>
> Anyone have an idea what can cause a process to go into RWAST on a
> TCP/IP socket?
>
> I believe I have eliminated the sb_max, tcp_sendspace and tcp_recvspace
> as the culprit. Quotas look okay and non-paged pool is okay.
>
> It seems to happen about once a week and a power off/on is the only way
> to clear it as the system shutdown hangs trying to stop the batch job
> that is in the RWAST state.
>
> Does anyone know how to show TCP/IP connections that are pending?
>
> Any clues or suggestions will be appreciated.
You've a resource leak or insufficient quotas, and you're using the
classic "enterprise app solution" of restarting the app. The classic
"enterprise app run-time extension" is used to increase the process AST
limits, of course. Can-kicking, as it's also known.
Somewhere in this app, you're leaking ASTs directly or via associated
I/O requests or other such. Or the app is getting too busy for its
quota settings, due to transient spikes in its activity, and getting
tangled when further operation is paused pending sufficient quota.
Could be failing to clean up sockets or such here, or some other AST
activity unrelated to networking, or some threshold of
quota-permissible activity has been reached. Maybe a TCP/IP Services
bug, too.
I've also seen app wedges in IP networks and in DECnet networks with
apps using connection-oriented communications, where a remote
receiving app gets wedged, or gets paused within a debugging session,
or otherwise isn't draining its pending network traffic queue quickly
enough. That'll wedge the whole app network, if no mechanisms to
prevent a back-pressure-induced systemic wedge are implemented.
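One such mechanism, as a minimal sketch in Python rather than anything
from the app in question: put a bounded queue in front of the
connection and give up after a timeout instead of blocking forever
when the peer stops draining. The class name, method names, and
parameters here are all hypothetical.

```python
import queue


class BoundedSender:
    """Outbound buffer that refuses to block forever on a stalled peer.

    Hypothetical illustration; max_pending and timeout_s are assumed
    tuning knobs, not settings from any real product.
    """

    def __init__(self, max_pending, timeout_s):
        self._pending = queue.Queue(maxsize=max_pending)
        self._timeout_s = timeout_s
        self.dropped = 0  # count of messages shed under back-pressure

    def submit(self, message):
        """Queue a message; return False if the peer is not draining."""
        try:
            self._pending.put(message, timeout=self._timeout_s)
            return True
        except queue.Full:
            # Shed load (or log, or disconnect) rather than wedge the caller.
            self.dropped += 1
            return False

    def drain_one(self):
        """Called by whatever actually writes to the socket."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None
```

Whether to drop, log, or disconnect when submit() returns False is an
app-level policy decision. The point is only that the sender makes
that decision, rather than stalling until the quota runs out.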
Lacking a tool such as Xcode Instruments on macOS, you're here left to
monitor the app's outstanding AST counts over time with SDA or SHOW
PROCESS or app-embedded logging, and to try to isolate which of the
many app activities might be involved or might be leaking.
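Once you have those counts logged, telling a leak from a transient
spike is mostly a matter of looking at the trend. A rough sketch in
Python, assuming the samples are remaining-AST-quota readings
collected elsewhere (for instance via F$GETJPI with the ASTCNT item)
and that the AST limit is known. The function name, the labels it
returns, and the 10% low-water threshold are illustrative assumptions,
not anything OpenVMS provides.

```python
def classify_ast_usage(samples, ast_limit, low_water=0.1):
    """Classify a series of remaining-AST-quota readings, oldest first.

    Hypothetical helper: a steadily rising in-use count suggests a leak,
    while a dip near zero that recovers suggests an undersized quota.
    """
    if len(samples) < 2:
        return "insufficient data"

    # Convert "remaining quota" into "ASTs currently outstanding".
    in_use = [ast_limit - s for s in samples]

    # A leak never gives quota back: usage only climbs or holds.
    rising = all(b >= a for a, b in zip(in_use, in_use[1:]))
    if rising and in_use[-1] > in_use[0]:
        return "probable leak"

    # Usage recovered, but at some point almost nothing was left.
    if min(samples) <= ast_limit * low_water:
        return "quota too small for peak load"

    return "ok"
```

Real data will be noisier than a strict "never decreases" test
allows, so a production version would want smoothing or a longer
window. This is only meant to show the distinction between the two
failure modes.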
The usual longer-term fix can involve better instrumenting the code
and its quota usage and network traffic activity, or potentially
switching from a reliable transport to an unreliable transport, or a
combination.
Selection of an unreliable transport can be particularly advantageous
if the historical data being transmitted is less useful than is the
current data.
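With an unreliable transport, the receiver has to tolerate lost and
reordered datagrams. When only the current value matters, that can be
as simple as the following Python sketch, which assumes the sender
stamps each datagram with an increasing sequence number. The class and
its fields are hypothetical.

```python
class LatestValue:
    """Keep only the newest reading; stale or duplicate ones are ignored.

    Assumes each datagram carries a sender-assigned, increasing sequence
    number (an assumption for this sketch, not a protocol requirement).
    """

    def __init__(self):
        self.seq = -1      # highest sequence number seen so far
        self.value = None  # payload that arrived with that number

    def offer(self, seq, value):
        """Accept a datagram only if it is newer than anything seen."""
        if seq <= self.seq:
            return False  # late or duplicate arrival: drop it
        self.seq = seq
        self.value = value
        return True
```

Nothing here waits on a missing datagram, so a lost packet costs one
stale reading rather than a stalled sender.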
Also patch your unspecified TCP/IP Services version to V5.7 ECO5F or
ECO5G. There's seemingly some confusion around which is current there,
as VSI seems to have ECO5F available and HPE had a saveset known as
ECO5G. And there have been occasional bugs.
--
Pure Personal Opinion | HoffmanLabs LLC