[Info-vax] Raxco VMS Tuning Seminar Notes

Wed Nov 3 13:43:35 EDT 2010

On Wed, 3 Nov 2010 04:49:53 -0700 (PDT), Neil Rieck
<n.rieck at sympatico.ca> wrote:

>On Nov 1, 12:10 pm, jls <notva... at yahoo.com> wrote:
>
>>
>> Again with the "always".  The drawbacks with this memory management
>> technique are:
>>         1. it uses CPU to constantly manage memory, thus taking
>> available CPU from users.  This is a trade-off that is not adequately
>> understood by many people.
>>         2. it only takes memory from ACTIVE users, not idle users.
>> This forces the active users to pagefault much more to get work done.
>>
>
>How could the SWAPPER not trim idle users? They would be faulting at a
>rate of less than PFRATL (if not zero). As I understand this, you
>would want to trim the working set to a smaller size then swap out
>that smaller set. Not all trimmed pages would ever be faulted back in
>(based upon the theory that you run 10% of the code 90% of the time)

Processes are checked against PFRATH and PFRATL at the end of their
quantum.  Idle processes do not reach an end to quantum, since they
are not using CPU.  Thus, only an active process will have pages taken
away as a result of PFRATL.  This is all pretty well explained in the
VMS Perf Mgt guide.  Taking away pages from an active process means
that the process will have to pagefault those pages back in again.  
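
To make the quantum-end point concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not the actual VMS
algorithm: the threshold values are made up, the Process fields are
hypothetical, and it ignores AWSTIME and the free-list checks.  It
only shows that a process which never reaches quantum end is never
trimmed by PFRATL.

```python
from dataclasses import dataclass

# Illustrative values only; these are NOT recommended settings.
PFRATH = 120   # fault rate above which the working set is grown
PFRATL = 10    # fault rate below which the working set is shrunk
WSINC = 150    # pages added when growing
WSDEC = 35     # pages removed when shrinking
AWSMIN = 50    # floor for automatic shrinking


@dataclass
class Process:
    name: str
    wssize: int      # current working set size in pages
    fault_rate: int  # recent page fault rate (hypothetical units)
    uses_cpu: bool   # False means the process is idle


def quantum_end_adjust(proc: Process) -> None:
    """Adjustment made only when a process reaches the end of a quantum."""
    if proc.fault_rate > PFRATH:
        proc.wssize += WSINC
    elif proc.fault_rate < PFRATL:
        proc.wssize = max(AWSMIN, proc.wssize - WSDEC)


def run_interval(procs: list[Process]) -> None:
    """Idle processes use no CPU, so they never reach quantum end."""
    for proc in procs:
        if proc.uses_cpu:
            quantum_end_adjust(proc)


if __name__ == "__main__":
    active = Process("active", wssize=1000, fault_rate=5, uses_cpu=True)
    idle = Process("idle", wssize=1000, fault_rate=0, uses_cpu=False)
    run_interval([active, idle])
    print(active.wssize, idle.wssize)
```

In this model the active process with a low fault rate loses WSDEC
pages, while the idle process keeps its full working set.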

In truly bad situations, the processes will thrash pages in and out as
they settle between PFRATH and PFRATL, again using excessive CPU to
constantly flip page list entries.

Basically, using PFRATL to reclaim pages from working sets increases
CPU overhead for memory management significantly.  This consumes far
more CPU than having a "hitman" type product run every 5 minutes or so
to knock out idle processes.

>
>You are correct that some CPU resources are used to do this, but this
>could be reduced by increasing AWSTIME to 1, or more, multiples of
>QUANTUM. Anyway, having any program scan the system on a periodic
>basis (watchdog, hitman, etc.) can take its toll.
>

Honestly, I'm not sure where to start with this; there are so many VMS
Perf Mgt misconceptions here.

For the discussion that follows, remember that we are talking about a
memory-constrained configuration.

Only the part of CPU that is involved in doing memory management
applies to this discussion.  Neither of those two parameters (AWSTIME
or QUANTUM) will improve CPU use for memory management.

If my app requires 20MB per active user to run, but I only give them
10MB, then they will pagefault excessively.  The more users on the
system doing this, the higher the likelihood that there will be
hardfaults.  And each hardfault I/O only brings in one page at a time.
And again, this is all using CPU to constantly page stuff in and out
of process working sets.

If, instead, I recognize that only a portion of the user processes are
active at any point in time - say 10% for example - then that means
that of the remaining 90% I probably have some viable candidates for
swapping (i.e., the probability is low that they will become active
"soon").

Swapping out their entire working set (assume that SWPOUTPGCNT is set
to enable swapping out their entire 20MB working set) will move them
out and make that memory available for active users.  Once an active
user reaches his/her needed 20MB, that person won't pagefault anymore
and will then get the most work done in each quantum.  Assuming that
you have enough memory for all of the active processes on the system,
the CPU will no longer work hard to keep making pages available to
them, so more is available to do real work.

And remember, swapping a process in brings in the entire working set
in one I/O, possibly at the expense of swapping out another process,
though the swapped-out process should be an idle one.

Other processes will presumably become idle and valid candidates for
swapping out to make memory available for active users.

If you do not set SWPOUTPGCNT high enough, then many pages will be
released to the modified list before the process gets swapped out for
being idle.  This means that when it becomes active again, it will be
swapped in (one I/O), but it will have to fault-in all of the pages
individually that were released from the working set prior to swapping
out - and these will most likely incur hardfaults, so one I/O for each
page.
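
The I/O cost can be sketched in a few lines of Python.  This is a
worst-case estimate under the assumptions stated in this post: one I/O
for the swap-in, and one hardfault I/O per page that was trimmed
before the swap-out.  The function names are hypothetical, and the
8192-byte page size assumes an Alpha system.

```python
PAGE_BYTES = 8192  # Alpha page size; an assumption for this sketch


def pages_for(megabytes: int) -> int:
    """Number of pages needed to hold the given number of megabytes."""
    return megabytes * 1024 * 1024 // PAGE_BYTES


def reactivation_ios(full_ws_pages: int, pages_kept_at_swapout: int) -> int:
    """Worst-case I/O count to get a swapped-out process fully resident.

    One I/O swaps in whatever was kept; every page trimmed before the
    swap-out is assumed to need its own hardfault I/O.
    """
    trimmed = full_ws_pages - pages_kept_at_swapout
    return 1 + trimmed


if __name__ == "__main__":
    full = pages_for(20)                  # the 20MB working set
    print(reactivation_ios(full, full))   # whole working set swapped
    print(reactivation_ios(full, 512))    # trimmed to 512 pages first
```

For the 20MB example that is 2560 pages: 1 I/O if the whole working
set was swapped out, versus up to 2049 I/Os if it had been trimmed to
512 pages first.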

In one instance, I had a cluster of 3 AlphaServers with over 6000
logged-on users.  There was not enough memory for all of them, and the
app required something like 30 or so MB for each user.  The CPUs had a
very high load in non-user mode.  The more they tried to manage their
memory resources with PFRATL, the worse things got.  And the active
users were significantly slowed.

Using the swapper instead of the pager gave them back significant
amounts of memory to allocate to the working sets of the active users.
And the non-user-mode CPU utilization went way down.