[Info-vax] What choices are available to the OpenVMS Process Scheduler once all

Wed Feb 10 00:58:33 EST 2016

In article <b33ee0e2-8e76-46f3-a653-13eb6fcd4bdc at googlegroups.com>, sean at obanion.us writes:
> We are on OpenVMS 8.4, patches to Update 10, BL870 (16 cores with Hyper
> threading enabled,   32 CPUs seen, as recommended by our Cache application
> vendor), 64GB memory, Fibre Channel to P9500 storage array, Cache 2010
> database and applications.

Ah, Cache'.

Are there bottlenecks within the database itself?

How about its cache (not Cache') performance?  They used
to tell us way back when about the problem of sequential
database scans flushing the cache and destroying the
performance of other processes, in Cache' and related
environments.

Are there tools to tune the database internally?  I assume it's
still B-tree based and there are parameters you tweak to keep the
B-tree structure properly balanced, etc.  For example, you do
not want trees that are too shallow and wide, nor do you want
trees that are too deep and narrow.  The concepts seem similar
to those involved in tuning RMS indexed files.

Are your database files fragmented?  I believe with Cache'
they can grow on the fly, and that means the VMS container
files themselves can become fragmented.

Do you have Cache' locking issues?  If multiple jobs hang
behind whoever holds a key lock, that can bog a system down
in such a way that average performance is low but the COM
process count still spikes from time to time.  If a small
number of jobs hog a resource but do eventually give it up,
a burst of activity follows as the waiting processes get
their chances.  Each possibly does a small to moderate
amount of work and releases the lock, then continues with
other processing until the next time it needs one of the
heavily used resources.

The same applies to VMS locks, which are quite possibly
the ones Cache' is using for its own locking.
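
To make the convoy effect concrete, here is a toy Python sketch.
It is not a model of Cache' or of the VMS lock manager: the
function name, the tick-based timing and every parameter value
are assumptions chosen only for illustration.  One job holds a
lock for a long time while the others wait, and the runnable
count stays low until the lock is released.

```python
from collections import deque

def simulate_convoy(n_waiters=8, hog_hold=20, short_hold=1,
                    work_after=5, ticks=40):
    """Toy model: return the number of runnable jobs at each tick.

    All parameters are illustrative assumptions, not measured values.
    short_hold and work_after are assumed to be at least 1.
    """
    waiting = deque(range(n_waiters))  # jobs blocked on the lock (FIFO)
    holder_left = hog_hold             # ticks the current holder still needs
    computing = []                     # remaining lock-free ticks per job
    runnable = []
    for _ in range(ticks):
        # Runnable = the lock holder (if any) plus jobs doing lock-free work.
        runnable.append((1 if holder_left > 0 else 0) + len(computing))
        # Jobs doing lock-free work each use one tick; drop finished ones.
        computing = [t - 1 for t in computing if t > 1]
        if holder_left > 0:
            holder_left -= 1
            if holder_left == 0:
                # Holder releases the lock and goes on to lock-free work.
                computing.append(work_after)
                if waiting:
                    waiting.popleft()          # next waiter takes the lock
                    holder_left = short_hold
    return runnable

counts = simulate_convoy()
print("peak runnable:", max(counts))
print("mean runnable:", sum(counts) / len(counts))
```

In this sketch the mean stays well below the peak, which is the
same shape as low average utilization with occasional COM spikes.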

I assume, of course, that you checked for SMP performance
issues including spinlocks and MPSYNC.

As someone who has managed DSM but has only looked over
the shoulder of the ISM, later Cache', support people, I
unfortunately do not know which VMS sysgen parameters
are important for running a Cache' database.  Running
AUTOGEN in a dry run mode, though, would tell you which
parameters it thinks need changing.

Are the batch jobs invoking code running within Cache'?
Do you accept the default batch job priority or do you
change it to meet your performance goals?  I do not
believe the VMS scheduler will help you when a low
priority job holds a resource that higher priority
jobs are waiting for, so running the resource hogs
at lower priority may not help as much as you might
think.

[Hmm.  I believe DSM actually increased the priority
of jobs once they held the database write lock.  Wonder
if Cache' does that?  Less important or useful with a
large number of processors/threads, of course.]

> But my question is:
>
> What choices are available to the OpenVMS Process Scheduler once all the
> CPUs are in use?  And if (I suspect) the Scheduler was struggling to
> choose what processes to execute, how would it's efforts (CPU cycles)
> be reported/measured?

I don't think of it as struggling.  Processes run, basically, until
their quantum expires or until they give up the CPU for some other
reason, and at that time the processor can resume execution of some
other process waiting for cycles.  The scheduler does have extra
constraints in a multiprocessor/multithreading system, such as
attempting to reschedule processes back onto the same processor
if possible to minimize memory cache flushing, and perhaps to deal
with NUMA locality issues (does VMS know about these?), but otherwise
it's just a matter of sharing among processes of roughly equal
priority.
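
The quantum-based sharing described above can be sketched as a
toy round-robin simulation in Python.  This is emphatically not
the VMS scheduler: it assumes equal-priority processes, ignores
priority boosts, affinity and NUMA, and the function name and
tick-based model are made up for illustration.

```python
from collections import deque

def round_robin(bursts, quantum, cpus):
    """Toy round-robin: bursts maps name -> CPU ticks needed (positive ints).

    Returns a dict of name -> tick at which that process finished.
    """
    ready = deque(bursts.items())   # runnable processes waiting for a CPU
    running = []                    # [name, remaining, quantum_left] per busy CPU
    finished = {}
    tick = 0
    while ready or running:
        # Hand idle CPUs to the processes at the head of the ready queue.
        while ready and len(running) < cpus:
            name, remaining = ready.popleft()
            running.append([name, remaining, quantum])
        tick += 1
        still_running = []
        for name, remaining, qleft in running:
            remaining -= 1
            qleft -= 1
            if remaining <= 0:
                finished[name] = tick              # process is done
            elif qleft == 0:
                ready.append((name, remaining))    # quantum expired: requeue
            else:
                still_running.append([name, remaining, qleft])
        running = still_running
    return finished

print(round_robin({"A": 3, "B": 3, "C": 3}, quantum=2, cpus=1))
print(round_robin({"A": 3, "B": 3, "C": 3}, quantum=2, cpus=3))
```

With more runnable processes than CPUs, each one simply waits
its turn; nothing in the loop gets harder as the queue grows.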

A wise person once taught me to tune based on queue depths and
not on percentage busy.  You seem to have queue depths (COM
processes, for example) spiking up but still have low average
utilization, a somewhat anomalous situation.  I would tend to
believe that implies bottlenecks that VMS itself cannot tune for,
such as database updates all queuing up at a single choke point
within Cache' itself, something that its tools might be able
to tell you about.
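
As a small illustration of why queue depth and percentage busy
can tell different stories, here is a Python sketch that
summarizes a series of samples.  The sample format (busy CPUs
and COM queue length per interval) and the numbers are invented
for the example; they are not output from MONITOR or any other
tool.

```python
def summarize(samples, cpus):
    """samples: list of (busy_cpus, com_queue_len), one pair per interval."""
    n = len(samples)
    avg_util = sum(busy for busy, _ in samples) / (n * cpus)
    depths = [q for _, q in samples]
    return {
        "avg_util": avg_util,                     # fraction of CPU capacity used
        "peak_queue": max(depths),                # worst COM queue seen
        "frac_intervals_queued": sum(1 for q in depths if q > 0) / n,
    }

# Hypothetical data: nine quiet intervals and one burst on a 32-CPU system.
samples = [(4, 0)] * 9 + [(32, 40)]
print(summarize(samples, cpus=32))
```

An average alone would hide the one interval in which 40
processes were queued for 32 CPUs.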

You mentioned response times of disks.  With Performance Advisor
I found those misleading.  I looked at queue length instead: a
queue length greater than one I would consider the beginnings
of a disk bottleneck, with two or higher being something to be
concerned about depending on workload characteristics.  If you
have queue length information, focus on the disk volumes with
long queues, especially those with high I/O rates.
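
That rule of thumb can be written down as a short Python
sketch.  The thresholds come from the paragraph above; the
function name, the labels, the input format and the volume
names are hypothetical.

```python
def rank_disks(stats):
    """stats: volume -> (avg_queue_len, io_per_sec).

    Returns (volume, label) pairs, longest queue first, with ties
    broken by higher I/O rate.
    """
    def label(queue_len):
        if queue_len >= 2.0:
            return "investigate"   # two or higher: something to be concerned about
        if queue_len > 1.0:
            return "watch"         # greater than one: beginnings of a bottleneck
        return "ok"

    ordered = sorted(stats.items(),
                     key=lambda item: (item[1][0], item[1][1]),
                     reverse=True)
    return [(vol, label(q)) for vol, (q, _) in ordered]

# Hypothetical volumes and numbers.
stats = {
    "VOL_A": (0.4, 900.0),
    "VOL_B": (2.5, 300.0),
    "VOL_C": (1.3, 1200.0),
    "VOL_D": (2.5, 800.0),
}
for vol, verdict in rank_disks(stats):
    print(vol, verdict)
```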

Oh, and you mentioned virtual I/O cache.  Here, look at hits
and misses.  High miss rates on busy volumes (are statistics
available by volume?) may indicate an issue - assuming, of
course, that Cache' even makes use of the disk cache given
that it likely has a very large one of its own.
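
If per-volume hit and miss counts are available, the check
might look like the Python sketch below.  The cutoffs for what
counts as a "busy" volume and a "high" miss rate are arbitrary
assumptions, as are the function name and the volume names.

```python
def high_miss_volumes(stats, min_reads=1000, max_miss=0.5):
    """stats: volume -> (hits, misses).

    Returns volume -> miss rate for busy volumes whose miss rate
    exceeds max_miss.  Both cutoffs are illustrative assumptions.
    """
    flagged = {}
    for vol, (hits, misses) in stats.items():
        total = hits + misses
        if total < min_reads:
            continue                  # too little traffic to matter
        rate = misses / total
        if rate > max_miss:
            flagged[vol] = rate
    return flagged

# Hypothetical counters.
print(high_miss_volumes({
    "DATA1": (9000, 1000),
    "DATA2": (2000, 8000),
    "IDLE": (1, 9),
}))
```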

To more directly address your question, yes, there is
scheduling class support, which I know nothing about and
which I am not sure would apply in your environment.  Others
may be able to address those kinds of questions for you.

George

 [Our last clinical VMS environment, shut down last month,
 interestingly enough, was running Cache'. It ran on older
 Alphas and was down to sporadic use, access to older data
 pending completion of an archiving project. Longevity?
 Almost 33 years, running on various VAXes and Alphas.]