[Info-vax] OpenVMS I64 V8.1 "Evaluation Release"?
glen herrmannsfeldt
gah at ugcs.caltech.edu
Thu Mar 22 19:05:26 EDT 2012
John Wallace <johnwallace4 at yahoo.co.uk> wrote:
(snip, I wrote)
>> If you do something, such as matrix inversion, that makes many
>> passes through a large array you quickly find that you can't
>> do it if it is bigger than physical memory.
> It's somewhat strange to say you *can't* do it. You *can* do it,
> people have been doing it for years and sometimes still do it. In a
> demand paged environment the application will run to completion
> [assuming sufficient pagefile etc], and it will get the same answers
> as it would with lots more real RAM. The application may take a while
> longer to run when the available physical memory is significantly less
> than the needed physical memory.
Well, since we are talking about multi-GB address spaces, and running
through them at disk speeds, it could take long enough to never finish.
Now, disk MTBFs are amazingly large these days, some in the 100-year
range, and with enough UPSes and redundant power supplies you might
keep a machine running for 100 years, but I won't be around then to
see the result.
If I remember right, matrix inversion is O(N**3); that is, it makes about
N passes through an NxN matrix. Ignoring the numerical instabilities
that might come up, for a 2GB matrix of 8-byte doubles, N is 16384.
At 100MB/s, assuming no time lost to seeks, that is about 327680s, or
a few days. But if you assume there are seeks and rotational latency,
4ms average for a 7200RPM disk, it could be up to N**3*4ms, which, if
I did it right, is about 557 years.
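To put rough numbers on that, here is the back-of-envelope arithmetic
as a small Python sketch (the sizes and rates are just the assumptions
above, nothing measured):

    # A 2GB matrix of 8-byte doubles, N passes for an O(N**3) inversion,
    # a disk that streams at 100MB/s or takes ~4ms per random access.
    matrix_bytes = 2 * 1024**3                 # 2GB
    n = int((matrix_bytes // 8) ** 0.5)        # N = 16384 for 8-byte doubles

    # Sequential case: N full passes over the matrix at 100MB/s streaming.
    seconds_sequential = n * matrix_bytes / 100e6
    print(n, seconds_sequential / 86400)       # roughly 4 days

    # Worst case: one 4ms disk access per element touched, N**3 touches.
    seconds_random = n**3 * 0.004
    print(seconds_random / (365 * 86400))      # about 557 years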
> Please note the words "available" and "needed".
> This is just the way virtual memory systems work (when done right).
> Almost no real application *needs* all its virtual address space to be
> in physical memory in quick succession. Many will work fine with a
> small portion in physical memory at any given time. That's why demand
> paging is useful (and why swapping a whole process is less useful).
Sorry, all the times I mentioned "swap", I meant paging. (Except
for any references to the 80286, which did swapping.) The use of
the two words has gotten a little mixed up by now. For a long time,
the unix page files were named swap, and likely still are. (Just
like memory dump files are named core, even though magnetic cores
are now mostly in museums.)
As I understand it, some systems will move groups of pages at
once and call it swap, to help optimize the disk access patterns.
(Avoid the seek per page that could otherwise occur.)
> In real world systems the physical memory is shared between the
> application of interest, what the OS needs, and the memory needed by
> any other applications on the system at the time. There aren't many
> virtual memory applications that will fail because they don't have
> enough physical memory available, which is fortunate really.
(snip)
>> With matrix operations, they tend to run sequentially through
>> blocks of memory. I once wrote a very large finite-state
>> automaton that pretty much went randomly through about 1GB of
>> memory. No hope at all to swap.
> But systems that swap rather than page also mostly got lost in the
> PDP11 era. OK there was a brief period in the life of some UNIXes
> when they'd swap and not page, but that wasn't a long term success
As noted above, the words aren't consistently used.
(snip)
> No program, no processor, ever uses a whole 4GB of physical memory at
> one time. Not one. How could it? The data bus isn't wide enough for a
> start :)
Well, all I meant was using memory faster than demand paging could
possibly keep up with. If you are really unlucky, it is one page-in
per matrix element accessed.
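To make that concrete with a made-up example: walking down one column
of a row-major NxN array of doubles lands on a different 4KB page at
every step, so when the matrix is much bigger than physical memory,
nearly every access can be a page-in:

    # Illustrative only: assumed 4KB pages, 8-byte doubles, N = 16384.
    N = 16384
    PAGE = 4096
    row_bytes = N * 8                  # 131072 bytes -> 32 pages per row

    def page_of(i, j):
        """Page number of element (i, j) in a row-major NxN double array."""
        return (i * row_bytes + j * 8) // PAGE

    # Walking down one column: every step lands 32 pages further on.
    pages = {page_of(i, 0) for i in range(N)}
    print(len(pages))                  # 16384 distinct pages for 16384 accesses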
(snip)
> In any real system with 4GB of physical memory, no real application
> ever gets to use the whole 4GB at once. The OS wants some, for non-
> pageable OS code and data, for a start.
> Some applications may well have poor "locality of access". But once
> again that affects performance, not whether it will run or not.
> "Locality of access" effects apply at various levels - a program needs
> good locality of access on a fine scale to make best use of on-chip
> caches, and on a coarse scale it needs good locality of access to
> avoid excessive unnecessary paging.
and it is not so hard to write programs with bad locality of access.
> Application code such as matrix arithmetic can be written either from
> first principles (which may lead to poor locality of access) or with a
> bit more care to use techniques such as "tiling" to get better
> locality of access (which may affect on-chip cache behaviour, or
> paging behaviour, or both).
Yes, the descriptions above assumed one wasn't doing that.
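For completeness, a rough sketch of what the "tiling" above means (the
tile size and the plain nested-list layout are just illustrative
choices, not anyone's actual code):

    def matmul_tiled(a, b, n, tile=64):
        """Multiply two n x n matrices (nested lists), one small block at a time."""
        c = [[0.0] * n for _ in range(n)]
        for ii in range(0, n, tile):
            for kk in range(0, n, tile):
                for jj in range(0, n, tile):
                    # The three blocks touched here are each tile x tile,
                    # so the working set stays around 3*tile**2 elements
                    # no matter how large n is.
                    for i in range(ii, min(ii + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            aik = a[i][k]
                            for j in range(jj, min(jj + tile, n)):
                                c[i][j] += aik * b[k][j]
        return c

The same blocking idea applies whether the scarce resource is the
on-chip cache or physical memory; only the sensible tile size changes.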
> Much of this is basic computer and OS and application design stuff,
> whatever chip is involved, whatever OS is involved.
For matrix operations, one would hope so, but there are still plenty of
times it isn't done. For things like web browsers, I don't know whether
anyone is working on it. I would hope so, though.
-- glen