[Info-vax] OpenVMS I64 V8.1 "Evaluation Release"?
Johnny Billquist
bqt at softjar.se
Fri Mar 23 10:43:27 EDT 2012
Hey! Wonderful. Someone else *does* understand. :-)
Johnny
On 2012-03-22 23.02, John Wallace wrote:
> On Mar 22, 8:51 pm, glen herrmannsfeldt<g... at ugcs.caltech.edu> wrote:
>> Johnny Billquist<b... at softjar.se> wrote:
>>
>> (snip)
>>
>>>> Well, if you put it that way, IA32 has a 45 bit virtual address
>>>> space, which should have been plenty big enough. That is, 16 bit
>>>> segment selectors minus the local/global bit and ring bits,
>>>> and 32 bit offsets.
>>> I don't know exactly how the virtual addresses look on the IA32 so I
>>> can't make more explicit comments. But if it actually forms a 45-bit
>>> virtual address space, then sure. But it depends on how the virtual
>>> address is calculated. Maybe someone can make a more accurate comment,
>>> if we want to pursue that.
>>
>> IA32 still has the segment selector system that was added with
>> the 80286. While there were many complaints about the size of 64K
>> segments, that isn't so much of a problem at 4GB. A task can
>> have up to 8192 segments, each up to 4GB.
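(For reference, a rough sketch of the arithmetic behind that 45-bit figure,
following the accounting in the quoted text: a 16-bit selector spends 1 bit
on the GDT/LDT choice and 2 bits on the requested privilege level, leaving a
13-bit table index, which combined with a 32-bit offset gives 2^45 bytes of
per-task virtual space. The little C fragment below just spells that out.)

    #include <stdio.h>
    #include <stdint.h>

    /* 16-bit selector = 13-bit table index + TI bit + 2 RPL bits;
       each descriptor can map a segment of up to 4 GB (32-bit offset). */
    int main(void)
    {
        unsigned index_bits  = 13;            /* 8192 entries per table */
        unsigned offset_bits = 32;            /* 4 GB per segment       */
        uint64_t total = (uint64_t)1 << (index_bits + offset_bits);

        printf("segments per table: %u\n", 1u << index_bits);
        printf("virtual space: 2^%u = %llu bytes\n",
               index_bits + offset_bits, (unsigned long long)total);
        return 0;
    }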
>>
>>>> Also, many IA32 processors have a 36 bit physical address, again
>>>> plenty big enough for most people even now.
>>> Right. But the physical address space becomes a question for the OS
>>> allocation and resource utilization. Good in its own way, but it
>>> won't allow your program to use more memory space than what you
>>> can address in your virtual address space.
>>
>> The 32 bit MMU that came after the above mentioned system and
>> before the 36 bit address bus did complicate things, but with the
>> appropriate OS support, it could have been done.
>>
>> (snip, I wrote)
>>
>>>> Having a large virtual address space is nice, but you can't
>>>> practically run programs using (not just allocating, but actually
>>>> referencing) 8, 16, or 32 times the physical address space.
>>> You perhaps can't use all of it at the same time, for various reasons.
>>> But you might definitely want to spread your usage out over a larger
>>> address space than 32 bits allows.
>>
>> Maybe 2 or 3 times, but not 16 or 32. Note that disks haven't
>> gotten faster nearly as quickly as processors have, especially in latency.
>>
>> If you do something, such as matrix inversion, that makes many
>> passes through a large array, you quickly find that you can't
>> do it if it is bigger than physical memory.
>
> It's somewhat strange to say you *can't* do it. You *can* do it,
> people have been doing it for years and sometimes still do it. In a
> demand paged environment the application will run to completion
> [assuming sufficient pagefile etc], and it will get the same answers
> as it would with lots more real RAM. The application may take a while
> longer to run when the available physical memory is significantly less
> than the needed physical memory.
>
> Please note the words "available" and "needed".
>
> This is just the way virtual memory systems work (when done right).
>
> Almost no real application *needs* all its virtual address space to be
> in physical memory in quick succession. Many will work fine with a
> small portion in physical memory at any given time. That's why demand
> paging is useful (and why swapping a whole process is less useful).
>
> In real world systems the physical memory is shared between the
> application of interest, what the OS needs, and the memory needed by
> any other applications on the system at the time. There aren't many
> virtual memory applications that will fail because they don't have
> enough physical memory available, which is fortunate really.
>
>
>>
>>>> The rule for many years, and maybe still not so far off, is that
>>>> the swap space should be twice the physical memory size. (Also,
>>>> that was when memory was allocated out of backing store. Most now
>>>> don't require that.)
>>> That has not been true for over 10 years on any system. It's actually a
>>> remnant from when memory was managed in a different way in Unix, and
>>> originally the rule was that you needed 3 times physical memory in swap.
>>
>> On the ones I worked with, it was usually 2, but 3 probably also
>> would have been fine.
>>
>>> The reason for the rule, if you want to know, was that way back,
>>> physical memory was handled somewhat similar to cache, and swap was
>>> regarded as "memory". So, when a program started, it was allocated room
>>> in swap. If swap was full, the program could not run. And when running,
>>> pages from swap were read into physical memory as needed. (And paged out
>>> again if needed.)
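(A minimal sketch of that older scheme, with every name below invented
purely for illustration: swap is the "real" memory, it is reserved up
front when a program starts, and physical memory merely caches it.)

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model, not any particular kernel's code: every page of
       a new program must first be reserved on the swap device; pages are
       later faulted from swap into RAM on demand. */
    static size_t swap_pages_total = 4096;   /* size of the swap device */
    static size_t swap_pages_used  = 0;      /* pages already reserved  */

    bool start_program(size_t image_pages)
    {
        if (swap_pages_used + image_pages > swap_pages_total)
            return false;            /* "no memory", even with free RAM */
        swap_pages_used += image_pages;      /* reserve backing store   */
        return true;
    }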
>>
>> The first system that I remember this on was OS/2, I believe 1.2
>> but maybe not until 2.0. If you ran a program from floppy, it required
>> that the swap space exist, as you might take the floppy out.
>>
>> Well, using the executable file as backing store for itself is a
>> slightly different question, but for many years they didn't even
>> do that. Allocating in swap avoids the potential deadlock when the
>> system finds no available page frames on the swap device, and needs
>> to page something out. It made the OS simpler, at a cost in swap space.
>> (And the ability to sell more swap storage.)
>>
>>> This should make it pretty obvious that you needed more swap than
>>> physical memory, by some margin, or you could start observing effects
>>> like a program not being able to run because there was no memory, but
>>> you could at the same time see that there was plenty of free physical
>>> memory. A very silly situation.
>>
>> When main memory was much smaller, that was much less likely to
>> be true, but yes.
>>
>>> No system today works that way. You allocate memory, and it can be in
>>> either swap or physical memory. You do not *have* to have space
>>> allocated in swap to be able to run. You don't even need to have any
>>> swap at all today.
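(As an illustration of the modern behaviour on one common system, Linux in
this case, an anonymous mapping can be created without reserving any swap
at all; a page only acquires physical or swap backing when it is actually
touched. Other systems differ in the details.)

    /* Linux-specific illustration; assumes a 64-bit system. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)8 << 30;        /* 8 GB of virtual space */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(p, 1, 4096);   /* touch one page: only now is it backed */
        printf("mapped %zu bytes, touched one page\n", len);
        munmap(p, len);
        return 0;
    }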
>>
>> Reminds me of wondering if any processors could run entirely off
>> built-in cache, with no external memory at all.
>
> If I remember rightly, some (all?) Alpha processors start up by
> reading code from a serial ROM into the on-chip cache, and the system
> setup code continues from there.
>
> Whether you could do anything *useful* just from on-chip cache is a
> different question. The PDP11 era of being able to do useful things in
> 32KW seems to have got lost somewhere.
>
>>
>>>> If you consider that there are other things (like the OS, other
>>>> programs and disk buffers) using physical memory, you really
>>>> won't want a single program to use more than 4GB virtual on
>>>> a machine with 4GB real memory. Without virtual memory, you
>>>> would probably be limited to about 3GB on a 4GB machine.
>>> It's called paging, and every "modern" OS does it, all the time,
>>> for all programs. Not a single program you are running today is
>>> all in memory at the same time. Only parts of it are.
>>
>> As I noted above, it isn't hard to write a program, working with
>> a large matrix, which does pretty much require it all in memory.
>>
>> With matrix operations, they tend to run sequentially through
>> blocks of memory. I once wrote a very large finite-state
>> automaton that pretty much went randomly through about 1GB of
>> memory. No hope at all to swap.
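(A tiny sketch of the two access patterns being contrasted here; if N is
chosen so the array is larger than physical memory, the first loop pages
through it tolerably while the second turns nearly every access into a
hard page fault.)

    #include <stdlib.h>

    #define N (256UL * 1024 * 1024)          /* 2^28 ints = 1 GB */

    int main(void)
    {
        int *a = malloc(N * sizeof *a);
        if (!a)
            return 1;

        unsigned long long i, x = 1;

        for (i = 0; i < N; i++)              /* sequential: good locality */
            a[i] = (int)i;

        for (i = 0; i < N; i++) {            /* scattered: poor locality  */
            x = x * 6364136223846793005ULL + 1442695040888963407ULL;
            a[x % N] += 1;
        }

        free(a);
        return (int)(x & 1);  /* keep the second pass from being optimized away */
    }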
>
> But systems that swap rather than page also mostly got lost in the
> PDP11 era. OK there was a brief period in the life of some UNIXes when
> they'd swap and not page, but that wasn't a long term success - as you
> noted earlier, in the wrong circumstances an application might well
> fail to run even if the system appeared to have enough free memory for
> useful work to be done. For this and other reasons, demand paging is
> in general more useful than swapping, if you have the choice
> [exceptions may well have applied in the past, but it's hard to see
> where they'd apply nowadays].
>
>>
>>> So, even if you are running a program that is 4 GB in size,
>>> it will not be using 4 GB of physical memory at any time.
>>
>> Maybe for the programs you write...
>
> No program, no processor, ever uses a whole 4GB of physical memory at
> one time. Not one. How could it? The data bus isn't wide enough for a
> start :)
>
> So, it becomes a question of how much of that 4GB (or whatever) does
> it need, at what access time. The access time options typically are a
> compromise between cost and size (faster is more expensive) and
> include on-chip cache time (fastest, but most expensive per GB), local
> main memory time, remote-NUMA memory time, soft page fault time (from
> in-memory cache), hard page fault time (from disk or network -
> biggest, slowest, cheapest per GB). They will all work (in the right
> circumstances) and the application will get the same results, only the
> timing will be different.
>
> In any real system with 4GB of physical memory, no real application
> ever gets to use the whole 4GB at once. The OS wants some, for non-
> pageable OS code and data, for a start.
>
> Some applications may well have poor "locality of access". But once
> again that affects performance, not whether it will run or not.
> "Locality of access" effects apply at various levels - a program needs
> good locality of access on a fine scale to make best use of on-chip
> caches, and on a coarse scale it needs good locality of access to
> avoid excessive unnecessary paging.
>
> Application code such as matrix arithmetic can be written either from
> first principles (which may lead to poor locality of access) or with a
> bit more care to use techniques such as "tiling" to get better
> locality of access (which may affect on-chip cache behaviour, or
> paging behaviour, or both).
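(A small sketch of what "tiling" looks like in practice, using a matrix
transpose as the example; the block size B is only a placeholder to be
tuned per machine. The naive version strides a whole row apart through the
destination on every step, while the blocked version keeps the working set
down to roughly two B-by-B tiles at a time.)

    #define B 64   /* block edge: a tuning parameter, not a magic value */

    void transpose_naive(double *dst, const double *src, int n)
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                dst[j * n + i] = src[i * n + j];  /* strides n through dst */
    }

    void transpose_tiled(double *dst, const double *src, int n)
    {
        for (int ii = 0; ii < n; ii += B)
            for (int jj = 0; jj < n; jj += B)
                for (int i = ii; i < ii + B && i < n; i++)
                    for (int j = jj; j < jj + B && j < n; j++)
                        dst[j * n + i] = src[i * n + j];
    }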
>
> Much of this is basic computer and OS and application design stuff,
> whatever chip is involved, whatever OS is involved.
>
>>
>>> And if your program is only using 100 KB, the odds are that
>>> not all 100 KB will be in physical memory either.
>>
>> -- glen
>