[Info-vax] Anyone interested in another public access system
glen herrmannsfeldt
gah at ugcs.caltech.edu
Tue Apr 14 14:14:25 EDT 2009
Bill Gunshannon <billg999 at cs.uofs.edu> wrote:
> With a good best-fit algorithm it is much less important.
> I would guess most fragmentation occurs when extending a file.
> I expect that fragmentation would go up when a BSD disk is
> getting close to full and there are fewer large free fragments
> to work with.
I believe it is more due to small files than to the last blocks of
large ones. Many parts of unix use many small files where other
systems might use one large one; /usr/include, for example.
> But then, when you start to approach the max capacity of a disk
> you usually have other things on your mind. :-) Also, the
> "overflow" that is common on BSD type filesystems may play a part
> in keeping fragmentation down as well. I suspect that the low
> rate of fragmentation is due in part to de-fragging on the fly as
> files are modified. As was pointed out, in many cases an entire
> text file will be loaded into memory for editing. That would seem
> to make it easy to avoid the fragmentation problem when writing it
> to disk as a whole file, as opposed to extending an existing file.
> I guess the best part of this is that it is done with so little
> overhead on the overall performance of the filesystem.
Also, the disk cache, which keeps data in memory longer, allows
combining many small changes into one large write, so that the
size is known before anything goes to disk.
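To illustrate (a toy sketch only; real caches are block based and
far more elaborate), buffering writes in memory means the final
size is known before any allocation happens:

import io

class BufferedFile:
    """Accumulate small writes in memory; write the file once."""
    def __init__(self, path):
        self.path = path
        self.buf = io.BytesIO()

    def write(self, data):
        self.buf.write(data)            # stays in memory for now

    def close(self):
        data = self.buf.getvalue()      # final size: len(data)
        with open(self.path, "wb") as f:
            f.write(data)               # one large contiguous write

f = BufferedFile("example.txt")
for i in range(100):
    f.write(b"small change\n")          # many small writes...
f.close()                               # ...one write to disk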
In the days of small memory, caching whole files in memory was
not possible. The RT-11 file system allocates files contiguously.
As I remember, the first file opened goes at the beginning of the
largest free area. When another file is opened while the first is
still growing, it goes either at the beginning of another large
free area or at the middle of the free area the previously opened
file is growing into.
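Roughly, in Python (my reading of that policy, not the actual
RT-11 code; the half-way split heuristic is an assumption):

def choose_start(free_runs, growing=None):
    """Pick a start block for a newly opened output file.

    free_runs: list of (start, length) free areas.
    growing:   the free run a previously opened file is still
               growing into, or None if no file is open.
    """
    runs = sorted(free_runs, key=lambda r: r[1], reverse=True)
    if growing is None:
        return runs[0][0]               # largest free area
    others = [r for r in runs if r != growing]
    start, length = growing
    # Take another large free area if one exists, otherwise split
    # the area in use at its middle.
    if others and others[0][1] > length // 2:
        return others[0][0]
    return start + length // 2

print(choose_start([(0, 1000)]))                   # -> 0
print(choose_start([(500, 500), (2000, 100)],
                   growing=(500, 500)))            # -> 750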
For OS/360, the SPACE parameter of the DD statement indicates the
expected size of the data set. The initial allocation is done in
up to four extents (contiguous groups of tracks). If the file
keeps growing, secondary allocations are done, up to a total of
16 extents.
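In JCL that is, e.g., SPACE=(TRK,(100,50)): 100 tracks initially,
then 50-track secondary extents. A toy model of the limit (the
class and names are mine, and the initial allocation is
simplified to a single extent):

MAX_EXTENTS = 16                        # total extent limit

class DataSet:
    def __init__(self, primary, secondary):
        self.secondary = secondary
        self.extents = [primary]        # simplified: one primary extent

    def extend(self):
        if len(self.extents) >= MAX_EXTENTS:
            raise RuntimeError("data set out of extents")
        self.extents.append(self.secondary)

ds = DataSet(primary=100, secondary=50)
for _ in range(15):
    ds.extend()
print(len(ds.extents), sum(ds.extents)) # 16 extents, 850 tracks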
The OS/360 solution to the many-small-files problem (such as
macro libraries) was the PDS, or Partitioned Data Set, often
called a library. A PDS is allocated in disk tracks (as are other
data sets), and members are then written into it, much like the
library files of, I believe, VMS. Members are stored contiguously
at the end, so a COMPRESS is needed to recover the space used by
deleted members.
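As a toy sketch of the idea (the real PDS directory lives in the
data set itself and tracks much more than this):

class PDS:
    def __init__(self):
        self.data = bytearray()         # members, stored back to back
        self.directory = {}             # name -> (offset, length)

    def store(self, name, member):
        # New and replaced members are always appended at the end.
        self.directory[name] = (len(self.data), len(member))
        self.data += member

    def delete(self, name):
        del self.directory[name]        # old bytes stay until COMPRESS

    def compress(self):
        # Rewrite the data area, keeping only live members.
        new = bytearray()
        for name, (off, length) in self.directory.items():
            self.directory[name] = (len(new), length)
            new += self.data[off:off + length]
        self.data = new

lib = PDS()
lib.store("MAC1", b"macro one")
lib.store("MAC2", b"macro two")
lib.delete("MAC1")
print(len(lib.data))                    # 18: dead bytes still held
lib.compress()
print(len(lib.data))                    # 9: space recovered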
I would expect all systems from the time of small memory to have
had some way to store small files and access them efficiently.
-- glen