[Info-vax] New filesystem mentioned
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed May 15 12:50:22 EDT 2019
On 2019-05-15 12:31:28 +0000, Bob Gezelter said:
>> That's the point of an ACP. (And the problem with an ACP.) All I/O
>> for all users goes through a single process. There is no need for the
>> DLM.
If by "single process" you meant single path, that's correct. ACPs on
OpenVMS aren't usually in the primary path for an I/O, as passing data
out of kernel and back in is a performance problem.
Designs such as this are also used with clustering. For a clustering
solution such as Xsan or various similar apps, one host coordinates the
activities, and that host is effectively the DLM. In Apple
terminology, this host is the metadata controller (MDC). Xsan uses a
primary-secondary design, with any secondary MDCs present mirroring the
primary. This isn't far off of how the OpenVMS cluster connection
manager operates.
>> The ACP is responsible for its caching. The ACP is a weird mixture of
>> process context and device driver context. It has access to the
>> various synchronization mechanisms available to both processes and
>> drivers. It can use the DLM, but it does not have to.
Correct. And it's definitely a weird mix. The only doc is from
existing examples, and from Jamie Hanrahan's Advanced VMS Device Driver
Techniques book. I used the former approach to learn from and to write
my first ACP. Do *not* use NETACP as your template example. NETACP is
weird even for an ACP. Best to look at the magtape ACP. As for the
book, it's pretty good but does have a few errors. It's also long out
of print. There are some example ACPs on the Freeware, for those who
don't have access to the source listings. (Whether VSI will be
generating source listings, assuming VSI has sufficient rights to offer
them, is an open question. But I digress.)
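For those who haven't poked at the lock manager directly, here is a
rough, untested sketch of taking and releasing a lock via $ENQW and
$DEQ from process context, which is the same path an ACP can use if it
chooses to involve the DLM. The resource name and the hand-rolled lock
status block are purely illustrative.

/* Sketch: acquire and release a DLM lock with $ENQW / $DEQ.
 * Assumes the DEC C / VSI C compiler and starlet on OpenVMS.     */
#include <descrip.h>
#include <lckdef.h>
#include <starlet.h>
#include <stdio.h>

/* Lock status block layout, per the $ENQ documentation.          */
struct lksb {
    unsigned short status;      /* completion status              */
    unsigned short reserved;
    unsigned int   lock_id;     /* lock ID, needed later for $DEQ */
    unsigned char  valblk[16];  /* lock value block (optional)    */
};

int main(void)
{
    /* Illustrative resource name; real code uses a naming
     * convention agreed on by all of the cooperating processes.  */
    $DESCRIPTOR(resnam, "HOFFMANLABS_EXAMPLE_RESOURCE");
    struct lksb lksb = { 0 };
    int status;

    /* Queue and wait for an exclusive-mode lock on the resource. */
    status = sys$enqw(0, LCK$K_EXMODE, (void *) &lksb, 0, &resnam,
                      0, 0, 0, 0, 0, 0, 0);
    if (!(status & 1) || !(lksb.status & 1)) {
        printf("lock request failed: %d / %d\n", status, lksb.status);
        return status;
    }

    /* ... critical section: touch the shared structure here ...  */

    /* Release the lock. */
    status = sys$deq(lksb.lock_id, 0, 0, 0);
    return status;
}

Blocking ASTs and the lock value block are where the more interesting
coordination patterns come in, but the basic enqueue and dequeue is
about that simple.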
> With all due respect, I do not agree with this characterization of an ACP.
>
> An ACP does not do away with the need for a DLM. An OpenVMS cluster
> running shared volumes requires DLM functionality, as there is more
> than one ACP per volume accessing the on-disk file structure data
> structures. Orthogonally, the DLM is used for coordinating RMS-level
> intra-file activity. The bright-line difference between an ACP and the
> FILES-11 Level 2/5 XQP is that the XQP operates at inner-mode(s) within
> the requesting process rather than a separate process as is the case
> with an ACP.
There are lots of ways for systems and apps to cluster. The classic
OpenVMS design is not the only way, and the OpenVMS design does have
some issues. As for other approaches, Linux has integrated DLMs and
APIs from Red Hat and from Oracle. Apple uses MDCs. Oracle layered
clustering support atop ZFS. Apps can be and commonly are coded to
cluster themselves, often using existing support such as Hadoop, too. Etc.
And a whole lot of folks can and do use a high-availability design with
failover, whether it's a file server that's failing over, or a database.
On OpenVMS, ACPs are usually mediating the user-mode requirements of a
local device, though nothing here precludes a metadata-controller
design that routes some of the remote storage access through a host;
for various reasons, that I/O centralization can be simpler than
spreading everything out, the complexity of the coordination included.
Communications among hosts would be via user mode, or more likely via
the kernel communications driver interface known as VCI.
Long ago, OpenVMS development prototyped something that could have been
used here, too. That project was called QIOserver. Getting this stuff
right is Not Easy, which is why most folks use an existing
implementation, be it the OpenVMS DLM, DLM-like support in another
operating system, or an available add-on integrated with the app.
XQPs were centrally a performance optimization over ACPs: by mapping
the relevant code into each user process, they avoid switching between
processes, something OpenVMS was classically poor at (that whole
process-based design didn't really become competitive until roughly
L4). AFAIK, there's little difference in the I/O activity that
traverses across modes, though not having to copy buffers into and out
of a separate process is beneficial. XQPs are Not Fun to debug. ACPs
are definitely easier in that regard for the user-mode chunk of their
debugging, and the ACP debugging setup I have been using for many years
uses the debugger and a remote DECterm. Kernel-mode debugging is still
XDELTA (an old DEC enet host with that name was almost always booted
with XDELTA loaded, too) or, more recently, the System Code Debugger.
> The implementation of the file system auxiliary processing as either
> intra-driver, ACP, or XQP is transparent to the user. The actual
> interface provided by an ACP/XQP is described in the OpenVMS IO User's
> Manual.
In the OpenVMS design, yes: that, plus some undocumented shenanigans
around mounting and dismounting to make the volumes accessible. Mount
and dismount is another area of OpenVMS that's a dog's breakfast,
particularly if you're using an ACP with a device that isn't
file-oriented, as I and others have done. This area is also a mess
around USB removable device support, and we'll probably eventually see
some improvements here.
> Could a linux-style VFS-like library be implemented on OpenVMS? Could
> such a library framework be implemented as an XQP? Likely the answer to
> both questions is "Yes". As usual, the devil is in the details.
If VSI were to consider adding a FUSE layer, it'd almost inherently
involve reworking or rewriting the existing XQP to play in the new
environment, as well as work on mount and dismount and other services.
Obvious candidates here would include reworking the NFS client,
reworking and updating the ODS-3 and ODS-4 ISO-9660 support, reworking
or replacing the EFI FAT support (whether that ends up in a FUSE layer
or in something akin to the XQP), and adding an SMB client. NFS is one
of the few add-on file system clients that exist for OpenVMS, too. Not
that the IP stack should be separately packaged and separately
installed, but it is.
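For anyone who hasn't seen the FUSE side of this, here's a minimal
libfuse 3 skeleton (Linux flavor), purely to show the sort of
user-space file system API being discussed. The hello_* names and the
single read-only /hello file are invented for the example, and nothing
like this exists on OpenVMS today.

/* Minimal libfuse 3 sketch exposing one read-only file, /hello.
 * Build (Linux): cc hello_fuse.c `pkg-config fuse3 --cflags --libs`  */
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

static const char *hello_text = "hello from user space\n";

/* Report attributes for the root directory and the one file.     */
static int hello_getattr(const char *path, struct stat *st,
                         struct fuse_file_info *fi)
{
    (void) fi;
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, "/hello") == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = strlen(hello_text);
    } else
        return -ENOENT;
    return 0;
}

/* List the single directory.                                     */
static int hello_readdir(const char *path, void *buf,
                         fuse_fill_dir_t filler, off_t offset,
                         struct fuse_file_info *fi,
                         enum fuse_readdir_flags flags)
{
    (void) offset; (void) fi; (void) flags;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0, 0);
    filler(buf, "..", NULL, 0, 0);
    filler(buf, "hello", NULL, 0, 0);
    return 0;
}

/* Hand back file data.                                            */
static int hello_read(const char *path, char *buf, size_t size,
                      off_t offset, struct fuse_file_info *fi)
{
    size_t len = strlen(hello_text);
    (void) fi;
    if (strcmp(path, "/hello") != 0)
        return -ENOENT;
    if ((size_t) offset >= len)
        return 0;
    if (offset + size > len)
        size = len - offset;
    memcpy(buf, hello_text + offset, size);
    return (int) size;
}

static const struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* Mount point comes from argv, e.g. ./hello_fuse /mnt/hello  */
    return fuse_main(argc, argv, &hello_ops, NULL);
}

The point of interest is that all of those callbacks run in an
ordinary user process, which is where the debuggability and resilience
arguments come from.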
> An ACP process failing is bad, but does not always lead to a kernel
> fault. An XQP-like component, operating in kernel mode, will almost
> always cause a crash. Which is better? Mileage varies. Depends.
OpenVMS is a monolithic kernel. ACPs are part of the kernel. So too
is the XQP. Though even an L4-like operating system design can
certainly get tangled up and tip over. The FUSE API approach, though,
is deliberately intended to be resilient, to allow the file system
code to tip over without taking the rest of the system with it, and to
be _much_ easier to debug, and much easier to add to.
The whole of the OpenVMS I/O subsystem design is far too trusting of
the underlying hardware, though that's a different problem. That'll
prolly all get ignored for a while due to the "servers are isolated in
server rooms" view, at least until hosting becomes part of the
discussion. Messes can arise even with private hosting, though that's
going to be a little less common initially. There are obvious
problems with untrusted file system mounts, whether local or remote.
There's quite a bit of research happening on contending with
firmware-level persistence and exploits too, with Amazon Nitro and
other work underway. For an intro to some of the issues that can arise
here beyond intentionally-corrupted USB devices:
https://www.youtube.com/watch?v=PEVVRkd-wPM
TL;DR: The availability of VAFS should resolve some of the major issues
that folks with ODS-2 and ODS-5 are encountering. There's yet more
work awaiting beyond VAFS. It'll be some years before VSI starts
addressing other issues latent in the I/O subsystem. That work never
ends. FUSE and replacing the current ACP and XQP design and
documenting it all is probably five or ten years out at best, and
associated with no small investment in renovating and updating related
parts of the kernel.
--
Pure Personal Opinion | HoffmanLabs LLC