[Info-vax] OpenVMS async I/O, fast vs. slow
Jake Hamby (Solid State Jake)
jake.hamby at gmail.com
Fri Nov 3 20:06:53 EDT 2023
On Friday, November 3, 2023 at 3:17:06 AM UTC-7, Ian Miller wrote:
> Have you looked at the Fast I/O routines? https://docs.vmssoftware.com/vsi-openvms-io-user-s-reference-manual/#ch10.html
No, I hadn't seen the Fast I/O routines and they look very relevant, thanks!
I'm going to have to take a little time to read everyone's replies, which I greatly appreciate. I've been hacking away on libuv, building it with Clang as C++ (with exceptions and RTTI disabled) so that I can use C++11's <atomic> header, which is present, in place of <stdatomic.h> and a C11 compiler, which aren't. Then I had to change "_Atomic int" to "atomic_int" everywhere. It's all fairly ugly, but I'll complain about the specifics another time.
The good news is that I now have a naive port of libuv that should run with a little more polishing. The last hurdle was remembering to add "/Threads" to the link line, since the linker apparently didn't notice it was linking a multithreaded program. The hurdle before that was changing a select() call to poll() in the benchmark runner, because I was getting an error about select() being called on a non-socket. libuv itself was already using poll(), so that one confused me for a while. SEARCH is my friend (just remember to reverse the argument order from "grep").
The hilariously bad news is that the UNIX way of doing things, when one thread needs to wake another, is to send a single "\0" byte down a pipe. The first benchmarks I looked at, both alphabetically and in terms of least effort required, were "async1" through "async8" (1, 2, 4, and 8 worker threads, respectively). Once I got "async1" to run by fixing the link line in my descrip.mms, it gave me just barely over 50 round-trip pings/sec. Not 50K. 50.
I think some of you may have guessed the same thing I have about what's likely going on. The user-mode thread scheduler doesn't think to wake up another thread in the same process that has a message waiting for it. The main thread and the worker thread(s) take turns waiting in poll() and reading one byte from each other via a pipe. The pipe isn't the issue; the CPU is practically idle. Both threads give up the process quantum and sleep until the process comes back around and notices one of them can wake up. At least, I suspect it's something like that.
I'm glad I started out thinking about using local event flags for the same purpose, because it looks like that's actually the only way to get a fast wakeup of a particular thread within a process. I'll take 10,000 pings/s over 50 pings/s any day. The benchmark isn't representative of real work anyway.
It'd definitely be doable to implement a proper VMS port of libuv, and thereby enable a reasonable-speed Node.js, plus any other program that uses the library as an abstraction layer. The trick would be to leverage the native VMS APIs: channel numbers instead of fds, async I/O with either a LEF or an AST callback (but not both, presumably), and VMS services for file I/O, mailboxes/pipes, terminals, and sockets (including DNS resolution).
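For the event-flag route, the core of the wakeup would be something like this VMS-only sketch (it won't compile elsewhere, and the flag number and function names are my own illustrative choices, not a worked-out design): one side sets a local event flag with SYS$SETEF, the other blocks in SYS$WAITFR and clears the flag with SYS$CLREF before waiting again.

```cpp
// VMS-only sketch: waking a specific thread with a local event flag
// instead of writing a byte down a pipe. Flag 33 is an illustrative
// choice from local cluster 1 (flags 32-63); a real port would
// allocate flags properly (e.g. via LIB$GET_EF) rather than hard-code one.
#include <starlet.h>   // sys$setef, sys$waitfr, sys$clref
#include <ssdef.h>     // SS$_NORMAL and friends

#define WAKEUP_EFN 33

// Producer side: set the flag to make the waiting thread runnable.
int wake_worker(void) {
    return sys$setef(WAKEUP_EFN);
}

// Consumer side: block until woken, then clear the flag for reuse.
int wait_for_wakeup(void) {
    int status = sys$waitfr(WAKEUP_EFN);
    if (status & 1)                 // VMS success: low bit set
        sys$clref(WAKEUP_EFN);
    return status;
}
```

The same flag number could also be handed to SYS$QIO as its efn argument, so a completing async I/O wakes the right thread directly.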
Regards,
Jake Hamby