[Info-vax] OpenVMS async I/O, fast vs. slow
Jake Hamby (Solid State Jake)
jake.hamby at gmail.com
Fri Nov 10 22:46:00 EST 2023
On Thursday, November 9, 2023 at 4:10:13 PM UTC-8, Jake Hamby (Solid State Jake) wrote:
> [snip]
I figured out a perfect resource for OpenVMS I/O performance: the WASD source code. That's 25 years of optimization for VMS, right? :)
As I mentioned in another thread, a professor and a student at my university (Cal Poly Pomona, 1990s) were really interested in DCE (or at least in DFS, the distributed filesystem built on DCE RPC), and they were philosophically opposed to the campus IT department, which was firmly in the DEC/VMS camp despite how much money it was sending DEC in licensing and support fees. VMS "just worked", as far as the IT folks were concerned. To the professor and student, the Internet was the future, and the Internet ran on UNIX, not VMS.
I distinctly remember one of them scoffing one day about how the VMS admins were bragging about the OSU web server they'd set up for students and faculty to put up home pages, and how great it was, and how there was even a second Web server for VMS called WASD.
"OSU? WASD? Who ever heard of those Web servers? They can't be any good. What a terrible OS that can't just run Apache."
That was the gist of the complaint. So I always get a chuckle whenever I'm reminded that WASD is still around and still being developed, even with relatively few users. Sadly, I got an internal compiler error trying to build the latest WASD 12.10 code with the native compilers:
CC/OPTIMIZE=(LEVEL=4,TUNE=HOST)/ARCH=HOST/NAMES=(AS_IS,SHORT)/FLOAT=IEEE/IEEE=DENORM/REPOSITORY=[.X86_64.CXX_REPOSITORY]/L_DOUBLE=64/FIRST=[.X86_64.OBJ]FIRST.H/OBJECT=[.UTILS.OBJ_X86_64]httpdmon_geolocate.obj/DEFINE=(HTTPDMON_GEOLOCATE=1,GEOLOCATE_OBJECT=1) [.utils]httpdmon.c
assert error: expression = isa<X>(Val) && "cast<Ty>() argument of incompatible type!", in file /llvm$root/include/llvm-project-10/llvm/include/llvm/Support/Casting.h at line 264
%SYSTEM-F-OPCCUS, opcode reserved to customer fault at PC=FFFF8300097367DF, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
DECC$SHR SIGNAL.C;1 #20739 00000000801C97DF FFFF8300097367DF
DECC$SHR ABORT.C;1 #2967 000000008009643B FFFF83000960343B
DECC$SHR 0 000000008039B779 FFFF830009908779
DECC$COMPILER [.src]g2l_entrysymbol.cxx cast<llvm::Function, llvm::Value>
#175 0000000001A1E2C1 0000000001A1E2C1
DECC$COMPILER [.src]g2l_entrysymbol.cxx dwarf
#394 0000000000AD057B 0000000000AD057B
DECC$COMPILER [.src]g2l_symbol.cxx convertSymbol
#828 0000000000A4993A 0000000000A4993A
DECC$COMPILER [.src]g2l_module.cxx convertDeclarations
#1554 0000000000A473B7 0000000000A473B7
DECC$COMPILER [.src]g2l_module.cxx convertModule
#1165 0000000000A44603 0000000000A44603
DECC$COMPILER [.src]g2l_module.cxx G2L_COMPILE_MODULE
#619 0000000000A43181 0000000000A43181
DECC$COMPILER GEM_CO.BLI;1 GEM_CO_COMPILE_MODULE
#3223 0000000000000A54 00000000006D8844
DECC$COMPILER COMPILE.C;1 gemc_be_master
#103704 00000000004238BE 00000000004238BE
DECC$COMPILER COMPILE.C;1 gem_xx_compile
#102915 0000000000422597 0000000000422597
DECC$COMPILER GEM_CP_VMS.BLI;1 GEM_CP_MAIN
#2447 000000000000384E 00000000006CC23E
DECC$COMPILER 0 0000000000AD36A4 0000000000AD36A4
DECC$COMPILER 0 00000000021887AD 00000000021887AD
PTHREAD$RTL 0 000000008004122C FFFF83000950922C
PTHREAD$RTL 0 0000000080002316 FFFF8300094CA316
0 FFFF8300081FC0A6 FFFF8300081FC0A6
DCL 0 000000008006778B 000000007ADEB78B
%TRACE-I-LINENUMBER, Leading '#' specifies a source file record number.
%TRACE-I-END, end of TRACE stack dump
%MMS-F-ABORT, For target [.UTILS.OBJ_X86_64]httpdmon_geolocate.obj, CLI returned abort status: %X10000434.
%MMS-F-ABORT, For target BUILD, CLI returned abort status: %X10EE8034.
It's too bad, because I was hoping to see how it compared to Apache on VMS using the Siege benchmark. I copied the test params from the Phoronix Test Suite, which repeatedly fetches a simple .html file that loads a .png file, both about 4K in size. Apache on VMS isn't terrible: I got 1174 trans/sec with 10 clients (3.43 MB/sec throughput), which seems decent enough for such small files. I'm using Linux on the host VM as the client, and it actually runs out of sockets if you don't enable keepalive and you do more than 64000 requests; closed TCP sockets linger in TIME_WAIT, so you can run through the ephemeral port range quickly.
Even though I couldn't run an optimized WASD, reading the source code confirmed some details. QIO is slightly faster than RMS, once you've found and opened the file. I'll likely reference the WASD file I/O code for default buffer and copy sizes.
The TCP/IP code had some surprises. TCP/IP send/receive is limited to 65535 bytes by the $QIO interface (for disk I/O, the IOSB has space for a return value of up to 2^32-1 bytes, but for mailboxes and sockets, it only has 16 bits to return a data length), so it turns out that you get optimum results sending/receiving in chunks of the highest multiple of the TCP max segment size that fits in 65535 bytes.
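
Here's the rough shape of what I'm picturing for the libuv send path, just as a sketch (the helper name, the BG: socket channel, and where "mss" comes from are my own placeholders, not WASD or libuv code): clamp each $QIO write to the largest multiple of the MSS that fits in the 16-bit IOSB count and loop until the buffer is drained.

/* Sketch: chunked TCP sends over the TCP/IP Services $QIO interface.
 * Assumes "chan" is a channel assigned to a connected BG: socket device
 * and "mss" was obtained elsewhere (e.g. getsockopt(TCP_MAXSEG)).
 */
#include <starlet.h>
#include <iodef.h>
#include <ssdef.h>
#include <iosbdef.h>

static int send_all(unsigned short chan, char *buf, unsigned int len,
                    unsigned int mss)
{
    /* Largest multiple of the MSS that still fits the 16-bit IOSB count. */
    unsigned int chunk = (65535u / mss) * mss;
    IOSB iosb;

    while (len > 0) {
        unsigned int n = (len > chunk) ? chunk : len;
        int st = sys$qiow(0, chan, IO$_WRITEVBLK, &iosb, 0, 0,
                          buf, n, 0, 0, 0, 0);
        if (st & 1) st = iosb.iosb$w_status;   /* check service and IOSB status */
        if (!(st & 1)) return st;
        buf += iosb.iosb$w_bcnt;               /* bytes actually sent (<= 65535) */
        len -= iosb.iosb$w_bcnt;
    }
    return SS$_NORMAL;
}

In the real thing this would be an async $QIO with an AST or completion callback rather than sys$qiow, but the chunking logic is the same.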
In addition, WASD's testing showed that the VSI TCP/IP stack sets its default TCP send buffer size to that same value (the highest multiple of the TCP MSS below 65536), but that the best server performance came from raising the send buffer size to twice that value.
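
For libuv I'd probably just set that through the BSD socket layer the CRTL provides rather than through a $QIO set-mode call. A minimal sketch, where "chunk" is the value computed in the previous sketch:

/* Sketch: raise SO_SNDBUF to twice the $QIO chunk size. Whether the
 * doubling actually helps here will need the same kind of measurement
 * WASD did; this just shows the knob.
 */
#include <sys/types.h>
#include <sys/socket.h>

static int tune_sndbuf(int sock, int chunk)
{
    int sndbuf = 2 * chunk;
    return setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                      (char *)&sndbuf, sizeof(sndbuf));
}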
One annoyance for porting libuv is that the TCP/IP stack supports QIO send/receive with scatter/gather buffers (the readv()/writev() pattern you often see in socket code that keeps headers and payloads in separate buffers), but there's no equivalent for disk I/O, and disk reads/writes also have to start on 512-byte block boundaries. So for reads that start at an unaligned offset in the file, I'll have to read the containing block into a small buffer and then memcpy() the portion the user asked for. For writes, I'll have to read that block, overlay it with the portion they want to overwrite, then write the new block followed by the rest of the data. In the worst case, someone is writing multiple small, misaligned chunks to a file from an iovec list, and I'll have to coalesce them into 512-byte or larger chunks. I keep reminding myself that at least the scatter/gather isn't on the disk side; only the in-memory copying can get badly fragmented, at least from the perspective of handling an individual async file I/O request.
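
To make the read side concrete, here's a rough sketch of the unaligned path (names are mine; it assumes "chan" is a channel from a user-file-open of the target file, and that a small per-request bounce buffer is acceptable):

/* Sketch: unaligned read on top of 512-byte virtual block QIOs. Round
 * the byte range out to whole blocks, read into a bounce buffer, then
 * copy out just the span the caller asked for. VBNs are 1-based.
 */
#include <starlet.h>
#include <iodef.h>
#include <ssdef.h>
#include <iosbdef.h>
#include <string.h>

#define BLK 512

static int read_unaligned(unsigned short chan, unsigned long long offset,
                          char *dst, unsigned int len)
{
    unsigned long long first_blk = offset / BLK;
    unsigned long long last_blk  = (offset + len + BLK - 1) / BLK;
    unsigned int nbytes = (unsigned int)((last_blk - first_blk) * BLK);
    unsigned int vbn = (unsigned int)(first_blk + 1);

    char bounce[4096];                 /* real code would size/allocate per request */
    IOSB iosb;

    if (nbytes > sizeof(bounce))
        return SS$_BADPARAM;           /* larger requests get split before this */

    int st = sys$qiow(0, chan, IO$_READVBLK, &iosb, 0, 0,
                      bounce, nbytes, vbn, 0, 0, 0);
    if (st & 1) st = iosb.iosb$w_status;
    if (!(st & 1)) return st;          /* includes reading past EOF */

    memcpy(dst, bounce + (offset % BLK), len);
    return SS$_NORMAL;
}

The write side is the same idea with an extra read-modify-write of the first (and possibly last) block before the IO$_WRITEVBLK.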