[Info-vax] VMS and the (lack of the) TRIM facility.

terry+googleblog at tmk.com terry+googleblog at tmk.com
Sat Jun 20 05:23:08 EDT 2015


On Saturday, June 20, 2015 at 4:28:08 AM UTC-4, Dirk Munk wrote:
> As far as I'm aware VMS doesn't have TRIM yet, not even the Itanium 
> version. So that would make SSD's not very suitable for VMS.

There are two possible solutions: an easy one that might be implementable by a skilled customer, and a better one that would require engineering support from multiple groups.

The easy one is to write a utility that uses the operating system's storage allocation map (BITMAP.SYS and friends on VMS) to construct a free block list and (after appropriate sanity-checking) locks out I/O to the disk from other processes and issues TRIM commands for those blocks, then unlocks I/O. Many of the earlier SSDs for pre-TRIM systems came with a "performance restoration utility" that did exactly this. Most of the underlying OS functions you would need to write such a utility exist on VMS - the "lock out all I/O except mine" exists in ANALYZE/DISK/REPAIR, and I think the MOVEFILE primitive can be coerced into marking blocks vacant and requesting TRIM.
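The first step of such a utility, turning the allocation bitmap into a list of free extents, can be sketched roughly as follows. This is only an illustration in Python, not VMS code: the function name is made up, and the bit conventions (a set bit means the cluster is free, bits are numbered from the low-order end of each byte) are assumptions that would need checking against the real BITMAP.SYS layout. Locking out I/O and actually issuing the TRIM commands are not shown.

```python
from typing import Iterator, Tuple


def free_extents(bitmap: bytes, total_clusters: int, cluster_size: int,
                 free_bit: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (start_block, block_count) for each run of free clusters.

    Assumptions (hypothetical, verify against the real on-disk format):
      - one bit per cluster, low-order bit of each byte first
      - a bit equal to `free_bit` marks the cluster as free
      - cluster_size is the number of blocks per cluster
    """
    run_start = None  # first cluster of the free run currently being scanned
    for cluster in range(total_clusters):
        bit = (bitmap[cluster // 8] >> (cluster % 8)) & 1
        if bit == free_bit:
            if run_start is None:
                run_start = cluster
        elif run_start is not None:
            # The free run ended at the previous cluster; emit it in blocks.
            yield run_start * cluster_size, (cluster - run_start) * cluster_size
            run_start = None
    if run_start is not None:
        # The bitmap ended while still inside a free run.
        yield run_start * cluster_size, (total_clusters - run_start) * cluster_size


if __name__ == "__main__":
    # 10 clusters of 4 blocks each; clusters 0, 3-5 and 8-9 are free.
    demo = bytes([0b00111001, 0b00000011])
    for start, count in free_extents(demo, total_clusters=10, cluster_size=4):
        print(f"candidate TRIM range: start block {start}, {count} blocks")
```

Coalescing adjacent free clusters into extents keeps the number of TRIM requests small, which matters because each request covers a range rather than a single block.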

The better one is to have all of the necessary code between the filesystem and the actual SSD device be TRIM-aware, so that a DELETE in DCL, for example, makes it through the filesystem and down into the disk driver with a list of blocks that are safe to TRIM. The big problem here is that a number of "smart" controllers don't know what TRIM is, and when the TRIM request gets added to the work queue of the controller, the controller just ignores it (at best). This means that controller and firmware vendors need to get involved. The best hope here is for new VMS releases to use recent-vintage industry-standard controllers which already know about TRIM, instead of older or obsolete controllers with special DEC / Compaq / HP firmware, where HP would have to pay for new firmware development in order to get the TRIM support added.

> In my view the only way to use SSD's with VMS is never to allocate the 
> full size of the SSD for a VMS disk. Suppose you have a 256GB SSD, then 
> you would for instance allocate a 200GB disk on that SSD. That way there 
> is always 56GB of erased space available. As soon as the SSD needs to 
> rewrite a disk block, it can use cells from the 56GB of erased space, 
> write the new data, and erase the cells that were formerly in use. This 
> also works with emulated disks (container files) on the disk of the host 
> operating system. The disk of the host operating system should never 
> allocate the full size of the SSD, that way there is always enough 
> erased cell capacity on the SSD.

All is not lost. I have some very sophisticated PCIe x16 SSDs in my homebuilt RAIDzilla II file servers. These look like a RAID array of generic LSI Logic SAS ports on the host side, but there's magic between the LSI chip and the flash elements. These are SLC enterprise SSDs with a lot of overprovisioning. In the background, they erase flash cells and monitor wear, so there are always plenty of free erased blocks to be allocated. When the host rewrites a block, the drive puts the new data in a pre-erased block and repoints the logical address at it. Once that pointer magic is done, the drive knows the old block is not connected to the host filesystem and it gets put on the erasure to-do list. This obviously works only for read/modify/write or write-in-place operations, since a file deletion on a non-TRIM-aware system will never mark the file content blocks at all (just marking the directory entry and bitmap accordingly). But with enough pre-TRIMmed blocks in place, requests to write these unallocated but unerased blocks can be handled by pre-erased blocks due to the aforementioned pointer magic, and then the physical block that the write was intended for gets put on the "to be TRIMmed" queue.
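The "pointer magic" can be illustrated with a toy model. The Python sketch below is purely hypothetical (the class and method names are invented, and it says nothing about how any real controller firmware works): it keeps a logical-to-physical map, a pool of pre-erased blocks, and an erasure to-do list, and counts how often a write has to wait for a foreground erase because the pool ran dry.

```python
from collections import deque


class ToyFTL:
    """Toy remapping layer: logical blocks map onto a larger physical pool."""

    def __init__(self, logical_blocks: int, physical_blocks: int):
        if physical_blocks <= logical_blocks:
            raise ValueError("need spare (overprovisioned) physical blocks")
        self.logical_blocks = logical_blocks
        self.mapping = {}                            # logical -> physical
        self.erased = deque(range(physical_blocks))  # pre-erased pool
        self.to_erase = deque()                      # erasure to-do list
        self.stalls = 0                              # foreground erases

    def write(self, logical: int) -> int:
        """Service a host write; return the physical block that was used."""
        if not 0 <= logical < self.logical_blocks:
            raise IndexError("logical block out of range")
        if not self.erased:
            # No pre-erased block left: the write must wait for an erase.
            self.stalls += 1
            self.background_erase(1)
        new_phys = self.erased.popleft()
        old_phys = self.mapping.get(logical)
        self.mapping[logical] = new_phys             # the pointer update
        if old_phys is not None:
            self.to_erase.append(old_phys)           # old copy is now garbage
        return new_phys

    def background_erase(self, max_blocks: int) -> int:
        """Erase up to max_blocks queued blocks; return how many were erased."""
        done = 0
        while self.to_erase and done < max_blocks:
            self.erased.append(self.to_erase.popleft())
            done += 1
        return done


if __name__ == "__main__":
    ftl = ToyFTL(logical_blocks=4, physical_blocks=6)
    for lbn in [0, 1, 2, 3, 0, 1, 2]:
        ftl.write(lbn)
    print("stalls:", ftl.stalls, "queued for erase:", list(ftl.to_erase))
```

Note that nothing in this model reacts to a file deletion: without TRIM the drive only learns that a block is stale when the host writes to that logical address again.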

When I got my first one in for testing (a 320GB model with 512GB of actual flash), I wrote a diagnostic to beat up on it, generating random writes and deletes as fast as the card could handle. I was getting about 1.2GByte/sec of I/O (remember, this is an old card), and that hadn't slowed down after a few days of writing the whole capacity of the card over and over again. And this is without TRIM. Dumber cards often hit a "write wall" when treated this way: write performance falls off a cliff once the drive has to scramble to erase blocks in order to service write requests in the device's queue.
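For a sense of scale, the spare capacity in the two configurations mentioned in this thread can be compared with a couple of lines of Python. The helper name is made up, and the sketch assumes the quoted sizes are directly comparable (it ignores any hidden spare area a drive keeps beyond its advertised capacity).

```python
def spare_fraction(raw_gb: float, exposed_gb: float) -> float:
    """Fraction of the raw capacity that is never exposed to the host."""
    if not 0 < exposed_gb <= raw_gb:
        raise ValueError("exposed capacity must be positive and <= raw capacity")
    return (raw_gb - exposed_gb) / raw_gb


if __name__ == "__main__":
    # 200GB volume on a 256GB SSD, as suggested in the quoted message.
    print(f"256GB SSD, 200GB volume: {spare_fraction(256, 200):.1%} spare")
    # 320GB card with 512GB of actual flash, as described above.
    print(f"512GB flash, 320GB card: {spare_fraction(512, 320):.1%} spare")
```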


