[Info-vax] NetBackup Performance Woes
Geek Nerdly
tommynoble at gmail.com
Wed May 20 14:13:05 EDT 2015
OpenVMS 8.4
2-node cluster, RX2800 i2, 4 CPUs each, 32 GB RAM each.
T4 data is collected 24/7 from both nodes of the cluster.
NetBackup release 7.5
Storage is EMC, probably a bit over-specified for our needs (for normal processing), and managed by someone else. The Integrity cluster has its own pool of dedicated physical storage. QLogic HBAs. The LUNs are set with their preferred paths and the load is distributed over 4 or 6 paths. Unfortunately OpenVMS does not automatically load balance this, but I believe I/O at that level is not where the problem is.
I believe this has been happening for months, but user activity periods have changed recently, so that users are now reporting it.
We set the process quotas on the account to match the NetBackup documentation, and we are using a separate account for the NetBackup service so that we can adjust settings on either account without affecting the other.
The network service for the NetBackup agent has a limit of 10 connections; when it's running I typically see anywhere from 2 to 8 network processes running under that service. All of those processes show up super-heavy on Direct and Buffered I/O. It is the only thing really doing anything on that node at that time of day.
Aside from the I/O slam that goes on, what happens when NetBackup runs is that at least 2 (of 4) CPUs on node B (where the NetBackup agent runs) are heavy in Interrupt Mode the entire time (in the example I'm looking at, CPU 0 ~75%, CPU 3 ~100%). This almost immediately (and for the entire time) affects user activity/response times on node A, but I don't see a related anomaly in T4 data for node A. The effects are much more severe on node B.
The LIMIT_BANDWIDTH setting is only barely documented, and NetBackup Support has not (yet) answered my question about whether the setting applies to each process or throttles the collective lot of them. I suspect I would have to measure the throughput I currently see, then divide that by the number of connections the service allows, for the limit to have any effect. They say you can set LIMIT_BANDWIDTH, but nothing about how to find an optimal value to use.
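To make the arithmetic I have in mind concrete, here is a rough sketch of picking a starting value for trial and error. It is only that: the function name, the 80000 KB/s throughput figure, the 50% target and the units are all made up for illustration, and whether the limit is per process or collective is exactly the question Support has not answered, so the sketch computes both cases.

```python
def starting_limit(observed_total, max_connections, target_fraction, per_process):
    """Return a first-guess bandwidth limit to try.

    observed_total  : unthrottled aggregate throughput measured during a
                      backup (hypothetical figure; units are whatever the
                      setting expects, which is an assumption here)
    max_connections : connection limit on the network service (10 here)
    target_fraction : share of the unthrottled throughput we want to allow
    per_process     : True if the limit applies to each process separately,
                      False if it throttles all processes together (unknown)
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")

    target_total = observed_total * target_fraction
    if per_process:
        # Worst case: every allowed connection is active at once, so each
        # process gets an equal slice of the target.
        return target_total / max_connections
    # A collective limit can be set to the target directly.
    return target_total


if __name__ == "__main__":
    # Hypothetical numbers: 80000 KB/s unthrottled, aim for half of that.
    print(starting_limit(80000, 10, 0.5, per_process=True))   # 4000.0
    print(starting_limit(80000, 10, 0.5, per_process=False))  # 40000.0
```

Dividing by the service's connection limit (10) rather than the 2 to 8 processes I usually see is deliberately conservative; if the limit turns out to be per process, the real aggregate would come in under the target whenever fewer than 10 are running.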
> Is it possible that the disks are severely fragmented?
Disk File Optimizer V3.1 runs regularly, frag indexes are all typically in the excellent to good range.
> I'd suspect the backup will consume all possible resources until and
> unless throttled due to skewed quota settings or I/O limits.
That is what seems to be happening. I don't know what quotas I should consider adjusting.
NetBackup support has admitted (in writing) that they have no expertise or advice to offer to tune this (their own product) on this platform. They have pretty much said we'll have to do this by trial and error.
Their other suggestion was to back up less data.