[Info-vax] openvms and xterm

Scott Dorsey kludge at panix.com
Wed Apr 24 14:10:30 EDT 2024


Dan Cross <cross at spitfire.i.gajendra.net> wrote:
>The thing is, when you're working at scale, managing services
>across tens of thousands of machines, you quickly discover that
>shit happens.  Things sometimes crash randomly; often this is
>due to a bug, but sometimes it's just because the OOM killer got
>greedy due to the delayed effects of a poor scheduling decision,
>or there was a dip on one of the voltage rails and a DIMM lost a
>bit, or a job landed on a machine that's got some latent
>hardware fault and it just happened to wiggle things in just the
>right way so that a 1 turned into a 0 (or vice versa), or any
>number of other things that may or may not have anything to do
>with the service itself.

Oh, I understand this completely.  I have stood in the middle of a large
colocation facility and listened to Windows reboot sounds every second or
two coming from different places in the room each time.

What I don't necessarily understand is why people consider this acceptable.
People today just seem to think this is the normal way of doing business.
Surely we can do better.
--scott
-- 
"C'est un Nagra. C'est suisse, et tres, tres precis."


