[Info-vax] A meditation on the Antithesis of the VMS Ethos

Simon Clubley clubley at remove_me.eisner.decus.org-Earth.UFP
Mon Jul 22 08:34:28 EDT 2024


On 2024-07-21, Craig A. Berry <craigberry at nospam.mac.com> wrote:
>
> On 7/21/24 4:41 AM, Subcommandante XDelta wrote:
>> The problem here is that Crowdstrike pushed out an evidently broken
>> kernel driver that locked whatever system that installed it in a
>> permanent boot loop. The system would start loading Windows, encounter
>> a fatal error, and reboot. And reboot. Again and again. It, in
>> essence, rendered those machines useless.
>
> It was not a kernel driver.  It was a bad configuration file that
> normally gets updated several times a day:
>
> https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/
>

If it's something that can stop the system from booting, then it _should_
be treated as if it _was_ a kernel driver.

IOW, what on earth happened to the concept of a Last Known Good boot to
automatically recover from such screwups? Windows 2000, over two decades
ago, had an early version of the LKG boot concept, for goodness' sake.

What _should_ have happened, and what should have been built into Windows
years ago as part of the standard procedures for updating system components,
is that the original versions of the files used during the last good boot
are preserved in a backup until the next successful boot.

After that, the preserved files would be overwritten with the updated
versions. OTOH, if the next boot fails, the last known good configuration
is restored and another reboot done, but exactly _once_ only. (If the LKG
boot fails, then it's probably some hardware failure or other external
factor).
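
To make the sequence above concrete, here is a rough sketch of that
policy as a small state machine. It is illustrative only and is not how
Windows or any real boot loader works: the class name LkgBootManager,
the try_boot callback and the file name channel.cfg are all made up for
the example, and "files" are just dictionary entries standing in for
real on-disk components.

```python
from typing import Callable, Dict

# Hypothetical stand-in for on-disk files: file name -> contents.
Files = Dict[str, str]


class LkgBootManager:
    """Toy model of a Last Known Good boot policy (assumed design)."""

    def __init__(self, active: Files) -> None:
        self.active = dict(active)  # files the next boot will use
        self.lkg = dict(active)     # files from the last good boot

    def apply_update(self, updates: Files) -> None:
        # Updates only touch the active set. The LKG backup is left
        # alone until a boot with the new files has succeeded.
        self.active.update(updates)

    def boot(self, try_boot: Callable[[Files], bool]) -> str:
        # try_boot is a hypothetical callback: True means the boot
        # completed successfully with the given files.
        if try_boot(self.active):
            # Success: the backup is overwritten with the new files.
            self.lkg = dict(self.active)
            return "booted"
        # Failure: restore the last known good files and retry,
        # but exactly once.
        self.active = dict(self.lkg)
        if try_boot(self.active):
            return "recovered"
        # Even the LKG boot failed: probably hardware or some other
        # external factor, so stop instead of looping.
        return "halted"


if __name__ == "__main__":
    mgr = LkgBootManager({"channel.cfg": "v1"})
    boots_ok = lambda files: files["channel.cfg"] != "bad"
    mgr.apply_update({"channel.cfg": "bad"})
    print(mgr.boot(boots_ok))  # rolls back to v1 and boots
```

The point of the design is that a bad update costs one extra reboot
instead of a boot loop, and the single retry keeps a genuinely broken
machine from cycling forever.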

> The bad file was only in the wild for about an hour and a half.  Folks
> in the US who powered off Thursday evening and didn't get up too early
> Friday would've been fine.  Of course Europe was well into their work
> day, and a lot of computers stay on overnight.
>
> The boot loop may or may not be permanent -- lots of systems have
> eventually managed to get the corrected file by doing nothing other than
> repeated reboots.  No, that doesn't always work.
>
> The update was "designed to target newly observed, malicious named pipes
> being used by common C2 frameworks in cyberattacks."
>
> Most likely what makes CrowdStrike popular is that they are continuously
> updating countermeasures as threats are observed, but that flies in the
> face of normal deployment practices where you don't bet the farm on a
> single update that affects all systems all at once.  For example, in
> Microsoft Azure, you can set up redundancy for your PaaS and SaaS
> offerings so that if an update breaks all the servers in one data
> center, your services are still up and running in another.  Most
> enterprises will have similar planning for private data centers.
>
> CrowdStrike thought updating the entire world in an instant was a good
> idea. While no one wants to sit there vulnerable to a known threat for
> any length of time, I suspect that idea will get revisited. If they had
> simply staggered the update over a few hours, the catastrophe would have
> been much smaller.  Customers will likely be asking for more control
> over when they get updates, and, for example, wanting to set up
> different update channels for servers and PCs.

Or modern Windows could simply fully implement the LKG boot concept.

Simon.

-- 
Simon Clubley, clubley at remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.


