this post was submitted on 20 Jul 2024

114 points (100.0% liked)

Technology

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below are allowed; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
Check for duplicates before posting; duplicates may be removed.
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

To what extent, if at all, would CrowdStrike's faulty update have been easier to deal with on an immutable distro? (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

42 comments

[–] [email protected] 50 points 1 year ago (5 children)

Turn off the computer, boot from the previous day's image, wipe the current day's image, and continue using the computer.
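A minimal sketch of that recovery flow, assuming a hypothetical system that keeps dated, read-only root images and boots the newest one by default. `ImageStore` and `rollback` are illustrative names, not the API of OSTree, snapper, or any other real tool:

```python
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Hypothetical store of dated, read-only root images (oldest first)."""

    images: list[str] = field(default_factory=list)

    def default(self) -> str:
        # Assumption: the boot menu picks the newest image by default.
        return self.images[-1]

    def rollback(self) -> str:
        # Wipe the current (newest) image and fall back to the previous day's.
        if len(self.images) < 2:
            raise RuntimeError("no previous image to roll back to")
        self.images.pop()
        return self.default()


store = ImageStore(["2024-07-18", "2024-07-19"])
print(store.rollback())  # 2024-07-18
```

The key property is that the older image is never modified by the update, so "going back" is just choosing a different, already-complete root.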

[–] [email protected] 18 points 1 year ago (2 children)

That's all well and good, but many of these Windows machines were headless or used by extremely non-technical people - think tills at your supermarket or airport check-in desks. Worse, some of these installations were running in the cloud, so console access would have been tricky.

[–] [email protected] 11 points 1 year ago (2 children)

The cloud systems would have been a problem. A non-technical user could easily have fixed any local system, though, because their IT department could simply tell them: turn on your computer, and when it gets to the screen with these words, press the down arrow key once and press Enter, and your computer will boot normally.

[–] [email protected] 22 points 1 year ago* (last edited 1 year ago) (1 children)

You wildly overestimate the average person's willingness to do that.

[–] [email protected] 11 points 1 year ago (1 children)

Their willingness would come primarily from the fact that they have a job to do: if their co-workers are working because they followed the instructions and they are not, the boss is going to take a hard look at them.

[–] [email protected] 7 points 1 year ago (2 children)

This relies on the assumption that everyone else, or at least a significant portion, in the office managed to do it.

I'm not talking about whether or not they're actually physically capable of it; of course they are. I'm talking about how people immediately shut down and pretend they can't follow simple directions the second something relates to a computer.

[–] [email protected] 3 points 1 year ago

Mmmm. Fair point.

[–] [email protected] 2 points 1 year ago

Yeah but there’s also always one guy in the group (me) who knows what they’re doing and could just spend an hour doing it for everyone else.

[–] [email protected] 9 points 1 year ago* (last edited 1 year ago) (1 children)

You clearly haven't worked a help desk if you think even those simple instructions are something every end user is capable of or willing to do without issue.

[–] [email protected] 2 points 1 year ago

I guess I had really good colleagues. I was the network administrator for a small not-for-profit organization and the only time people came to me with computer problems was when they had tried the things that they knew worked first. If the obvious answers did not fix the problem, then they would bring it to my attention.

[–] [email protected] 11 points 1 year ago (1 children)

…until the CrowdStrike agent updated, and you wind up dead in the water again.

The whole point of CrowdStrike is to detect and prevent exploitation of security vulnerabilities, including zero-days. As such, they can release updates multiple times per day. Rebooting into a known-safe state is great, but unless you follow that up by preventing the agent from downloading the sensor configuration update again, you’re just going to wind up in a BSOD loop.
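A toy model of that loop, assuming a simplified agent that loads whatever content is on disk at boot and, if the machine survives, fetches the vendor's latest content for the next boot. The function name and the "good"/"bad" labels are hypothetical; this is not how the real sensor is structured:

```python
def simulate(on_disk: str, vendor_content: str, pin_updates: bool, boots: int = 3) -> list[bool]:
    """Toy model of the BSOD loop after a rollback.

    Each boot, the sensor loads whatever content is on disk and the machine
    crashes if it is "bad". If the machine survives and updates are not pinned,
    the agent fetches the vendor's current content, which is used next boot.
    Returns one entry per boot (True = booted fine).
    """
    results = []
    for _ in range(boots):
        ok = on_disk != "bad"
        results.append(ok)
        if ok and not pin_updates:
            on_disk = vendor_content  # agent redownloads the latest update
    return results


# Rolled back to good content while the vendor still serves the bad update:
print(simulate("good", "bad", pin_updates=False))  # [True, False, False]
print(simulate("good", "bad", pin_updates=True))   # [True, True, True]
```

Under these assumptions, the rollback only buys one good boot unless updates are paused or the vendor has already replaced the bad content.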

A better architectural solution would have been to run Windows drivers in Ring 1, giving the kernel the ability to isolate those that misbehave. But that risks a small decrease in performance, and Microsoft didn’t want that, so we’re stuck with a Ring 0/Ring 3-only architecture in Windows that can cause issues like this.
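A user-space toy model of the difference being described, not how any real kernel or CPU ring works: one "kernel" runs each driver behind a fault boundary and quarantines drivers that fail, while the other runs drivers with full privilege, so a single fault takes everything down. All names here are hypothetical:

```python
from typing import Callable

Driver = Callable[[], None]


class IsolatingKernel:
    """Toy model: each driver runs behind a fault boundary (a less-privileged ring)."""

    def __init__(self, drivers: dict[str, Driver]) -> None:
        self.drivers = dict(drivers)
        self.disabled: set[str] = set()

    def tick(self) -> None:
        for name, drv in self.drivers.items():
            if name in self.disabled:
                continue
            try:
                drv()
            except Exception:
                # The misbehaving driver is quarantined instead of crashing the system.
                self.disabled.add(name)


def run_ring0(drivers: dict[str, Driver]) -> None:
    """Toy model: drivers share the kernel's privilege, so any fault is fatal."""
    for drv in drivers.values():
        drv()  # an exception here propagates: the whole "system" goes down


def ok() -> None:
    pass


def faulty() -> None:
    raise IndexError("simulated out-of-bounds read")


k = IsolatingKernel({"disk": ok, "sensor": faulty, "net": ok})
k.tick()
print(sorted(k.disabled))  # ['sensor']
```

The trade-off mentioned in the comment shows up even here: the isolating version does extra bookkeeping on every call, which in a real kernel would mean extra privilege transitions.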

[–] [email protected] 3 points 1 year ago

That assumes the file is not stored on a writable section of the filesystem and treated as application data, and thus wouldn't survive a rollback. Which it likely would.
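A minimal sketch of why the location of the file matters, assuming a toy immutable distro made of a read-only image plus writable state (standing in for something like `/var`), and assuming, hypothetically, that the sensor keeps its content file in the writable part:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemState:
    """Toy model of an immutable distro: read-only image plus writable state."""

    image: str                # read-only root, swapped as a whole on update/rollback
    writable: dict[str, str]  # stands in for /var-style data that persists


def rollback(state: SystemState, previous_image: str) -> SystemState:
    # Only the image is swapped; writable data carries over untouched.
    return replace(state, image=previous_image)


def boots(state: SystemState) -> bool:
    # Hypothetical assumption: the sensor reads its content file from writable storage.
    return state.writable.get("sensor_content") != "bad"


broken = SystemState(image="2024-07-19", writable={"sensor_content": "bad"})
restored = rollback(broken, "2024-07-18")
print(restored.image, boots(restored))  # 2024-07-18 False
```

If the bad file had instead shipped inside the image, rolling back the image would have removed it.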

[–] [email protected] 3 points 1 year ago (1 children)

Wouldn't help (on its own), you'd still get auto-updated to the broken version.

[–] [email protected] 4 points 1 year ago

If I'm correct, wasn't a fix found and deployed within several hours? So the next auto-update likely would not have had the same issue.

[–] [email protected] 2 points 1 year ago (1 children)

I’m familiar enough with Linux but have never used an immutable distro. I recognize the technical difference between what you describe and “go delete a specific file in safe mode”. But what about the more general statement? Is this much different from “boot in a special way and go fix the problem”? Is it any easier or more difficult than what people had to do on Windows?

[–] [email protected] 4 points 1 year ago (1 children)

Primarily it's different because you would not have had to boot into any safe mode. You would have just booted from the last good image from like a day ago and deleted the current image and kept using the computer.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (2 children)

What’s the user experience like there? Are you prompted to do it if the system fails to boot “happily”?

[–] [email protected] 2 points 1 year ago (1 children)

Honestly, I'm actually not sure as I never had the system break that badly while I was using it.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

lol, thanks for the answer. This is the really relevant bit, isn’t it? My Linux machines have also never died this badly. But I’ve seen Windows do it a number of times before this whole fiasco.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

I don't think any of the major distros do it currently (some are working towards it, though), but there are ways (the primary, and only, one I know of is with systemd-boot). It boots one of the boot binaries (usually "Unified Kernel Images") that is either marked as "good" or still has "tries left" (whichever is newer). Each attempt to boot a binary that has tries left decrements its count; if the boot succeeds, the binary gets marked as "good", and if the count reaches 0 without a successful boot, it is marked as "bad".
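A sketch of the counter bookkeeping, assuming the filename convention from systemd's Automatic Boot Assessment (`NAME+LEFT-DONE.efi`, with the counter removed once the image is blessed). In real systemd the boot loader renames the file and `systemd-bless-boot.service` drops the counter; the Python helpers and the example filename below are hypothetical stand-ins for that logic:

```python
import re
from dataclasses import dataclass
from typing import Optional

# NAME+LEFT[-DONE].efi, e.g. "linux-6.10+3.efi" (example name, not a real file).
COUNTER_RE = re.compile(r"^(?P<name>.+?)(?:\+(?P<left>\d+)(?:-(?P<done>\d+))?)?\.efi$")


@dataclass
class Entry:
    name: str
    left: Optional[int]  # None: no counter, i.e. already marked "good"
    done: int = 0        # boot attempts already made

    @classmethod
    def parse(cls, filename: str) -> "Entry":
        m = COUNTER_RE.match(filename)
        if m is None:
            raise ValueError(f"unexpected filename: {filename}")
        left = int(m["left"]) if m["left"] is not None else None
        return cls(m["name"], left, int(m["done"] or 0))

    def filename(self) -> str:
        if self.left is None:
            return f"{self.name}.efi"
        done = f"-{self.done}" if self.done else ""
        return f"{self.name}+{self.left}{done}.efi"

    @property
    def bad(self) -> bool:
        return self.left == 0


def attempt(entry: Entry) -> None:
    """Boot loader side: spend one try before booting a counted entry."""
    if entry.left is not None and entry.left > 0:
        entry.left -= 1
        entry.done += 1


def bless(entry: Entry) -> None:
    """After a successful boot the counter is dropped: the entry is "good"."""
    entry.left, entry.done = None, 0


e = Entry.parse("linux-6.10+3.efi")
attempt(e)
print(e.filename())  # linux-6.10+2-1.efi
bless(e)
print(e.filename())  # linux-6.10.efi
```

Keeping the state in the filename means the boot loader needs no separate database; renaming a file is all it takes to record an attempt.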

So this setup basically just requires restarting the machine after an unsuccessful boot, if that isn't already done automatically.
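A toy simulation of what the user would experience, assuming a simplified selection rule (boot the newest image that is not "bad") and a hypothetical `works` flag standing in for whether an image actually boots. Real systemd-boot sorts entries more elaborately, so treat this as an illustration of the idea only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    name: str
    tries_left: Optional[int]  # None = already marked good; 0 = bad
    works: bool                # hypothetical: does this image actually boot?


def boot_until_up(images: list[Image], max_restarts: int = 10) -> tuple[str, int]:
    """Toy model of boot counting plus a restart after every failed boot.

    `images` is ordered oldest-first. Returns (image that came up, restarts needed).
    """
    for restarts in range(max_restarts + 1):
        # Simplified rule: newest image that is not bad.
        image = next((i for i in reversed(images) if i.tries_left != 0), None)
        if image is None:
            raise RuntimeError("no bootable image")
        if image.tries_left is not None:
            image.tries_left -= 1  # the boot loader spends one try before booting
        if image.works:
            image.tries_left = None  # successful boot: mark as good
            return image.name, restarts
        # Failed boot (e.g. a crash); a watchdog or the user restarts the machine.
    raise RuntimeError("gave up after too many restarts")


images = [Image("2024-07-18", None, True), Image("2024-07-19", 3, False)]
print(boot_until_up(images))  # ('2024-07-18', 3)
```

Under these assumptions, a broken update costs three restarts and then the previous image comes up on its own, with no menu navigation needed.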

[–] [email protected] 1 points 1 year ago (1 children)

Would still need to be on site.

[–] [email protected] 1 points 1 year ago

True