On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: > On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: > > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > > > That would be great, but I don't see that here, do you? All I see is > > > the shutdown sequence changing because someone wants it to go "faster" > > > with the threat of hardware breaking if we don't meet that "faster" > > > number, yet no knowledge or guarantee that this number can ever be known > > > or happen. > > The idea was to have somewhere to send notifications when the hardware > > starts reporting things like power supplies starting to fail. We do > > have those from hardware, we just don't do anything terribly useful > > with them yet. > Ok, but that's not what I recall this patchset doing, or did I missing > something? All I saw was a "reorder the shutdown sequence" set of > changes. Or at least that's all I remember at this point in time, > sorry, it's been a few days, but at least that lines up with what the > Subject line says above :) That's not in the series, a bunch of it is merged in some form (eg, see hw_protection_shutdown()) and more of it would need to be built on top if this were merged. > > > Agreed, but I don't think this patch is going to actually work properly > > > over time as there is no time values involved :) > > This seems to be more into the area of mitigation than firm solution, I > > suspect users will be pleased if they can make a noticable dent in the > > number of failures they're seeing. > Mitigation is good, but this patch series is just a hack by doing "throw > this device type at the front of the shutdown list because we have > hardware that crashes a lot" :) Sounds like a mitigation to me. > > It feels like if we're concerned about mitigating physical damage during > > the process of power failure that's a very limited set of devices - the > > storage case where we're in the middle of writing to flash or whatever > > is the most obvious case. > Then why isn't userspace handling this? This is a policy decision that > it needs to take to properly know what hardware needs to be shut down, > and what needs to happen in order to do that (i.e. flush, unmount, > etc.?) And userspace today should be able to say, "power down this > device now!" for any device in the system based on the sysfs device > tree, or at the very least, force it to a specific power state. So why > not handle this policy there? Given the tight timelines it does seem reasonable to have some of this in the kernel - the specific decisions about how to handle these events can always be controlled from userspace (eg, with a sysfs file like we do for autosuspend delay times which seem to be in a similar ballpark).
Attachment:
signature.asc
Description: PGP signature