Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Mon, 18 May 2020 16:56:55 +0000

On Sat, May 16, 2020 at 03:50:55PM +0200, Johannes Berg wrote:
> On Sat, 2020-05-16 at 15:24 +0200, Johannes Berg wrote:
> 
> > Instead of the kernel taint, IMHO you should provide an annotation in
> > sysfs (or somewhere else) for the *struct device* that had its firmware
> > crash. Or maybe, if it's too complex to walk the entire hierarchy
> > checking for that, have a uevent, or add the ability for the kernel to
> > print out elsewhere in debugfs the list of devices that crashed at some
> 
> I mean sysfs, oops.
> 
> 
> In addition, look what we have in iwl_trans_pcie_removal_wk(). If we
> detect that the device is really wedged enough that the only way we can
> still try to recover is by completely unbinding the driver from it, then
> we give userspace a uevent for that.

Nice! Indeed a uevent is in order for these sorts of things, and I'd
argue that it raises the question of whether we should send a uevent
for any taint as well. Today taints are silent. If the kernel crashes,
all we give userspace today is a log.

> I don't remember exactly how and
> where that gets used (ChromeOS) though, but it'd be nice to have that
> sort of thing as part of the infrastructure, in a sort of two-level
> notification?
> 
> Level 1: firmware crashed, but we're recovering, at least mostly, and
> it's more informational
> 
> Level 2: device is wedged, going to try to recover by some more forceful
> means (perhaps some devices can be power-cycled? etc.) but (more) state
> would be lost in these cases?

I agree that *all* this would be ideal. I don't see this as mutually
exclusive with a taint on the kernel and module for the device.
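
To make the two levels concrete, here is a rough userspace sketch of
the environment strings such an event could carry. It only models the
idea and is not a proposed implementation: the key names FW_CRASH_LEVEL
and FW_CRASH_STATE_LOST, the enum, and fw_crash_build_env() are all
hypothetical. In a driver the resulting array would presumably be
handed to kobject_uevent_env() for the device's kobject.

```c
/*
 * Userspace sketch only: models the two notification levels and the
 * KEY=value strings a driver might pass to kobject_uevent_env().
 * All names below are hypothetical, not existing kernel API.
 */
#include <stdio.h>
#include <stddef.h>

enum fw_crash_level {
	FW_CRASH_RECOVERING = 1, /* level 1: crashed, recovery under way */
	FW_CRASH_WEDGED = 2,     /* level 2: wedged, forceful recovery */
};

/*
 * Fill envp[] (NULL-terminated) with the strings for one event,
 * storing the text in buf. Returns the number of strings written,
 * or -1 if buf or envp is too small.
 */
static int fw_crash_build_env(enum fw_crash_level level,
			      char *buf, size_t buflen,
			      char *envp[], size_t envp_len)
{
	size_t used, count = 0;
	int n;

	/* Two strings plus the terminating NULL. */
	if (envp_len < 3)
		return -1;

	n = snprintf(buf, buflen, "FW_CRASH_LEVEL=%d", (int)level);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	envp[count++] = buf;
	used = (size_t)n + 1;

	/* Assumption: only the wedged case is reported as losing state. */
	n = snprintf(buf + used, buflen - used, "FW_CRASH_STATE_LOST=%d",
		     level == FW_CRASH_WEDGED);
	if (n < 0 || (size_t)n >= buflen - used)
		return -1;
	envp[count++] = buf + used;

	envp[count] = NULL;
	return (int)count;
}
```

Userspace could then match on the level to decide whether to just log
the event or to treat the device as gone until it is rebound.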

> Still don't think a kernel taint is appropriate for either of these.

From a support perspective, I do think it is vital quick information.

  Luis