On Tue, May 26, 2020 at 7:58 AM Luis Chamberlain <mcgrof@xxxxxxxxxx> wrote:
>
> This makes use of the new taint_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.

Just for the record, the underlying problem you seem to be complaining
about does not appear to be a firmware crash at all. It does happen to
result in a firmware crash report much later on (because when the PCIe
endpoint is this hosed, sooner or later the driver thinks the firmware
is dead), but it's not likely the root cause. More below.

> Using a taint flag allows us to annotate when this happens clearly.
>
> I have run into this situation with this driver with the latest
> firmware as of today, May 21, 2020 using v5.6.0, leaving me at
> a state at which my only option is to reboot. Driver removal and
> addition does not fix the situation. This is reported on kernel.org
> bugzilla korg#207851 [0].

I took a look, and replied there:

https://bugzilla.kernel.org/show_bug.cgi?id=207851#c2

Per the above, it seems more likely you have a PCI or power management
problem, not an ath10k or ath10k-firmware problem.

> But this isn't the first firmware crash reported,
> others have been filed before and none of these bugs have yet been
> addressed [1] [2] [3]. Including my own I see these firmware crash
> reports:

Yes, firmware does crash. Sometimes repeatedly. It also happens to be
closed source, so it's nearly impossible for the average Linux dev to
debug. But FWIW, those 3 all appear to be recoverable -- and then they
crash again a few minutes later. So just as claimed on prior
iterations of this patchset, ath10k is doing fine at recovery [*] --
it's "only" the firmware that's a problem. (And, if a WiFi firmware
doesn't like something in the RF environment...it's totally
understandable that the crash will happen more than once. Of course
that sucks, but it's not unexpected.) Crucially, rebooting won't
really do anything to help these people, AIUI.

Maybe what you really want is to taint the kernel every time a
non-free firmware is loaded ;)

I'd also note that those 3 reports are 3 years old. There have been
many ath10k-firmware updates since then, so it's not necessarily fair
to dig those back up. Also, bugzilla.kernel.org is totally ignored by
many linux-wireless@ folks. But I digress...

All in all, I have no interest in this proposal, for many of the
reasons already mentioned on previous iterations. It's way too coarse
and won't be useful in understanding what's going on in a system, IMO,
at least for ath10k. But it's also easy enough to ignore, so if it
makes somebody happy to claim a taint, then so be it.

Regards,
Brian

[*] Although, at least one of those doesn't appear to be as "clean" of
a recovery attempt as typical. Maybe there are some lurking driver
bugs in there too.

> * korg#207851 [0]
> * korg#197013 [1]
> * korg#201237 [2]
> * korg#195987 [3]
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=207851
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=197013
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=201237
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=195987