On Fri, Apr 09, 2021 at 01:29:57PM +0200, Mauro Carvalho Chehab wrote:
> On Thu, 1 Apr 2021 16:42:26 +0200
> Lukas Middendorf <kernel@xxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > I see this (or a similar fix) has not yet been included in 5.12-rc5.
> > Any further problems or comments regarding this patch? It still applies
> > cleanly to current git master and the problem is still relevant.
>
> Well, I fail to see why si2168 is so special that it would require it...
>
> on a quick check, it sounds that there's just a single driver using this
> kAPI:
>
> drivers/net/wireless/mediatek/mt7601u/mcu.c: return firmware_request_cache(dev->dev, MT7601U_FIRMWARE);
>
> while there are several drivers on media that require firmware.
>
> Btw, IMHO, the better would be to reload the firmware at resume
> time, instead of caching it, just like other media drivers.

Mauro,

Here is the thing. If we race to a filesystem (the read calls
submit_bio()) after resume but before thaw, you can end up in a
situation where an async read waits forever because the read never hits
the hardware. Fixing this is part of the work I tried long ago, removing
the kthread freezer from filesystems [0], which allows proper filesystem
freeze/thaw during suspend/resume. I am picking this work back up in the
meantime.

The firmware cache resolves these races by caching firmware in case it's
needed on resume. However, if a driver never actually called
request_firmware() upon bootup, then the firmware was never cached, and
the call to request_firmware() on resume will actually trigger a
submit_bio(). In my tests the race does trigger a forever wait on XFS
and btrfs, but not on ext4.

But in any case, I can put a stopgap in place for these issues by taking
a trylock on the usermode helper lock prior to a direct fs read.
However, that's just a hack, and the preference is to resolve this by
getting drivers to properly call request_firmware() before thaw.

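To make the ordering problem above concrete, here is a toy userspace
model in C. It is only a sketch and not kernel code: the model_*
functions, struct fw_model, and the FW_* results are hypothetical names
made up for illustration, and the model collapses the real firmware
cache (which is filled around suspend) into a simple list of remembered
names. It assumes a filesystem read issued before thaw would never
complete, and reports that case as FW_WOULD_BLOCK instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CACHED 8

/* Outcome of a simulated firmware request. */
enum fw_result {
	FW_FROM_CACHE,	/* served from memory, no filesystem I/O */
	FW_FROM_FS,	/* read from a usable (thawed) filesystem */
	FW_WOULD_BLOCK	/* filesystem not thawed: the read could wait forever */
};

/* Hypothetical model state, not a kernel structure. */
struct fw_model {
	const char *cached[MAX_CACHED];	/* firmware names remembered for resume */
	size_t ncached;
	bool fs_thawed;			/* false between resume and thaw */
};

static void model_init(struct fw_model *m)
{
	m->ncached = 0;
	m->fs_thawed = true;
}

static bool model_is_cached(const struct fw_model *m, const char *name)
{
	for (size_t i = 0; i < m->ncached; i++)
		if (strcmp(m->cached[i], name) == 0)
			return true;
	return false;
}

/* Stand-in for firmware_request_cache(): remember the name, load nothing. */
static void model_request_cache(struct fw_model *m, const char *name)
{
	assert(name != NULL);
	if (!model_is_cached(m, name) && m->ncached < MAX_CACHED)
		m->cached[m->ncached++] = name;
}

/* Stand-in for request_firmware(). */
static enum fw_result model_request_firmware(struct fw_model *m,
					     const char *name)
{
	if (model_is_cached(m, name))
		return FW_FROM_CACHE;
	if (!m->fs_thawed)
		return FW_WOULD_BLOCK;	/* models the submit_bio() that never completes */
	model_request_cache(m, name);	/* a successful request is remembered */
	return FW_FROM_FS;
}
```

In this model, a driver that first asks for its firmware while fs_thawed
is false gets FW_WOULD_BLOCK, whereas a driver that called either
model_request_firmware() or model_request_cache() earlier, while the
filesystem was usable, gets FW_FROM_CACHE for the same request.
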
The commit log for the one user you mentioned explains well why that
driver needed it: commit d723522b0be4 ("mt7601u: use
firmware_request_cache() to address cache on reboot") was added since
the device may sometimes retain the firmware on the hardware device upon
reboot, and in such a case not trigger a request_firmware() call on
reboot on the driver side. If such cases happen on other drivers, they
can use that.

It's not clear to me from looking at the media APIs whether or not all
drivers are always properly calling the request_firmware() API on
suspend, prior to resume. If not, that needs to be fixed.

  Luis