Re: [PATCH 1/2] media: si2168: request caching of firmware to make it available on resume

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Fri, 9 Apr 2021 16:58:02 +0000

On Fri, Apr 09, 2021 at 01:29:57PM +0200, Mauro Carvalho Chehab wrote:
> Em Thu, 1 Apr 2021 16:42:26 +0200
> Lukas Middendorf <kernel@xxxxxxxxxxx> escreveu:
> 
> > Hi,
> > 
> > I see this (or a similar fix) has not yet been included in 5.12-rc5.
> > Any further problems or comments regarding this patch? It still applies 
> > cleanly to current git master and the problem is still relevant.
> 
> Well, I fail to see why si2168 is so special that it would require it...
> 
> on a quick check, it sounds that there's just a single driver using this
> kAPI:
> 
> 	drivers/net/wireless/mediatek/mt7601u/mcu.c:            return firmware_request_cache(dev->dev, MT7601U_FIRMWARE);
> 
> while there are several drivers on media that require firmware.
> 
> Btw, IMHO, the better would be to reload the firmware at resume
> time, instead of caching it, just like other media drivers.

Mauro,

Here is the thing. If a filesystem read (which ends up calling
submit_bio()) is issued after resume but before thaw, you can end
up in a situation where the async read waits forever, because the
read never hits the hardware.

Fixing this is part of the work I had tried long ago by removing
the kthread freezer from filesystems [0], which allows proper
filesystem freeze/thaw during suspend / resume. I am picking
this work up in the meantime.

The firmware cache resolves these races by caching firmware
in case it's needed on resume. However, if a driver never
actually called request_firmware() upon bootup, then
the firmware was never cached and the call to request_firmware()
on resume will actually trigger a submit_bio().

In my tests the race does trigger a forever wait on XFS and btrfs, but
not on ext4. In any case, I can put in a stop gap for these issues
by issuing a try lock on the usermode helper lock prior to a direct
fs read. However, that's just a hack, and the preference is to resolve
this by getting drivers to properly call request_firmware() before
thaw. The commit log for the one user you mentioned explains well why
that driver needed it: commit d723522b0be4 ("mt7601u: use
firmware_request_cache() to address cache on reboot") was added
since the device may sometimes retain the firmware on the hardware
device upon reboot, and in such a case the driver does not trigger
a request_firmware() call on reboot.

If such cases happen on other drivers, they can use
firmware_request_cache() as well.
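
To make the caching argument concrete, here is a rough userspace toy
model of it. This is not kernel code: every toy_* name and the
"toy-fw.bin" file name are hypothetical and only mirror the behaviour
described in this mail. It assumes names are remembered when firmware
is requested, that remembered names get cached at suspend, and that a
request on resume either hits that cache or falls through to a
filesystem read.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_NAMES 8

enum toy_source { TOY_FROM_CACHE, TOY_FROM_FS };

struct toy_fw_cache {
	const char *names[TOY_MAX_NAMES];	/* names remembered before suspend */
	bool cached[TOY_MAX_NAMES];		/* image kept in RAM at suspend */
	int count;
};

/* Remember a firmware name so that suspend will cache its image. */
static void toy_remember(struct toy_fw_cache *c, const char *name)
{
	for (int i = 0; i < c->count; i++)
		if (!strcmp(c->names[i], name))
			return;
	assert(c->count < TOY_MAX_NAMES);
	c->names[c->count++] = name;
}

/* Models request_firmware(): use the cache if possible, else read the fs. */
static enum toy_source toy_request_firmware(struct toy_fw_cache *c,
					    const char *name)
{
	enum toy_source src = TOY_FROM_FS;	/* would end in submit_bio() */

	for (int i = 0; i < c->count; i++)
		if (!strcmp(c->names[i], name) && c->cached[i])
			src = TOY_FROM_CACHE;
	toy_remember(c, name);
	return src;
}

/* Models firmware_request_cache(): remember the name, load nothing. */
static void toy_firmware_request_cache(struct toy_fw_cache *c,
					const char *name)
{
	toy_remember(c, name);
}

/* Models suspend: every remembered name gets its image cached. */
static void toy_suspend(struct toy_fw_cache *c)
{
	for (int i = 0; i < c->count; i++)
		c->cached[i] = true;
}

/* Toy probe for a device that may still hold its firmware after reboot. */
static void toy_probe(struct toy_fw_cache *c, const char *name,
		      bool fw_retained)
{
	if (fw_retained)
		toy_firmware_request_cache(c, name);	/* no upload needed */
	else
		(void)toy_request_firmware(c, name);
}
```

In this model, a probe that skips the upload but still calls
toy_firmware_request_cache() gets a cache hit on resume, while a
driver that never asked for the firmware before suspend falls through
to the filesystem read, which is the racy case.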

It's not clear to me from looking at the media APIs whether or not
all drivers are always properly calling the request_firmware() API
on suspend, prior to resume. If not, that needs to be fixed.

  Luis