Re: Is request_firmware() really safe to call in resume callback when /usr/lib/firmware is on btrfs?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 13, 2020 at 11:53:36PM +0200, Lukas Middendorf wrote:
> Hey Luis,
> 
> On 13/08/2020 18:37, Luis Chamberlain wrote:
> > On Sun, Aug 09, 2020 at 08:51:35PM +0200, Lukas Middendorf wrote:
> > 
> > >     b) if the firmware file is not present and the directory content of
> > > /usr/lib/firmware has been listed (e.g. through loading of a different
> > > firmware from there; by the nouveau driver in my case) the firmware load
> > > fails but does not freeze the system
> > 
> > Can you clarify if you mean then that in this case the dvb device is not
> > used?
> > 
> > Which driver firmware load fails?
> 
> With "not used" I mean that the device has been recognized, the si2168
> module is loaded and si2168_probe() has been called on the device, but I
> have not started a media player which actually plays a DVB-C stream.
> Therefore si2168_init() has never been called and the firmware has never
> been requested or loaded to the device.
> 
> On resume si2168_init() will be called (although I don't think this actually
> is really necessary)

Indeed, that seems odd given its not on probe. So yet another possible
si2168 bug. Or another way to put it: your cache calls are not needed
if you remove that si2168_init() if init was not called yet. So simply
extending the data structure for the driver and seting a bool flag to
true if init was called should do the trick.

Then the two cache calls would not be needed.

> and the firmware load of si2168 fails if the firmware
> files are not installed.

OK well that is expected.

> If the kernel does not already know that the files
> are not present without access to the file system, the system just freezes.

It is not clear to me what this means. Can you clarify?

> > >     c) manually reading the firmware files before suspend can make the resume
> > > succeed, but this does not seem to be reliable
> > 
> > That may because of the dentry cache, and so helping races, doing
> > something similar as to what the firmware cache does, but in hacky way.
> > 
> > >     d) modifying the si2168 by adding calls to firmware_request_cache() in
> > > si2168_probe() makes the firmware loading during resume succeed.
> > 
> > That indeed seems like a good idea in general for dvb that should
> > probably be generalized, if dvb had a way to get a dvb's driver
> > firmware name and indeed all dvb drivers are not skipping the firmware
> > cache today.
> > 
> > >     e) having the firmware files not installed freezes the system during
> > > resume if the content of /usr/lib/firmware has not been listed before
> > > suspend (e.g. installing the nvidia driver, so the nouveau driver does not
> > > access this directory)
> > 
> > That may be the same issue as in b) assuming that you meant you didn't
> > use the dvb device, and that the firmware load issue is from nouveau.
> 
> This is actually just the inverted case of b).
> The only real relevance of the nouveau driver here is that its (perfectly
> working) firmware caching on suspend actually seems to be equivalent to
> manually running "ls" on the firmware directory and afterwards the kernel
> also knows whether or not the si2168 firmware files are present without file
> system access.

OK.. I still don't get it, so let me see if we can decipher what you
mean here.

If the firmware is *not* present for the si2168 driver and the device
has *not* been used yet you get a system freeze which you cannot recover
from, but only if you are *not* using a driver which also caches its
firmware already?

In this case if the nouveau is used this freeze does not happen, and you
believe this is because of at least one fetch for the directory is done
already. If so, then this speedup of a fugure negative dentry lookup on
btrfs seems like an interesting test case for btrfs developers too look
at.

But we must first undersand the case well.

> > Are all firmware files used by nouveau upon resume the samea as during
> > probe? If not using firmware_request_cache() for each of them would be
> > a good idea to resolve possible races issues.
> 
> I have not compared them one-by-one, but as I have already written, nouveau
> seems fine here.

OK cool. Thanks for ruling this out!

> > > I'm not sure who really is to blame here:
> > > - BTRFS (ext4 is fine after all)
> > > - the firmware loader because the implementation or the documentation are
> > > wrong
> > > - si2168 because of not loading the firmware or calling
> > > firmware_request_cache() before suspend. Also I don't think it is even
> > > necessary to load the firmware during resume, it should be sufficient to
> > > load the firmware when the tuner is used. I'm not sure though whether the
> > > dvb-frontend driver structure allows to properly distinguish between
> > > resuming and initialization before device usage. The problem can definitely
> > > be worked around here until the root cause is fixed. I can provide a patch
> > > if this solution is seen as appropriate.
> > > 
> > > I'm putting all the maintainers and/or lists of the possible culprits on CC.
> > 
> > Indeed send the patch to use firmware_request_cache() for si2168.
> 
> I sent it to the media mailing list (split into two patches).

Looks good to me, except now that you metion the resuem thing, I think
you can drop those patches if you do the check I mentioned on resume,
ie, to verify that init hasn't been called yet.

Also, might as well rename that function to make it clearer what it does
as init prefixed or postfixed calls tend to be used for things during
initialization.

> > The generic filesystem races on suspend/resume are known, and I will
> > tackle that once I am done with some other task I am completing.
> 
> Great to know that the underlying cause for the freeze is known and a
> solution is being worked on (or at least planned to be worked on).

There is a sliver possibility here that a negative dentry slowdown may
cause a btrfs freeze on suspend easily. If so that can likely be fixed
before that big general filesystem freeze issue I noted with URLs and
talks. But I think we need to confirm that is what is happening here.

  Luis



[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux