Great to hear that you now succeeded in reproducing the problem.
On 03/04/2021 22:25, Luis Chamberlain wrote:
On Sat, Apr 03, 2021 at 12:24:07PM +0200, Lukas Middendorf wrote:
One further thing I noticed which might be problematic in rare cases:
According to the kernel debug messages, the firmware-loader does not attempt
to cache the firmware during suspend,
Correct, the goal with this test was to purposely *not* call for the
firmware prior to suspend, and instead *race* (do the wrong thing) at
resume. It shouldn't really stall... fail yes, but stall, that seems
fishy.
I understood that.
if the previous call to
request_firmware() has failed (file not present; call made during previous
resume). In my opinion it should attempt to cache the firmware on suspend
even in this case
Yes, that is the *proper* way to do use the firmware API, but we want to
reproduce the stall, and so we have to recreate the issue you reported
by doing something bad.
Should firmware_request_cache() always be called, or only instead of
request_firmware() ? In case request_firmware is called on its own, and
the firmware file is not present, it might still race on the next resume.
request_firmware seems to be meant to include the cache request for the
next suspend, but it apparently *does not* if firmware loading fails due
to a missing file. I think this is something that should either be
changed or properly documented in the API documentation.
Did you also try to create a random test-firmware.bin (I used 1M from
/dev/urandom) instead of an empty /lib/firmware ?
No, right now I want to to just focus on fixing the stall you saw.
This is a second nuance of the stall I saw: The firmware file (in this
case test-firmware.bin) is present but not in cache.
If you run
ls -lR /lib/firmware > /dev/null
before
systemctl suspend
your reproduction steps will very likely not stall.
If you in addition run
dd if=/dev/urandom of=/lib/firmware/test-firmware.bin bs=1K count=64
during the preparation of /lib/firmware (in addition to or instead of
your for loop), then it will always stall (as long as the file content
is not read before suspend).
One stall is happening while the directory content is being listed (to
find that the file is not present), the second stall is happening while
actually trying to read the file. Those two stalls likely have the same
root cause, but it might be different. I just want to make sure you have
covered all cases.
You might be better off just reposting your
patches with the respective Reviewed-by tags and pestering your
maintainer.
I will try to be a little bit more insistent this time. Is "just repost" the
usual way to handle if patches are ignored?
You can repost, v2 just add my reviewed-by. Those patches are indeed
correct as they are calling for the firmware prior to resume. Maybe
clarify that.
OK, will do.
But indeed there is also another issue which you reported which needs to
be fixed, and for that thanks so much for you patience! I'll be looking
into this!
Also thank you for your effort.
Lukas