On 15/02/18 02:45, Jakub Kicinski wrote:
> On Wed, 14 Feb 2018 13:34:38 +0200, cantabile wrote:
>> The firmware running on the device sometimes survives a reboot
>> (firmware_running returns 1). When this happens the driver never calls
>> request_firmware, which means the kernel's firmware handling code
>> doesn't know this firmware should be cached before hibernating. Upon
>> resuming from several hours of hibernation, the firmware is no longer
>> running on the device, so the driver calls request_firmware. Since the
>> firmware was never cached, it needs to be loaded from disk, and this is
>> when the system freezes, somewhere in the xfs driver. Fix this by always
>> requesting the firmware, whether it's already running on the device or not.
>>
>> Signed-off-by: John Smith <cantabile.desu@xxxxxxxxx>
>
> Thanks for tracking this down, but this seems like the wrong
> direction.
>
> What's your hard drive? Is it some complex configuration which
> prevents the block device from coming online after resume?
>
> If it's really because of some peculiarities of XFS the fix should
> go there, no driver will be able to load FW on resume...

My hard drive is a SATA SSD, with a regular MS-DOS partition table, with
three primary partitions: one ext4 mounted at /boot, one xfs mounted at
/, one xfs mounted at /home. No encryption, no software RAID, no logical
volumes, etc.

The kernel has a firmware caching mechanism to make sure that no driver
needs to load firmware from disk during the resume process, because
doing so is unreliable. See this bit of the documentation: [1]

This is how it works: when the driver's probe callback calls
request_firmware, that function records the firmware file's name in a
devres entry attached to your 'struct device'. When you suspend the
system, the kernel calls fw_pm_notify [2], which walks all the
'struct device's, looks for firmware file names previously recorded by
request_firmware, and loads every firmware file it finds into memory.
When you resume the system, all calls to request_firmware ought to find
the firmware files already in memory. After resuming, the kernel calls
fw_pm_notify again to release the cached firmware files (with some
delay).

This caching mechanism currently doesn't always work for your driver
because your driver doesn't always call request_firmware from the probe
callback: the firmware running on the device sometimes survives a
reboot, and in that case the call is skipped.
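
To make the failure mode concrete, here is a toy user-space model of the
mechanism in C. It is only a sketch, not kernel code: every toy_* name,
the "dev.fw" file name, and the -1 return value (standing in for "would
have to read from disk during resume", which is where the freeze
happens) are assumptions made for illustration.

```c
/* Toy user-space model of the firmware cache; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FW 8

struct toy_device {
    const char *fw_names[MAX_FW];   /* stands in for the devres entries */
    int n_names;
};

struct toy_cache {
    const char *cached[MAX_FW];     /* firmware held in memory */
    int n_cached;
    bool disk_ok;                   /* false while the system is resuming */
};

static bool cache_has(const struct toy_cache *c, const char *name)
{
    for (int i = 0; i < c->n_cached; i++)
        if (strcmp(c->cached[i], name) == 0)
            return true;
    return false;
}

/* Remember that this device uses this firmware file (once per name). */
static void toy_record_name(struct toy_device *dev, const char *name)
{
    for (int i = 0; i < dev->n_names; i++)
        if (strcmp(dev->fw_names[i], name) == 0)
            return;
    assert(dev->n_names < MAX_FW);
    dev->fw_names[dev->n_names++] = name;
}

/* Models request_firmware: 0 on success, -1 if the disk would be needed
 * at a time when it cannot be relied on. */
static int toy_request_firmware(struct toy_cache *c, struct toy_device *dev,
                                const char *name)
{
    toy_record_name(dev, name);
    if (cache_has(c, name))
        return 0;
    return c->disk_ok ? 0 : -1;
}

/* Models the suspend notification: cache every recorded firmware file. */
static void toy_suspend(struct toy_cache *c, const struct toy_device *dev)
{
    for (int i = 0; i < dev->n_names; i++)
        if (!cache_has(c, dev->fw_names[i])) {
            assert(c->n_cached < MAX_FW);
            c->cached[c->n_cached++] = dev->fw_names[i];
        }
    c->disk_ok = false;
}

/* Models the post-resume notification: drop the cache, disk is back. */
static void toy_resume_done(struct toy_cache *c)
{
    c->n_cached = 0;
    c->disk_ok = true;
}

/* Probe: skip the request if the firmware survived the reboot, unless
 * always_request (the behaviour the patch proposes) is set. */
static void toy_probe(struct toy_cache *c, struct toy_device *dev,
                      bool firmware_running, bool always_request)
{
    if (!firmware_running || always_request)
        toy_request_firmware(c, dev, "dev.fw");
}

/* Probe, suspend, then request the firmware during resume.
 * Returns the result of the resume-time request. */
static int toy_hibernate_cycle(bool firmware_running, bool always_request)
{
    struct toy_cache cache = { .disk_ok = true };
    struct toy_device dev = { .n_names = 0 };

    toy_probe(&cache, &dev, firmware_running, always_request);
    toy_suspend(&cache, &dev);
    int ret = toy_request_firmware(&cache, &dev, "dev.fw");
    toy_resume_done(&cache);
    return ret;
}
```

In this model, toy_hibernate_cycle(true, false) is the problematic case:
probe skipped the request, so nothing was recorded, nothing was cached
at suspend, and the resume-time request has to go to disk. With
always_request set, the name is recorded at probe time and the
resume-time request is served from the cache.
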
You can get some very useful messages if you compile firmware_class.c
with -DDEBUG [3]. This is how I figured out the problem.

[1] https://www.kernel.org/doc/html/v4.13/driver-api/firmware/request_firmware.html#considerations-for-suspend-and-resume
[2] https://github.com/torvalds/linux/blob/master/drivers/base/firmware_class.c#L1804
[3] https://www.kernel.org/doc/local/pr_debug.txt