On 03/04/2021 23:04, Luis Chamberlain wrote:
> OK, this fixes it, but it just shows that the thawing likely allows a race to take place which we didn't expect. I'll do some more digging for a proper fix.
I can indeed confirm that this fixes the stall. However, this does not seem to be the (complete) solution. Instead, I now get a kernel crash message (see below) for every firmware location it tries to read during resume. This might be intentional for debugging purposes during testing. The behaviour is identical for ext4 and btrfs.

If the firmware file cannot be found during caching on suspend, the reads are still attempted again during resume. If the firmware file is not present, this leads to several of those crash messages (one per firmware location) during resume, even for drivers that properly request caching.

So if this patch is to go in (those crashes would really help with getting the si2168 patches in…), I think you have to make sure that no read is ever attempted on resume, even for non-existent firmware files. That means setting up the caching even if the initial request_firmware() failed, and storing knowledge about failed caching attempts so that these reads are not retried on resume.

Lukas

------------[ cut here ]------------
WARNING: CPU: 0 PID: 662 at fs/kernel_read_file.c:161 kernel_read_file_from_path_initns+0x11c/0x140
Modules linked in: test_firmware nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink intel_rapl_msr intel_rapl_common kvm_amd vmwgfx ccp kvm ttm drm_kms_helper snd_pcm joydev snd_timer snd e1000 irqbypass cec soundcore vboxguest i2c_piix4 pcspkr drm fuse zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw ata_generic pata_acpi video
CPU: 0 PID: 662 Comm: systemd-sleep Not tainted 5.12.0-rc5+ #2
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:kernel_read_file_from_path_initns+0x11c/0x140
Code: ff ff 4c 89 e7 89 44 24 10 e8 50 07 fc ff e8 fb 1c d8 ff 44 8b 44 24 10 48 83 c4 28 44 89 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 80 3d b4 10 7a 01 00 75 e3 e9 5b 5e 83 00 e8 cf 1c d8 ff 45
RSP: 0018:ffffa096c0b9fb90 EFLAGS: 00010286
RAX: 00000000fffffff5 RBX: 0000000000000000 RCX: ffffffffbd85d688
RDX: ffffffffbd85d688 RSI: 0000000000000297 RDI: ffffffffbd85d680
RBP: ffffa096c0b9fbe0 R08: 00000000fffffff5 R09: ffffffffbd85d688
R10: ffffffffffffffff R11: 0000000000000000 R12: ffff8976ca811000
R13: ffffa096c0b9fc00 R14: 000000007fffffff R15: 0000000000000001
FS: 00007f4718de3b40(0000) GS:ffff8979cfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005594afeca4f8 CR3: 0000000106a62000 CR4: 00000000000506f0
Call Trace:
 ? snprintf+0x39/0x40
 fw_get_filesystem_firmware+0xe2/0x270
 _request_firmware+0x21e/0x500
 request_firmware+0x32/0x50
 test_firmware_resume.cold+0x4e/0xb2 [test_firmware]
 ? platform_pm_suspend+0x40/0x40
 dpm_run_callback+0x4c/0x120
 device_resume+0xa7/0x200
 dpm_resume+0xce/0x2c0
 dpm_resume_end+0xd/0x20
 suspend_devices_and_enter+0x195/0x750
 pm_suspend.cold+0x329/0x374
 state_store+0x71/0xd0
 kernfs_fop_write_iter+0x11c/0x1b0
 new_sync_write+0x108/0x180
 vfs_write+0x1b8/0x270
 ksys_write+0x4f/0xc0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4719a527a7
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffee4d18e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f4719a527a7
RDX: 0000000000000004 RSI: 00007ffee4d18f00 RDI: 0000000000000004
RBP: 00007ffee4d18f00 R08: 00005626b555c710 R09: 00007f4719ae84e0
R10: 00007f4719ae83e0 R11: 0000000000000246 R12: 0000000000000004
R13: 00005626b5558650 R14: 0000000000000004 R15: 00007f4719b25700
---[ end trace 7f7ef0dc067dd714 ]---
Trying to do direct read when not available
test_firmware test_firmware: loading /lib/firmware/updates/5.12.0-rc5+/test-firmware.bin failed with error -11
------------[ cut here ]------------