On Tue, 3 Oct 2023 14:34:56 +0200
"Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:

> On Tue, Oct 3, 2023 at 1:02 PM Petr Tesařík <petr@xxxxxxxxxxx> wrote:
> >
> > On Tue, 3 Oct 2023 12:15:10 +0200
> > "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:
> >
> > > On Tue, Oct 3, 2023 at 11:31 AM Petr Tesařík <petr@xxxxxxxxxxx> wrote:
> > > >
> > > > Hi again (adding more recipients),
> > > >
> > > > On Sat, 30 Sep 2023 12:20:54 +0200
> > > > Petr Tesařík <petr@xxxxxxxxxxx> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > this time no patch (yet). In short, my Thinkpad running v6.6-rc3 fails
> > > > > to resume from S3. It also fails the same way with the Tumbleweed v6.5
> > > > > kernel. I was able to capture a crash dump of the v6.5 kernel, and
> > > > > here's my analysis:
> > > > >
> > > > > The system never gets to waking up my SATA SSD disk:
> > > > >
> > > > > [0:0:0:0]    disk    ATA      KINGSTON SEDC600  H5.1   /dev/sda
> > > > >
> > > > > There is a pending resume work for kworker/u32:12 (PID 11032), but this
> > > > > worker is stuck in 'D' state:
> > > > >
> > > > > >>> prog.stack_trace(11032)
> > > > > #0 context_switch (../kernel/sched/core.c:5381:2)
> > > > > #1 __schedule (../kernel/sched/core.c:6710:8)
> > > > > #2 schedule (../kernel/sched/core.c:6786:3)
> > > > > #3 schedule_preempt_disabled (../kernel/sched/core.c:6845:2)
> > > > > #4 __mutex_lock_common (../kernel/locking/mutex.c:679:3)
> > > > > #5 __mutex_lock (../kernel/locking/mutex.c:747:9)
> > > > > #6 acpi_device_hotplug (../drivers/acpi/scan.c:382:2)
> > > > > #7 acpi_hotplug_work_fn (../drivers/acpi/osl.c:1162:2)
> > > > > #8 process_one_work (../kernel/workqueue.c:2600:2)
> > > > > #9 worker_thread (../kernel/workqueue.c:2751:4)
> > > > > #10 kthread (../kernel/kthread.c:389:9)
> > > > > #11 ret_from_fork (../arch/x86/kernel/process.c:145:3)
> > > > > #12 ret_from_fork_asm+0x1b/0x20 (../arch/x86/entry/entry_64.S:304)
> > > > >
> > > > > acpi_device_hotplug() tries to acquire acpi_scan_lock, which is held by
> > > > > systemd-sleep (PID 11002). This task is also in 'D' state:
> > > > >
> > > > > >>> prog.stack_trace(11002)
> > > > > #0 context_switch (../kernel/sched/core.c:5381:2)
> > > > > #1 __schedule (../kernel/sched/core.c:6710:8)
> > > > > #2 schedule (../kernel/sched/core.c:6786:3)
> > > > > #3 schedule_preempt_disabled (../kernel/sched/core.c:6845:2)
> > > > > #4 __mutex_lock_common (../kernel/locking/mutex.c:679:3)
> > > > > #5 __mutex_lock (../kernel/locking/mutex.c:747:9)
> > > > > #6 device_lock (../include/linux/device.h:958:2)
> > > > > #7 device_complete (../drivers/base/power/main.c:1063:2)
> > > > > #8 dpm_complete (../drivers/base/power/main.c:1121:3)
> > > > > #9 suspend_devices_and_enter (../kernel/power/suspend.c:516:2)
> > > >
> > > > I believe the issue must be somewhere here. The whole suspend and
> > > > resume logic in suspend_devices_and_enter() is framed by
> > > > platform_suspend_begin() and platform_resume_end().
> > > >
> > > > My system is an ACPI system, so suspend_ops contains:
> > > >
> > > >     .begin = acpi_suspend_begin,
> > > >     .end = acpi_pm_end,
> > > >
> > > > Now, acpi_suspend_begin() acquires acpi_scan_lock through
> > > > acpi_pm_start(), and the lock is not released until acpi_pm_end().
> > > > Since dpm_complete() waits for the completion of a work that tries to
> > > > acquire acpi_scan_lock, the system will deadlock.
> > >
> > > So holding acpi_scan_lock across suspend-resume is basically to
> > > prevent the hotplug from taking place then IIRC.
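
(A side note for anyone replaying this analysis on a crash dump: the
owner of acpi_scan_lock can be read straight out of the mutex with
drgn. The snippet below is a rough, untested sketch rather than the
exact session I ran; it assumes `prog` has the kernel debug info
loaded, that the mutex is actually held, and that the low three bits
of mutex.owner are flag bits, as on current x86-64 kernels:)

>>> # sketch: which task currently holds acpi_scan_lock?
>>> from drgn import Object
>>> mutex = prog["acpi_scan_lock"]    # DEFINE_MUTEX in drivers/acpi/scan.c
>>> owner = mutex.owner.counter.value_() & ~0x7   # strip mutex flag bits
>>> task = Object(prog, "struct task_struct *", value=owner)
>>> task.comm.string_().decode(), int(task.pid)

In this dump that comes back as systemd-sleep, PID 11002, i.e. the
second stack trace above.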
> > > > AFAICS either:
> > > >
> > > >  a. the ACPI lock cannot be held while dpm_complete() runs, or
> > > >  b. ata_scsi_dev_rescan() must not be scheduled before the system is
> > > >     resumed, or
> > > >  c. acpi_device_hotplug() must be implemented without taking dev->mutex.
> > > >
> > > > My gut feeling is that b. is the right answer.
> > >
> > > It's been a while since I looked at that code last time, but then it
> > > has not changed for quite some time either.
> > >
> > > It looks like the acpi_device_hotplug() path attempts to acquire
> > > acpi_scan_lock() while holding dev->mutex, which is kind of silly. I
> > > need to check that, though.
> >
> > Thanks for your willingness. Well, it's not quite what you describe. If
> > it were a simple ABBA deadlock, it would be reported by lockdep.
> > No, it's more complicated:
> >
> > 1. suspend_devices_and_enter() holds acpi_scan_lock,
> > 2. an ACPI hotplug work runs, but acpi_device_hotplug() goes to sleep
> >    when it gets to acquiring acpi_scan_lock,
> > 3. ata_scsi_dev_rescan() submits a SCSI command and waits for its
> >    completion while holding dev->mutex,
> > 4. the SCSI completion work happens to be put on the same workqueue as
> >    the ACPI hotplug work in step 2,
> >    ^^^--- THIS is how the two events are serialized!
>
> Which is unexpected.
>
> And quite honestly I'm not sure how this can happen, because
> acpi_hotplug_schedule() uses a dedicated workqueue and it is called
> from (a) the "eject" sysfs attribute (which cannot happen while system
> suspend-resume is in progress) and (b) acpi_bus_notify() which has
> nothing to do with SCSI.

Oh, you're right, and I was too quick. They cannot be on the same queue...

> Maybe the workqueue used for the SCSI completion is freezable?

Yes, that's it:

*(struct workqueue_struct *)0xffff97d240b2fe00 = {
	/* ... */
	.flags = (unsigned int)4,	/* WQ_FREEZABLE = 1 << 2 */

Good. But if this workqueue is frozen, the system still cannot make
progress.

Petr T
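
P.S. In case anyone wants to check which other workqueues are affected:
the global `workqueues` list in kernel/workqueue.c can be walked in the
dump. Again a rough, untested drgn sketch, using the same WQ_FREEZABLE
value that is decoded in the struct dump above:

>>> # sketch: list every workqueue that has WQ_FREEZABLE set
>>> from drgn.helpers.linux.list import list_for_each_entry
>>> WQ_FREEZABLE = 1 << 2
>>> for wq in list_for_each_entry("struct workqueue_struct",
...                               prog["workqueues"].address_of_(), "list"):
...     if wq.flags & WQ_FREEZABLE:
...         print(hex(wq.value_()), wq.name.string_().decode())

Anything printed there stays stalled from the moment workqueues are
frozen until they are thawed late in resume, so a work item that
dpm_complete() indirectly waits for must not live on one of these
queues.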