On 10/21/23 06:23, Phillip Susi wrote:
> Damien Le Moal <dlemoal@xxxxxxxxxx> writes:
>
>> On my system, I see:
>>
>> cat /sys/class/ata_port/ata1/power/runtime_active_kids
>> 0
>
> I see a 1 there, which is the single scsi_host. The scsi_host has 2
> active kids; the two disks. When I enabled runtime pm, only when the
> second disk was suspended did that allow the scsi_host to suspend, which
> then allowed the port to suspend. Everything looked fine there so far.
> Then I tried:
>
> echo 1 > /sys/block/sdf/device/delete
>
> And the SCSI EH appears to have tried to wake up the disk, and hung in
> the process.
>
> [ 314.246282] sd 7:0:0:0: [sde] Synchronizing SCSI cache
> [ 314.246445] sd 7:0:0:0: [sde] Stopping disk
>
> First disk suspends.
>
> [ 388.518295] sd 7:1:0:0: [sdf] Synchronizing SCSI cache
> [ 388.518519] sd 7:1:0:0: [sdf] Stopping disk
>
> Second disk suspends some time later.
>
> [ 388.930428] ata8.00: Entering standby power mode
> [ 389.330651] ata8.01: Entering standby power mode
>
> That allowed the port to suspend. This is when I tried to detach the
> disk driver, which I think tried to resume the disk before detaching,
> which resumed the port.
>
> [ 467.511878] ata8.15: SATA link down (SStatus 0 SControl 310)
> [ 468.142726] ata8.15: failed to read PMP GSCR[0] (Emask=0x100)
> [ 468.142741] ata8.15: PMP revalidation failed (errno=-5)
>
> I ran hdparm -C on the other disk at this point. I just noticed that
> the ata8.15 that represents the PMP itself was NOT suspended along with
> the two drive links, and then maybe was not resumed before trying to
> revalidate the PMP? And that's why it failed?
>
> [ 473.172792] ata8.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 473.486860] ata8.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 473.802139] ata8.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>
> It seems like it ended up recovering here though? And yet the scsi_eh
> remained hung, as did the hdparm -C:
>
> [ 605.566814] INFO: task scsi_eh_7:173 blocked for more than 120 seconds.
> [ 605.566829] Not tainted 6.6.0-rc5+ #5
> [ 605.566834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 605.566838] task:scsi_eh_7 state:D stack:0 pid:173 ppid:2 flags:0x00004000
> [ 605.566850] Call Trace:
> [ 605.566853] <TASK>
> [ 605.566860] __schedule+0x37c/0xb70
> [ 605.566878] schedule+0x61/0xd0
> [ 605.566888] rpm_resume+0x156/0x760

Looks like a deadlock somewhere, likely with the device remove that you
triggered with the "echo 1 > /sys/block/sdf/device/delete". Can you send the
exact list of commands & events you executed to get to that point?

Also please share your kernel config.

-- 
Damien Le Moal
Western Digital Research