On Thu, 2023-08-31 at 09:32 +0900, Damien Le Moal wrote:
> On 8/31/23 07:14, Rodrigo Vivi wrote:
> > On Tue, Aug 29, 2023 at 03:17:38PM +0900, Damien Le Moal wrote:
> > > On 8/26/23 02:09, Rodrigo Vivi wrote:
> > > > > > So, maybe we have some kind of disks/configuration out
> > > > > > there where this start upon resume is needed? Maybe it is
> > > > > > just a matter of timing to ensure some firmware underneath
> > > > > > is up and back to life?
> > > > >
> > > > > I do not think so. Suspend will issue a start stop unit
> > > > > command to put the drive to sleep, and resume will reset the
> > > > > port (which should wake up the drive) and then issue an
> > > > > IDENTIFY command (which will also wake up the drive) and
> > > > > other read logs etc. to rescan the drive.
> > > > > In both cases, if the commands do not complete, we would see
> > > > > errors/timeouts and likely port reset/drive gone events. So I
> > > > > think this is likely another subtle race between scsi suspend
> > > > > and ata suspend that is causing a deadlock.
> > > > >
> > > > > The main issue, I think, is that there is no direct ancestry
> > > > > between the ata port (device) and the scsi device, so the
> > > > > change to scsi async PM ops made a mess of the suspend/resume
> > > > > operations ordering. For suspend, the scsi device (child of
> > > > > the ata port) should go first, then the ata port device
> > > > > (parent). For resume, the reverse order is needed. PM
> > > > > normally ensures that parent/child ordering, but we lack that
> > > > > parent/child relationship. I am working on fixing that, but
> > > > > it is very slow progress because I have so far been unable to
> > > > > recreate any of the issues that have been reported. I am
> > > > > patching "blind"...
> > > >
> > > > I believe your suspicion makes sense. And along these lines,
> > > > that patch you attached earlier would fix it.
> > > > However, my initial tries of that didn't help. I'm going to
> > > > run more tests and get back to you.
> > >
> > > Rodrigo,
> > >
> > > I pushed the resume-v2 branch to the libata tree:
> > >
> > > git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/dlemoal/libata
> > > (or https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git)
> > >
> > > This branch adds 13 patches on top of 6.5.0 to clean up libata
> > > suspend/resume and other device shutdown issues. The first 4
> > > patches are the main ones to fix suspend/resume. I tested that
> > > on 2 different machines with different drives and with qemu.
> > > All seems fine.
> > >
> > > Could you try to run this through your CI? I am very interested
> > > in seeing if it survives your suspend/resume tests.
> >
> > Well, in the end this didn't affect the CI machinery as I was
> > afraid. It is only in my local DG2.
> >
> > https://intel-gfx-ci.01.org/tree/intel-xe/bat-all.html?testfilter=suspend
> > (bat-dg2-oem2 one)
> >
> > I just got these 13 patches, applied them to my branch, and tested
> > again, and it still *fails* for me.
>
> That is annoying... But I think the messages give us a hint as to
> what is going on. See below.
>
> > [ 79.648328] [IGT] kms_pipe_crc_basic: finished subtest pipe-A-DP-2, SUCCESS
> > [ 79.657353] [IGT] kms_pipe_crc_basic: starting dynamic subtest pipe-B-DP-2
> > [ 80.375042] PM: suspend entry (deep)
> > [ 80.380799] Filesystems sync: 0.002 seconds
> > [ 80.386476] Freezing user space processes
> > [ 80.392286] Freezing user space processes completed (elapsed 0.001 seconds)
> > [ 80.399294] OOM killer disabled.
> > [ 80.402536] Freezing remaining freezable tasks
> > [ 80.408335] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> > [ 80.439372] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
> > [ 80.439716] serial 00:01: disabled
> > [ 80.448011] sd 4:0:0:0: [sda] Synchronizing SCSI cache
> > [ 80.448014] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
> > [ 80.453600] ata6.00: Entering standby power mode
>
> This is sd 5:0:0:0. All good, ordered properly with the
> "Synchronizing SCSI cache".
>
> > [ 80.464217] ata5.00: Entering standby power mode
>
> Same here for sd 4:0:0:0.
>
> > [ 80.812294] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [ 80.818520] ata8.00: Entering active power mode
> > [ 80.842989] ata8.00: configured for UDMA/133
>
> Arg! sd 7:0:0:0 is resuming! But the above "Synchronizing SCSI
> cache" tells us that it was suspending, and libata EH had not yet
> put that drive to standby...
>
> > [ 80.847660] ata8.00: Entering standby power mode
>
> ... which happens here. So it looks like libata EH had both the
> suspend and resume requests at the same time, which is totally
> weird.

Although it looks weird, it totally matches the "use case". I mean,
if I suspend, resume, and then wait a bit before suspending and
resuming again, it works 100% of the time. The issue only shows up
when another suspend comes right after the resume, in a loop without
any wait.

> > [ 81.119426] xe 0000:03:00.0: [drm] GT0: suspended
> > [ 81.800508] PM: suspend of devices complete after 1367.829 msecs
> > [ 81.806661] PM: start suspend of devices complete after 1390.859 msecs
> > [ 81.813244] PM: suspend devices took 1.398 seconds
> > [ 81.820101] PM: late suspend of devices complete after 2.036 msecs
>
> ... and PM suspend completes here. Resume "starts" now (but clearly
> it started earlier already, given that sd 7:0:0:0 was reactivated).

That is weird.
> > [ 82.403857] serial 00:01: activated
> > [ 82.489612] nvme nvme0: 16/0/0 default/read/poll queues
> > [ 82.563318] r8169 0000:07:00.0 enp7s0: Link is Down
> > [ 82.581444] xe REG[0x223a8-0x223af]: allow read access
> > [ 82.586704] xe REG[0x1c03a8-0x1c03af]: allow read access
> > [ 82.592071] xe REG[0x1d03a8-0x1d03af]: allow read access
> > [ 82.597423] xe REG[0x1c83a8-0x1c83af]: allow read access
> > [ 82.602765] xe REG[0x1d83a8-0x1d83af]: allow read access
> > [ 82.608113] xe REG[0x1a3a8-0x1a3af]: allow read access
> > [ 82.613281] xe REG[0x1c3a8-0x1c3af]: allow read access
> > [ 82.618454] xe REG[0x1e3a8-0x1e3af]: allow read access
> > [ 82.623634] xe REG[0x263a8-0x263af]: allow read access
> > [ 82.628816] xe 0000:03:00.0: [drm] GT0: resumed
> > [ 82.728005] ata7: SATA link down (SStatus 4 SControl 300)
> > [ 82.733531] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [ 82.739773] ata5.00: Entering active power mode
> > [ 82.744398] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [ 82.750618] ata6.00: Entering active power mode
> > [ 82.755961] ata5.00: configured for UDMA/133
> > [ 82.760479] ata5.00: Enabling discard_zeroes_data
> > [ 82.836266] ata6.00: configured for UDMA/133
> > [ 84.460081] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [ 84.466354] ata8.00: Entering active power mode
> > [ 84.497256] ata8.00: configured for UDMA/133
> > ...
>
> And this looks all normal, the drives have all been transitioned to
> active power mode as expected. And yet, your system is stuck after
> this, right?

Yes.

> Can you try to boot with "sysrq_always_enabled" and see if sending
> "ctrl-sysrq-t" can give you a stack backtrace of the tasks, to see
> where they are stuck?

I will try tomorrow.

> I am going to try something like you do, with a very short resume
> rtc timer and multiple disks, to see if I can reproduce.
> But it is starting to look like PM is starting to resume before
> suspend completes...

Yes, this is what it looks like.

> Not sure that is correct. But the end result may be that libata EH
> ends up getting stuck. Let me dig further.

Thank you so much for the hard work on this!
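P.S.: In case it helps you reproduce, the loop I run is roughly along
these lines: back-to-back `rtcwake -m mem` cycles with a short RTC
alarm and no delay between iterations. This is only a sketch (the
DRY_RUN guard and the CYCLES knob are additions here so the snippet
can be run safely without root; the real run is just the bare rtcwake
loop):

```shell
# Back-to-back suspend/resume stress loop (sketch).
# Needs root to really suspend; with DRY_RUN=1 (the default) it only
# prints the commands it would run instead of executing rtcwake.
DRY_RUN=${DRY_RUN:-1}
CYCLES=${CYCLES:-10}

i=1
while [ "$i" -le "$CYCLES" ]; do
    # Arm the RTC to fire 5 seconds after entering S3, then suspend.
    # Deliberately no sleep between iterations: the next suspend
    # starts right after resume, which is the case that hangs.
    if [ "$DRY_RUN" = 1 ]; then
        echo "cycle $i: rtcwake -m mem -s 5"
    else
        rtcwake -m mem -s 5
    fi
    i=$((i + 1))
done
```

The key point is the absence of any settle time after resume; adding
even a short sleep between cycles makes the hang go away here.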