On 8/6/23 11:26, Linus Torvalds wrote: > On Sat, 5 Aug 2023 at 18:29, Damien Le Moal <dlemoal@xxxxxxxxxx> wrote: >> >> - Prevent the scsi disk driver from issuing a START STOP UNIT command >> for ATA devices during system resume as this causes various issues >> reported by multiple users. > > Honestly, this seems pretty broken. > > I mean, there are literally just three things that manage_start_stop > does, and you just disabled one of those cases. > > And it's quite illogical, since one of the two remaining cases is for > the suspend side (the final one is shutdown). > > The deeper issue seems to be that ATA is just doing something that > other SCSI devices aren't doing (the whole set > 'sdev->manage_start_stop' thing), and while there is one other user in > the sbp2 firewire code, clearly the SCSI people don't really care for > this bit, and the SCSI changes then cause problems. > > I agree that the new "don't ask sd.c to send a start command" is the > right thing to do, but I feel it was done in a particularly ugly and > illogical manner. > > I think it would have been better if maybe the approach would have been to > > (a) split manage_start_stop into three bits for the three actual > cases it deals with: (stop_on_suspend, stop_on_shutdown, > start_on_resume). > > (b) then not set the "start_on_resume" bit > > instead of the current crazy model of first saying "please manage > start/stop for me" and then following up with "oh, but don't do it for > this case". > > See what I'm saying? Don't mix "please do X, but don't do subset Y" > bits. It's a completely messy thing and makes it really hard to figure > out what you actually want for no good reason. > > I've pulled this, but it really smells like a maintenance disaster to > me. Particularly since the SCSI people really don't care about ATA > anyway. I fully agree with all you said. ATA suspend/resume was badly impacted by the switch to asynchronous PM operations and the scsi layer using the regular PM ops instead of its own. The problems come from the fact that there is no direct link between the scsi device and ata device (and ata port/host device). As a result, the asynchronous PM ops ordering (parent then child for suspend and reverse for resume) does not happen and mess things up. This rather ugly "fix" is the smallest I could think of to quickly address the multiple regressions that were signaled by users. This definitely needs a proper cleanup on top. Your suggestion above makes sense and is definitely needed as the scsi START STOP UNIT command on resume is really useless. There are also some strange things happening with stopping on shutdown that I have noticed as well (e.g. cache flush failing), which I now suspect happen for the same reason as this resume issue: PM operation ordering. So I believe that a proper fix must also include creating correct device links between ATA and scsi devices so that PM ordering works again. This is at the top of my to-do list. Working on it. > > Linus -- Damien Le Moal Western Digital Research