Re: [GIT PULL] ata fixes for 6.5-rc5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/6/23 11:26, Linus Torvalds wrote:
> On Sat, 5 Aug 2023 at 18:29, Damien Le Moal <dlemoal@xxxxxxxxxx> wrote:
>>
>>  - Prevent the scsi disk driver from issuing a START STOP UNIT command
>>    for ATA devices during system resume as this causes various issues
>>    reported by multiple users.
> 
> Honestly, this seems pretty broken.
> 
> I mean, there are literally just three things that manage_start_stop
> does, and you just disabled one of those cases.
> 
> And it's quite illogical, since one of the two remaining cases is for
> the suspend side (the final one is shutdown).
> 
> The deeper issue seems to be that ATA is just doing something that
> other SCSI devices aren't doing (the whole set
> 'sdev->manage_start_stop' thing), and while there is one other user in
> the sbp2 firewire code, clearly the SCSI people don't really care for
> this bit, and the SCSI changes then cause problems.
> 
> I agree that the new "don't ask sd.c to send a start command" is the
> right thing to do, but I feel it was done in a particularly ugly and
> illogical manner.
> 
> I think it would have been better if maybe the approach would have been to
> 
>  (a) split manage_start_stop into three bits for the three actual
> cases it deals with: (stop_on_suspend, stop_on_shutdown,
> start_on_resume).
> 
>  (b) then not set the "start_on_resume" bit
> 
> instead of the current crazy model of first saying "please manage
> start/stop for me" and then following up with "oh, but don't do it for
> this case".
> 
> See what I'm saying? Don't mix "please do X, but don't do subset Y"
> bits. It's a completely messy thing and makes it really hard to figure
> out what you actually want for no good reason.
> 
> I've pulled this, but it really smells like a maintenance disaster to
> me. Particularly since the SCSI people really don't care about ATA
> anyway.

I fully agree with all you said. ATA suspend/resume was badly impacted by the
switch to asynchronous PM operations and the scsi layer using the regular PM ops
instead of its own. The problems come from the fact that there is no direct link
between the scsi device and ata device (and ata port/host device). As a result,
the asynchronous PM ops ordering (parent then child for suspend and reverse for
resume) does not happen and mess things up. This rather ugly "fix" is the
smallest I could think of to quickly address the multiple regressions that were
signaled by users.

This definitely needs a proper cleanup on top. Your suggestion above makes sense
and is definitely needed as the scsi START STOP UNIT command on resume is really
useless. There are also some strange things happening with stopping on shutdown
that I have noticed as well (e.g. cache flush failing), which I now suspect
happen for the same reason as this resume issue: PM operation ordering. So I
believe that a proper fix must also include creating correct device links
between ATA and scsi devices so that PM ordering works again.

This is at the top of my to-do list. Working on it.

> 
>                Linus

-- 
Damien Le Moal
Western Digital Research




[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux