On 29.07.20 16:53, James Bottomley wrote:
> On Wed, 2020-07-29 at 07:46 -0700, James Bottomley wrote:
>> On Wed, 2020-07-29 at 10:32 -0400, Alan Stern wrote:
>>> On Wed, Jul 29, 2020 at 04:12:22PM +0200, Martin Kepplinger wrote:
>>>> On 28.07.20 22:02, Alan Stern wrote:
>>>>> On Tue, Jul 28, 2020 at 09:02:44AM +0200, Martin Kepplinger
>>>>> wrote:
>>>>>> Hi Alan,
>>>>>>
>>>>>> Any API cleanup is of course welcome. I just wanted to remind
>>>>>> you that the underlying problem: broken block device runtime
>>>>>> pm. Your initial proposed fix "almost" did it and mounting
>>>>>> works but during file access, it still just looks like a
>>>>>> runtime_resume is missing somewhere.
>>>>>
>>>>> Well, I have tested that proposed fix several times, and on my
>>>>> system it's working perfectly. When I stop accessing a drive
>>>>> it autosuspends, and when I access it again it gets resumed and
>>>>> works -- as you would expect.
>>>>
>>>> that's weird. when I mount, everything looks good, "sda1". But as
>>>> soon as I cd to the mountpoint and do "ls" (on another SD card
>>>> "ls" works but actual file reading leads to the exact same
>>>> errors), I get:
>>>>
>>>> [ 77.474632] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result:
>>>> hostbyte=0x00 driverbyte=0x08 cmd_age=0s
>>>> [ 77.474647] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
>>>> [ 77.474655] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
>>>> [ 77.474667] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00
>>>> 00 60 40 00 00 01 00
>>>
>>> This error report comes from the SCSI layer, not the block layer.
>>
>> That sense code means "NOT READY TO READY CHANGE, MEDIUM MAY HAVE
>> CHANGED" so it sounds like it something we should be
>> ignoring. Usually this signals a problem, like you changed the
>> medium manually (ejected the CD). But in this case you can tell us
>> to expect this by setting
>>
>> sdev->expecting_cc_ua
>>
>> And we'll retry. I think you need to set this on all resumed
>> devices.
>
> Actually, it's not quite that easy, we filter out this ASC/ASCQ
> combination from the check because we should never ignore medium might
> have changed events on running devices. We could ignore it if we had a
> flag to say the power has been yanked (perhaps an additional sdev flag
> you set on resume) but we would still miss the case where you really
> had powered off the drive and then changed the media ... if you can
> regard this as the user's problem, then we might have a solution.
>
> James
>

oh I see what you mean now, thanks for the elaboration.

if I do the following change, things all look normal and runtime pm
works. I'm not 100% sure if just setting expecting_cc_ua in resume() is
"correct" but that looks like it is what you're talking about:

(note that this is of course with the one block layer diff applied that
Alan posted a few emails back)


--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -554,16 +554,8 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 		 * so that we can deal with it there.
 		 */
 		if (scmd->device->expecting_cc_ua) {
-			/*
-			 * Because some device does not queue unit
-			 * attentions correctly, we carefully check
-			 * additional sense code and qualifier so as
-			 * not to squash media change unit attention.
-			 */
-			if (sshdr.asc != 0x28 || sshdr.ascq != 0x00) {
-				scmd->device->expecting_cc_ua = 0;
-				return NEEDS_RETRY;
-			}
+			scmd->device->expecting_cc_ua = 0;
+			return NEEDS_RETRY;
 		}
 		/*
 		 * we might also expect a cc/ua if another LUN on the target
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..5ad847fed8b9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3642,6 +3642,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
 
+	sdkp->device->expecting_cc_ua = 1;
+
 	if (!sdkp->device->manage_start_stop)
 		return 0;
 
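
To make the behavioural difference of the scsi_error.c hunk easier to see, here is a small user-space model of just that one check. It is only a sketch and not kernel code: `model_dev`, `model_unit_attention()`, `filter_media_change` and the `DISP_*` values are made-up names for illustration, and everything scsi_check_sense() does after the expecting_cc_ua test is collapsed into a single "pass up" outcome.

```c
#include <stdbool.h>

/* Simplified user-space model; these names are illustrative, not kernel API. */
enum disposition { DISP_RETRY, DISP_PASS_UP };

struct model_dev {
	bool expecting_cc_ua;	/* stands in for the sdev flag from the thread */
};

/* ASC/ASCQ 0x28/0x00: "not ready to ready change, medium may have changed" */
static bool is_medium_may_have_changed(unsigned char asc, unsigned char ascq)
{
	return asc == 0x28 && ascq == 0x00;
}

/*
 * Model of the unit-attention branch.  With filter_media_change set this
 * follows the existing check (0x28/0x00 is never squashed); with it
 * cleared it follows the change proposed in the diff.
 */
static enum disposition model_unit_attention(struct model_dev *dev,
					     unsigned char asc,
					     unsigned char ascq,
					     bool filter_media_change)
{
	if (dev->expecting_cc_ua) {
		if (!filter_media_change ||
		    !is_medium_may_have_changed(asc, ascq)) {
			/* One-shot: the flag is consumed by the first retry. */
			dev->expecting_cc_ua = false;
			return DISP_RETRY;
		}
	}
	/* All later handling is collapsed into one outcome in this model. */
	return DISP_PASS_UP;
}
```

With the filter in place, a 0x28/0x00 unit attention is passed up even though the flag is set, which matches the errors in the log above. Without the filter, the first such unit attention after resume is retried and the flag is cleared, so a second one would be passed up again.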