On Wed, May 03, 2023 at 12:55:03PM +0200, Oliver Neukum wrote:
> On 03.05.23 12:24, Benjamin Block wrote:
> > On Wed, Apr 26, 2023 at 03:20:07PM -0400, Alan Stern wrote:
> >
> > From a cursory look at the logs above, SCSI ML does just try that:
> >
> > >>> [ 218.089304] sd 0:0:0:0: [sda] tag#0 abort scheduled
> > >>> [ 218.109297] sd 0:0:0:0: [sda] tag#0 aborting command
> >
> > calls `hostt->eh_abort_handler()` on the late request, and retries it
> > after success.
> >
> > >>> [ 218.359964] sd 0:0:0:0: [sda] tag#0 retry aborted command
> > >>> [ 225.129297] sd 0:0:0:0: [sda] tag#0 previous abort failed
> >
> > but it times out again, then we go straight into EH:
>
> And that is problematic to usb-storage
>
> >
> > >>> [ 225.129337] scsi host0: Waking error handler thread
> > >>> [ 225.129358] scsi host0: scsi_eh_0: waking up 0/1/1
> > >>> [ 225.129375] scsi host0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
> > >>> [ 225.129387] scsi host0: Total of 1 commands on 1 devices require eh work
> > >>> [ 225.129402] sd 0:0:0:0: scsi_eh_0: Sending BDR
> >
> > IIRC in the past we used to call Abort a second time from within the EH
> > thread before trying the different resets, but that was removed at some
> > point a couple of years ago.

Seems like I misremembered. I can't find the commit I was thinking of; the
only thing that changed is that aborts were moved out of the EH thread and
made asynchronous.

> > Now we add the command straight to the EH
> > list, and start with the TMF LUN reset, which ought to implicitly abort
> > the command as well on the target.
>
> usb-storage can do a reset only on the USB device level,
> which translates to a bus reset on the SCSI level.
>
> And we are supposed to cancel any communication with the device
> before that.

Is that a limitation of the devices or of the driver? Because then you
don't match the SCSI semantics for a LU reset, which, among other things,
aborts all running commands in that scope.
That might explain the reasoning behind the behavior you find unexpected.

One random thought I had: in theory, you could implement your own EH
strategy handler if the default one doesn't work for you; ATA and SAS do
so [drivers/scsi/scsi_error.c:2285, `shost->transportt->eh_strategy_handler()`].
It can reuse parts or all of the existing escalation sequence in
`scsi_eh_ready_devs()`. But that's no short-term fix.

-- 
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Chairman of the Supervisory Board: Gregor Pillen / Management: David Faller
Registered office: Böblingen / Register court: AmtsG Stuttgart, HRB 243294