On 03/31/2014 05:03 PM, James Bottomley wrote: > [lets split the thread] > On Mon, 2014-03-31 at 16:37 +0200, Hannes Reinecke wrote: >> On 03/31/2014 03:33 PM, Alan Stern wrote: >>> On Mon, 31 Mar 2014, Hannes Reinecke wrote: >>>> On 03/28/2014 08:29 PM, Alan Stern wrote: >>>>> On Fri, 28 Mar 2014, James Bottomley wrote: >>>>> Maybe scmd_eh_abort_handler() should check the flag before doing >>>>> anything. Is there any sort of sychronization to prevent the same >>>>> incarnation of a command from being aborted twice (or by two different >>>>> threads at the same time)? If there is, it isn't obvious. >>>>> >>>> See above. scsi_times_out() will only ever called once. >>>> What can happen, though, is that _theoretically_ the LLDD might >>>> decide to call ->done() on a timed out command when >>>> scsi_eh_abort_handler() is still pending. >>> >>> That's okay. We can expect the LLDD to have sufficient locking to >>> handle that sort of thing without confusion (usb-storage does, for >>> example). >>> >>>>> (Also, what's going on at the start of scsi_abort_command()? Contrary >>>>> to what one might expect, the first part of the function _cancels_ a >>>>> scheduled abort. And it does so without clearing the >>>>> SCSI_EH_ABORT_SCHEDULED flag.) >>>>> >>>> The original idea was this: >>>> >>>> SCSI_EH_ABORT_SCHEDULED is sticky _per command_. >>>> Point is, any command abort is only ever send for a timed-out >>>> command. And the main problem for a timed-out command is that we >>>> simply _do not_ know what happened for that command. So _if_ a >>>> command timed out, _and_ we've send an abort, _and_ the command >>>> times out _again_ we'll be running into an endless loop between >>>> timeout and aborting, and never returning the command at all. >>>> >>>> So to prevent this we should set a marker on that command telling it >>>> to _not_ try to abort the command again. >>> >>> I disagree. We _have_ to abort the command again -- how else can we >>> stop a running command? To prevent the loop you described, we should >>> avoid _retrying_ the command after it is aborted the second time. >>> >> The actual question is whether it's worth aborting the same command >> a second time. >> In principle any reset (like LUN reset etc) should clear the >> command, too. >> And the EH abort functionality is geared around this. >> If, for some reason, the transport layer / device driver >> requires a command abort to be send then sure, we need >> to accommodate for that. > > We already discussed this (that was my first response too). USB needs > all outstanding commands aborted before proceeding to reset. I'm > starting to think the actual way to fix this is to reset the abort > scheduled only if we send something else, so this might be a better fix. > > It doesn't matter if we finish a command with abort scheduled because > the command then gets freed and the flags cleaned. > > We can take our time with this because the other two patches, which I > can send separately fix the current deadlock (we no longer send an > unaborted request sense before the reset), and it can go via rc fixes. > Yes, agreed. The USB case seems to be a bit more tricky, and at least I need some more time to fully understand the details and implications of this. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxx +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html