On 04/10/2014 10:36 PM, James Bottomley wrote:
> On Thu, 2014-04-10 at 19:52 +0200, Hannes Reinecke wrote:
>> On 04/10/2014 05:31 PM, Alan Stern wrote:
>>> On Thu, 10 Apr 2014, Hannes Reinecke wrote:
>>>
>>>> On 04/10/2014 12:58 PM, Andreas Reis wrote:
>>>>> That patch appears to work in preventing the crashes, judged on one
>>>>> repeated appearance of the bug.
>>>>>
>>>>> dmesg had the usual
>>>>> [ 215.229903] usb 4-2: usb_disable_lpm called, do nothing
>>>>> [ 215.336941] usb 4-2: reset SuperSpeed USB device number 3 using
>>>>> xhci_hcd
>>>>> [ 215.350296] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
>>>>> with disabled ep ffff880427b829c0
>>>>> [ 215.350305] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called
>>>>> with disabled ep ffff880427b82a08
>>>>> [ 215.350621] usb 4-2: usb_enable_lpm called, do nothing
>>>>>
>>>>> repeated five times, followed by one
>>>>> [ 282.795801] sd 8:0:0:0: Device offlined - not ready after error
>>>>> recovery
>>>>>
>>>>> and then as often as something tried to read from it:
>>>>> [ 295.585472] sd 8:0:0:0: rejecting I/O to offline device
>>>>>
>>>>> The stick could then be properly un- and remounted (the latter if it
>>>>> had been physically replugged) without issue - for the bug to
>>>>> reoccur after one to three minutes. I tried this three times, no
>>>>> dmesg difference except the ep addresses varied on two of that.
>>>>>
>>>> Was this just that patch you've tested with or the entire patch series?
>>>>
>>>> If the latter, Alan, is this the expected outcome?
>>>
>>> Yes, it is. The same thing should happen with the entire patch series.
>>>
>>>> I would've thought the error recover should _not_ run into
>>>> offlining devices here, but rather the device should be recovered
>>>> eventually.
>>>
>>> The command times out, it is aborted, and the command is retried. The
>>> same thing happens, and we repeat five times. Eventually the SCSI core
>>> gives up and declares the device to be offline.
>>>
>> Hmm. Ok.
>> If you are fine with it who am I to argue here.
>> James, shall I resend the patch series?
>
> You mean the one patch? No, it's OK, I have it.
>
> It's still not complete, though, as I've said a couple of times. The
> problem is that we have abort memory on any eh command as well, which
> this doesn't fix.
>
> The scenario is abort command, set flag, abort completes, send TUR, TUR
> doesn't return, so we now try to abort the TUR, but scsi_abort_eh_cmnd()
> will skip the abort because the flag is set and move straight to reset.
>
> The fix is this, I can just add it as well.
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 771c16b..7516e2c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -920,6 +920,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
>  	ses->prot_op = scmd->prot_op;
>
>  	scmd->prot_op = SCSI_PROT_NORMAL;
> +	scmd->eh_eflags = 0;
>  	scmd->cmnd = ses->eh_cmnd;
>  	memset(scmd->cmnd, 0, BLK_MAX_CDB);
>  	memset(&scmd->sdb, 0, sizeof(scmd->sdb));
>
Oh yes, that is correct.

Acked-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
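
For reference, the "abort memory" scenario James describes in the quoted
text (abort command, set flag, send TUR, TUR hangs, abort of the TUR is
skipped) can be modelled with a small user-space sketch. This is only a toy
model and not the kernel code: the names toy_cmnd, toy_abort,
toy_prep_eh_cmnd, toy_scenario and TOY_EH_ABORT_SCHEDULED are all
hypothetical, and the only thing taken from the thread is the idea that a
stale eh_eflags value makes the second abort escalate straight to reset
unless the flags are cleared when the command is prepared for reuse.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for "an abort was already scheduled". */
#define TOY_EH_ABORT_SCHEDULED 0x1

/* What the toy error handler decides to do next. */
enum toy_eh_action { TOY_EH_ABORT, TOY_EH_RESET };

/* Toy stand-in for a command; only the error-handler flags are modelled. */
struct toy_cmnd {
	int eh_eflags;
};

/*
 * Toy abort path: the first abort sets the flag; if the flag is already
 * set, the abort is skipped and the handler escalates to reset.
 */
static enum toy_eh_action toy_abort(struct toy_cmnd *cmd)
{
	if (cmd->eh_eflags & TOY_EH_ABORT_SCHEDULED)
		return TOY_EH_RESET;
	cmd->eh_eflags |= TOY_EH_ABORT_SCHEDULED;
	return TOY_EH_ABORT;
}

/*
 * Toy version of preparing the same command structure for an error-handler
 * probe such as a TUR. clear_flags models the one-line fix from the diff.
 */
static void toy_prep_eh_cmnd(struct toy_cmnd *cmd, bool clear_flags)
{
	if (clear_flags)
		cmd->eh_eflags = 0;
}

/*
 * Replay the scenario: abort the original command, reuse the structure for
 * a TUR that never returns, then try to abort the TUR. Returns the action
 * chosen for the TUR.
 */
static enum toy_eh_action toy_scenario(bool with_fix)
{
	struct toy_cmnd cmd = { .eh_eflags = 0 };
	enum toy_eh_action first = toy_abort(&cmd);

	assert(first == TOY_EH_ABORT);	/* original command is aborted */
	(void)first;
	toy_prep_eh_cmnd(&cmd, with_fix);	/* command reused for the TUR */
	return toy_abort(&cmd);		/* TUR hangs, try to abort it */
}
```

In this model toy_scenario(false) returns TOY_EH_RESET (the stale flag
skips the abort), while toy_scenario(true) returns TOY_EH_ABORT, which is
the behaviour the added "scmd->eh_eflags = 0;" line is meant to restore.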