On Tue, Dec 06, 2011 at 04:20:56PM -0500, Alan Stern wrote: > On Tue, 6 Dec 2011, Sarah Sharp wrote: > > > If we can use the SCSI lock to safely set the resetting flag, and stop > > uas_queuecommand_lck from submitting new commands until the bus reset > > completes, then I think I can see where you're going with the abort > > synchronization. > > > > As you said, we want to abort any commands in flight, which means > > stopping uas_do_work() from resubmitting partially queued URBs, and > > stopping the status URB completion from submitting more USB 2.0 URBs in > > uas_stat_cmplt. Those would be RCU readers, looking at a flag in > > uas_cmd_info, possibly a new "aborted" flag in the state variable. > > > > The RCU writers would be uas_eh_abort_handler and > > uas_pre_reset. I know that you said the pre-reset function should just > > call the abort handler and let it take care of the synchronization. > > However, the abort handler needs the scsi_cmnd pointer, and right now > > there's no list of outstanding commands. The UAS driver was really a > > fire-and-forget driver until I added the URB anchors. So to be able to > > call the abort handler, we'd need a list of outstanding commands, but > > walking that list would race with the status completion function > > that would take the finished commands off the list, and then we open a > > whole new can of worms. > > Here's one way to do it. Have two new flags: one for reset-in-progress > and one for abort-in-progress. If the reset-in-progress flag is set, > then uas_do_work and uas_stat_cmplt would know not to submit any URBs > at all. If that flag isn't set but the other one is, they would know > to check whether the particular command they are working on at the > moment is being aborted. > > Then all you need is some way to tell when all the outstanding commands > have completed. For example, an atomic counter of outstanding > commands. When the counter goes to 0, if the reset-in-progress flag is > set then you signal the pre-reset thread that it may proceed (use a > struct completion or something like that). Ok, I've tried to wrap my head around this strategy all day yesterday, and I can't see how it can be done without races. I feel like I'm going in mental circles, and missing information about how the SCSI layer works. It would be best if you wrote this patch yourself. There's nothing strictly speaking wrong with the original patchset, aside from the race condition between the abort handler and uas_stat_cmplt, is there? You've expressed that you think it's a bit heavy-handed, because it kills the commands on reset, but Oliver has also expressed the opinion that when another driver wants a reset on a composite device, it often means there's something wrong with the other interface. If we find out that it is an improvement to flush rather than kill commands on reset, that patch can always be added later. The patchset as it was submitted is an improvement over the current code, since it adds device error handling where there was none before. The simplest solution (and the one I understand) to make the original patchset race-free is to add an aborting flag to the uas_command_info structure, make the abort handler set it before it takes the work queue lock, and rearrange the function that is called by uas_stat_cmplt such that it takes the uas_work_lock before submitting URBs and checks the flag: static int uas_xfer_data(struct urb *urb, struct scsi_cmnd *cmnd, struct uas_dev_info *devinfo, unsigned direction) { struct uas_cmd_info *cmdinfo = (void *)&cmnd->SCp; int err, srcu_idx; srcu_idx = srcu_read_lock(devinfo->srcu); if (atomic_read(&devinfo->resetting)) { srcu_read_unlock(devinfo->srcu, srcu_idx); return -ENODEV; } cmdinfo->state = direction | SUBMIT_STATUS_URB; spin_lock(&uas_work_lock); /* XXX: check the aborting flag here. */ err = uas_submit_urbs(cmnd, cmnd->device->hostdata, GFP_ATOMIC); if (err) { list_add_tail(&cmdinfo->list, &uas_work_list); spin_unlock(&uas_work_lock); schedule_work(&uas_work); } else { spin_unlock(&uas_work_lock); } srcu_read_unlock(devinfo->srcu, srcu_idx); return 0; } That does mean that the USB 2.0 UAS fast path will have to take a lock on every submit, which is not ideal, but usb-storage also takes a mutex, so it's a wash. I'm not convinced the USB 2.0 UAS device will get much better performance than BoT anyway. If performance testing shows that the spinlock is the bottleneck, I can re-look at the solution. The good news is that the USB 3.0 UAS fast path does not include uas_xfer_data, so it will only see the RCU readlock. USB 3.0 UAS is where I expect the performance of the code to matter the most. So is there anything horrendously objectionable about closing the abort/resubmit race condition with a spinlock and using the RCU implementation from the original patch? Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html