On Thu, May 20, 2021 at 08:30:01PM +0000, Thinh Nguyen wrote: > Hm... looks like we have a couple of issues in the uas storage class > driver and the xhci driver. > > We may need to fix that in the uas storage driver because it doesn't > seem to handle it. (check uas_data_cmplt() in uas.c). Hmmm. I see that if there is an error, the driver assumes no data was transferred. It shouldn't make that assumption; it should always use urb->actual_length. It also doesn't indicate to the SCSI layer when a command gets an error and it doesn't try to do any recovery. Of course, the SCSI error handler may initiate some recovery actions. > As for the xhci driver, there maybe a case where the stream URB never > gets to complete because the transaction err_count is not properly > updated. The err_count for transaction error is stored in ep_ring, but > the xhci driver may not be able to lookup the correct ep_ring based on > TRB address for streams. There are cases for streams where the event > TRBs have their TRB pointer field cleared to '0' (xhci spec section > 4.12.2). If the xhci driver doesn't see ep_ring for transaction error, > it automatically does a soft-retry. This is seen from one of our > testings that the driver was repeatedly doing soft-retry until the class > driver timed out. > > Hi Mathias, maybe you have some comment on this? Thanks. > > > > > The fact is that only a small percentage of -EPROTO errors are > > recoverable. Some of them can be handled by a port reset, which can be > > pretty awkward to perform but does occasionally work. A lot of them > > occur because the USB cable has been unplugged; obviously there's no way > > to recover from that. With only a few exceptions, the best and simplest > > approach is not to try to recover at all. > > If the cable is unplugged, then we should get a connection change event > and the driver can handle it properly. Yes -- unless the driver is in such a tight retry loop that the rest of the system never gets a chance to process the connection change event. I've seen bug reports where that happened. > Yes, it's probably simplest to do a port reset and let the transfer be > incomplete/corrupted. However, I think we should give > ClearFeature(ep_halt) some more thoughts as I think it can be a recovery > mechanism for storage class driver, even though that it may not be > foolproof. When you say storage class driver, which one are you talking about, usb-storage or uas? usb-storage already has a pretty robust recovery mechanism. > > For the case in question (the syzbot bug report that started this > > thread), the class driver doesn't try to perform any recovery. It just > > resubmits the URB, getting into a tight retry loop which consumes too > > much CPU time. Simply giving up would be preferable. > > > > Alan Stern > > > > I see. By giving up, you mean doing port reset right? Otherwise it needs > some other mechanism to synchronize with the device side. No, I mean the driver should just stop communicating with the device. That's an appropriate action for lots of drivers. If the user wants to re-synchronize with the device, he can unplug the USB cable and plug it back in again. Alan Stern