On Wed, May 05, 2021 at 10:22:24PM +0000, Guido Kiener wrote:
> > -----Original Message-----
> > From: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
> > Sent: Tuesday, May 4, 2021 5:14 PM
> > To: Kiener Guido 14DS1
> > Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
> >
> > On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> > > Hi all,
> > >
> > > Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc driver.
> > >
> > > What happened?
> > > The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe receives an erroneous urb with status -EPROTO (-71).
> > > See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> > > -EPROTO does not abort/shut down the pipe, so the urb is resubmitted to receive the next packet. However, the callback handler usbtmc_interrupt is called again with the same erroneous status -EPROTO, and this seems to result in an endless loop.
> > > According to
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> > > the error -EPROTO indicates a hardware problem or a bad cable.
> > >
> > > Most USB drivers do not react in a specific way to these hardware problems and simply resubmit the urb. We assume these drivers will run into the same endless loop. Some other driver samples are:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
> > >
> > > Possible solutions:
> > > Hardware defects or bad cables seem to be a common problem for most USB drivers, and I assume we do not want to fix this problem in every class-specific driver, but rather in the lower-level host drivers, e.g.:
> > > 1. Use a counter and close the pipe after a number of detected errors.
> > > 2. Delay the resubmission of the urb to avoid high CPU usage.
> > > 3. Do nothing, since it is just a rare problem.
> > >
> > > We've never seen this problem in our products and we do not dare to change anything.
> >
> > Drivers are not consistent in the way they handle these errors, as you have seen. A few try to take active measures, such as retries with increasing timeouts. Many drivers just ignore them, which is not a very good idea.
> >
> > The general feeling among kernel USB developers is that a -EPROTO, -EILSEQ, or -ETIME error should be regarded as fatal, much the same as an unplug event. The driver should avoid resubmitting URBs and just wait to be unbound from the device.
>
> Thanks for your assessment. I agree with the general feeling. I counted about a hundred specific USB drivers, so wouldn't it be better to fix the problem in some of the host drivers (e.g. urb.c)?
> We could return an error when calling usb_submit_urb() on an erroneous pipe.
> I cannot estimate the side effects, and we would need to check again how all drivers deal with the error situation. Maybe there are some special drivers that need specialized error handling.
> In that case, these drivers could reset the (new?) error flag to allow calling usb_submit_urb() again without error. This could work, couldn't it?

That is feasible, although it would be an awkward approach. As you said, the side effects aren't clear. But it might work.

> > If you would like to audit drivers and fix them up to behave this way, that would be great.
>
> Currently not. I cannot pull the USB cable in my home office :-), but I will keep an eye on it.
> When I'm involved in the next USB driver issue, I will test bad cables and maybe get more ideas on how we could test and fix this rare error.

Will you be able to test patches?

Alan Stern
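The driver-side policy described in the thread (treat -EPROTO, -EILSEQ and -ETIME like an unplug event and stop resubmitting) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it is plain C, not kernel code, and the names urb_status_is_fatal() and run_completions() are hypothetical. In a real driver the same check would sit in the URB completion callback (for example usbtmc_interrupt()) before the call to usb_submit_urb(). The -ENOENT, -ECONNRESET and -ESHUTDOWN cases are included because the kernel's USB error-codes documentation lists them as the statuses reported when an URB is killed, unlinked, or the device/host controller has been disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel API): decide whether a URB completion
 * status should end resubmission. Following the policy in this thread,
 * -EPROTO, -EILSEQ and -ETIME are treated like an unplug event.
 */
bool urb_status_is_fatal(int status)
{
	switch (status) {
	case -EPROTO:     /* bitstuff error or no response: hardware/cable problem */
	case -EILSEQ:     /* CRC mismatch or no response */
	case -ETIME:      /* no response packet within the bus turn-around time */
	case -ENOENT:     /* URB was killed, e.g. by usb_kill_urb() */
	case -ECONNRESET: /* URB was unlinked asynchronously by usb_unlink_urb() */
	case -ESHUTDOWN:  /* device or host controller has been disabled */
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical userspace model of a completion handler. `statuses` stands
 * in for the sequence of statuses the host controller would report; the
 * return value is how many times the handler resubmitted the URB.
 */
int run_completions(const int *statuses, int n)
{
	int resubmits = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (urb_status_is_fatal(statuses[i]))
			break; /* stop and wait to be unbound from the device */
		/* A real driver would call usb_submit_urb(urb, GFP_ATOMIC) here. */
		resubmits++;
	}
	return resubmits;
}
```

With a simulated bad cable that reports -EPROTO on every completion, run_completions() stops at the first completion instead of resubmitting indefinitely, while a stream of successful completions keeps being resubmitted until the URB is killed (-ENOENT).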