> On Wed, Jul 21, 2021 at 05:08:48PM +0000, Guido Kiener wrote: > > > > > Subject: *EXT* Re: [PATCH] USB: usbtmc: Fix RCU stall warning > > > > > > > > > > On Wed, Jul 21, 2021 at 11:44:23AM +0200, dave penkler wrote: > > > > > > Sorry, the issue this patch is trying to fix occurs because > > > > > > the current usbtmc driver resubmits the URB when it gets an EPROTO > return. > > > > > > The dummy usb host controller driver used in the syzbot tests > > > > > > keeps returning the resubmitted URB with EPROTO causing a loop > > > > > > that starves RCU. With an actual HCI driver it either recovers > > > > > > or returns an > > > EPIPE. > > > > > > > > > > Are you sure about that? Have you ever observed a usbtmc device > > > > > recovering and continuing to operate after an EPROTO error? > > > > > > > > I can't speak for Dave and his investigations. However as you > > > > remember I did tests with EPROTO errors, see thread: > > > > https://marc.info/?l=linux-usb&m=162163776930423&w=2 > > > > In the thread you can see the recovering. > > > > > > Ah yes, now I remember. > > > > > > That message doesn't show the _device_ recovering and continuing to > > > operate, though. It shows the _system_ recovering and realizing > > > that the device has been disconnected. > > > > > > What I was asking about was whether you knew of a case where there > > > was an EPROTO error but afterward the usbtmc device continued to > > > work -- no disconnection. Assuming such cases are vanishingly rare, > > > there's no harm in having the driver give up whenever it encounters EPROTO. > > > > I have no idea how often the EPROTO error can happen during normal operation > and believe you that it's vanishingly rare. > > When it happens, does the USB hardware protocol automatically retransmit the > lost/invalid packet? > > When a low-level protocol error occurs, the USB host controller hardware does > automatically retransmit the packet. USB has a "3 strikes and you're out" approach: > The error doesn't get reported until there have been three failed transmission > attempts. > > On the other hand, dummy-hcd never has these errors (for obvious reasons) unless > the function driver has been unbound, which always results in a disconnect. Or if > the host-side driver does something wrong, like trying to communicate with a > nonexistent endpoint. > > > If yes, we should think about an error counter. > > What for? > > Besides, the ehci-hcd driver already has a higher-level retry loop for low-level > protocol errors. It makes at least 32 attempts before giving up on a transaction. > > > If not, then we really can stop the INT pipe and the user will detect that something > is wrong when reading the status. > > Or in the most likely case, the system will realize (after a few hundred > milliseconds) that the device has been disconnected and will clean up. The only > question is whether the usbtmc driver should make multiple futile attempts to restart > the transmission during those milliseconds. I think it shouldn't. Thank you for the clarification. I did not know these low level hardware/driver facts. I fully agree that we do not need an extra error counter here. There is enough error handling. So we can say that the EPROTO error is a fatal error and there is no reasonable cause to resubmit the urb or to try other workarounds. Thanks. Guido