On Sat, May 30, 2020 at 2:43 AM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2020/05/30 5:41, Andrey Konovalov wrote:
> > On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
> >>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
> >>>>
> >>>>> Ah, so the problem is that when a process exits, it tries to close wdm
> >>>>> fd first, which ends up calling wdm_flush(), which can't finish
> >>>>> because the USB requests are not terminated before raw-gadget fd is
> >>>>> closed, which is supposed to happen after wdm fd is closed. Is this
> >>>>> correct? I wonder what will happen if a real device stays connected
> >>>>> and ignores wdm requests.
> >>>>>
> >>>>> I don't understand though, how using wait_event_interruptible() will
> >>>>> shadow anything here.
> >>>>>
> >>>>> Alan, Greg, is this acceptable behavior for a USB driver?
> >>>>
> >>>> I don't understand what the problem is. Can you explain in more general
> >>>> terms -- nothing specific to wdm or anything like that -- what you are
> >>>> concerned about? Is this something that could happen to any gadget
> >>>> driver? Or any USB class device driver? Or does it only affect
> >>>> usespace components of raw-gadget drivers?
> >>>
> >>> So, AFAIU, we have a driver whose flush() callback blocks on
> >>> wait_event(), which can only terminate when either 1) the driver
> >>> receives a particular USB response from the device or 2) the device
> >>> disconnects.
> >>
> >> This sounds like a bug in the driver. What would it do if someone had a
> >> genuine (not emulated) but buggy USB device which didn't send the
> >> desired response? The only way to unblock the driver would be to unplug
> >> the device! That isn't acceptable behavior.
> >
> > OK, that's what I thought.
>
> I believe that this is not a bug in the driver but a problem of hardware
> failure. Unless this is high-availability code which is designed for safely
> failing over to other node, we don't need to care about hardware failure.
>
> >
> >>
> >>> For 1) the emulated device doesn't provide required
> >>> responses. For 2) the problem is that the emulated via raw-gadget
> >>> device disconnects when the process is killed (and raw-gadget fd is
> >>> closed). But that process is the same process that is currently stuck
> >>> on wait_event() in the flush callback(), and therefore unkillable.
> >>
> >> What would happen if you unload dummy-hcd at this point? Or even just
> >> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is
> >> the bus number of the dummy-hcd bus?
> >
> > The device disconnects and flush() unblocks.
> >
> >>> This can generally happen with any driver that goes into
> >>> uninterruptible sleep within one of its code paths reachable from
> >>> userspace that can only be unblocked by a particular behavior from the
> >>> USB device. But I haven't seen any such drivers so far, wdm is the
> >>> first.
> >>
> >> Drivers should never go into uninterruptible sleep states unless they
> >> can guarantee that the duration will be bounded somehow (for example, by
> >> a reasonable timeout). Or that cutting the sleep state short would
> >> cause the system to crash -- but that's not an issue here.
> >
> > OK, thank you, Alan!
> >
> > Tetsuo, could you clarify why you think that using
> > wait_event_interruptible() is a bad fix here?
> >
>
> If wait_event() in wdm_flush() were using timeout or interruptible, can the
> wdm driver handle that case safely? Since cleanup(desc) from wdm_release()
> might release "desc", wouldn't "not-waiting-for-completion" has a risk of
> "use-after-free write"? Please add comment block why it is safe if it is
> actually safe.

Oh, it might be that just replacing wait_event() with wait_event_interruptible() can lead to other issues, and a more involved fix is needed. The suggestion was rather to avoid blocking flush() indefinitely.