On 2020/05/30 5:41, Andrey Konovalov wrote:
> On Thu, May 28, 2020 at 10:58 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Thu, May 28, 2020 at 09:51:35PM +0200, Andrey Konovalov wrote:
>>> On Thu, May 28, 2020 at 9:40 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Thu, May 28, 2020 at 09:03:43PM +0200, Andrey Konovalov wrote:
>>>>
>>>>> Ah, so the problem is that when a process exits, it tries to close wdm
>>>>> fd first, which ends up calling wdm_flush(), which can't finish
>>>>> because the USB requests are not terminated before raw-gadget fd is
>>>>> closed, which is supposed to happen after wdm fd is closed. Is this
>>>>> correct? I wonder what will happen if a real device stays connected
>>>>> and ignores wdm requests.
>>>>>
>>>>> I don't understand though, how using wait_event_interruptible() will
>>>>> shadow anything here.
>>>>>
>>>>> Alan, Greg, is this acceptable behavior for a USB driver?
>>>>
>>>> I don't understand what the problem is. Can you explain in more general
>>>> terms -- nothing specific to wdm or anything like that -- what you are
>>>> concerned about? Is this something that could happen to any gadget
>>>> driver? Or any USB class device driver? Or does it only affect
>>>> usespace components of raw-gadget drivers?
>>>
>>> So, AFAIU, we have a driver whose flush() callback blocks on
>>> wait_event(), which can only terminate when either 1) the driver
>>> receives a particular USB response from the device or 2) the device
>>> disconnects.
>>
>> This sounds like a bug in the driver. What would it do if someone had a
>> genuine (not emulated) but buggy USB device which didn't send the
>> desired response? The only way to unblock the driver would be to unplug
>> the device! That isn't acceptable behavior.
>
> OK, that's what I thought.

I believe that this is not a bug in the driver but a problem of hardware failure.
Unless this is high-availability code which is designed to fail over safely
to another node, we don't need to care about hardware failure.

>
>>
>>> For 1) the emulated device doesn't provide required
>>> responses. For 2) the problem is that the emulated via raw-gadget
>>> device disconnects when the process is killed (and raw-gadget fd is
>>> closed). But that process is the same process that is currently stuck
>>> on wait_event() in the flush callback(), and therefore unkillable.
>>
>> What would happen if you unload dummy-hcd at this point? Or even just
>> do: echo 0 >/sys/bus/usb/devices/usbN/bConfigurationValue, where N is
>> the bus number of the dummy-hcd bus?
>
> The device disconnects and flush() unblocks.
>
>>> This can generally happen with any driver that goes into
>>> uninterruptible sleep within one of its code paths reachable from
>>> userspace that can only be unblocked by a particular behavior from the
>>> USB device. But I haven't seen any such drivers so far, wdm is the
>>> first.
>>
>> Drivers should never go into uninterruptible sleep states unless they
>> can guarantee that the duration will be bounded somehow (for example, by
>> a reasonable timeout). Or that cutting the sleep state short would
>> cause the system to crash -- but that's not an issue here.
>
> OK, thank you, Alan!
>
> Tetsuo, could you clarify why you think that using
> wait_event_interruptible() is a bad fix here?
>

If wait_event() in wdm_flush() used a timeout or were interruptible, could
the wdm driver handle that case safely? Since cleanup(desc) from
wdm_release() might release "desc", wouldn't not waiting for completion
carry a risk of a use-after-free write? Please add a comment block
explaining why it is safe, if it actually is safe.
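
To make the lifetime question concrete, here is a minimal userspace sketch
using pthreads. It is not wdm code and not kernel API: "struct desc" and the
helpers desc_new(), desc_put(), desc_flush() and desc_complete() are
hypothetical names invented for illustration. It only shows one possible way
a bounded wait could be made safe, assuming the pending request holds its own
reference so that a late completion never writes to freed memory. Whether the
wdm driver can be structured like this is exactly the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for per-device state; not the real wdm structure. */
struct desc {
	pthread_mutex_t lock;
	pthread_cond_t done;
	bool completed;	/* set by the completion handler */
	int refs;	/* one for the open file, one for the pending request */
};

static struct desc *desc_new(void)
{
	struct desc *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->done, NULL);
	d->refs = 2;
	return d;
}

/* Drop one reference. Returns true if this was the last one (d is freed). */
static bool desc_put(struct desc *d)
{
	bool last;

	pthread_mutex_lock(&d->lock);
	assert(d->refs > 0);
	last = (--d->refs == 0);
	pthread_mutex_unlock(&d->lock);
	if (last) {
		pthread_cond_destroy(&d->done);
		pthread_mutex_destroy(&d->lock);
		free(d);
	}
	return last;
}

/* Bounded wait: returns 0 if the request completed, ETIMEDOUT otherwise. */
static int desc_flush(struct desc *d, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&d->lock);
	while (!d->completed && rc == 0)
		rc = pthread_cond_timedwait(&d->done, &d->lock, &deadline);
	if (d->completed)
		rc = 0;
	pthread_mutex_unlock(&d->lock);
	return rc;
}

/*
 * Completion handler: writes to d, then drops the request's reference.
 * The write is safe even after the file side has gone away, because the
 * request's own reference keeps d alive until this function is done.
 */
static bool desc_complete(struct desc *d)
{
	pthread_mutex_lock(&d->lock);
	d->completed = true;
	pthread_cond_broadcast(&d->done);
	pthread_mutex_unlock(&d->lock);
	return desc_put(d);
}
```

In this sketch, a flush that times out followed by a release only drops the
file's reference; the memory is freed when the late completion finally drops
the last one. Without the second reference, the same sequence would be the
use-after-free write described above.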