On Sat, Nov 24, 2018 at 02:17:40AM +0200, Aaro Koskinen wrote: > Hi, > > On Fri, Nov 23, 2018 at 01:45:46PM +0200, Peter Ujfalusi wrote: > > On 23/11/2018 0.01, Aaro Koskinen wrote: > > > With that reverted, the DMA works OK (and I can also now confirm that > > > OMAP_DMA_LCH_2D works). I haven't yet checked if we actually need that > > > quirk in OMAP UDC, > > > > The omap_udc driver is a bit of a mess, need to check it myself, but for > > now we can just set the quirk_ep_out_aligned_size and investigate later. > > OK, with quirk_ep_out_aligned_size we get 770/16xx DMA working again, > but on 15xx the omap_udc DMA still doesn't work (tested today for the > first time ever, I have no idea if it has ever worked and if so, when?). Hmm, there's more questionable stuff in this driver, and the gadget layer. Fundamental fact of struct device - it's a ref-counted structure and will only be freed when the last reference is dropped. dev_unregister() merely drops the refcount, it doesn't guarantee that it's dropped to zero (iow, there can be other users). Only when the refcount drops to zero is the dev.release function called. However: usb_add_gadget_udc_release(..., release) { if (release) gadget->dev.release = release; else gadget->dev.release = usb_udc_nop_release; device_initialize(&gadget->dev); ret = device_add(&gadget->dev); } At this point, that struct device is registered, so its refcount can be increased by other users. void usb_del_gadget_udc(struct usb_gadget *gadget) { ... device_unregister(&gadget->dev); memset(&gadget->dev, 0x00, sizeof(gadget->dev)); } That memset() is down-right wrong - the refcount on this struct device may _not_ be zero at this point, the struct device could well be in use by another thread. That memset will trample over the contents of the structure potentially while someone else is using it, and _potentially_ before the gadget->dev.release function has been called. However, that _may_ be a good thing when you read the omap_udc code: status = usb_add_gadget_udc_release(&pdev->dev, &udc->gadget, omap_udc_release); During the omap_udc_remove() function: { ... usb_del_gadget_udc(&udc->gadget); if (udc->driver) return -EBUSY; udc->done = &done; ... more dereferences of udc, which is a _global_ variable ... wait_for_completion(&done); } Now, omap_udc_release() does this: complete(udc->done); kfree(udc); udc = NULL; So, when usb_del_gadget_udc() is called, if device_unregister() within there drops the last reference count, omap_udc_release() will be called immediately. Since udc->done hasn't been setup at that point, that complete() will fail with a NULL pointer dereference. If that doesn't happen, then the kfree() and following set of the global 'udc' variable to NULL means that all future references to 'udc' after the call to usb_del_gadget_udc() in omap_udc_remove() will be dereferencing a NULL pointer. So one way or the other, this leads to a kernel OOPS. If, on the other hand, omap_udc_release() was not called in device_unregister(), the function pointer will be zeroed by the memset(), which will lead to 'udc' never being freed - in other words, we leak memory. What's more is that 'done' is never "completed" so we end up hanging at the wait_for_completion(). Then there's the pointless: if (udc->driver) return -EBUSY; in omap_udc_remove(). The effect of returning an error is... what exactly? It doesn't prevent the device being removed at all, it doesn't delay it, in fact the whole "remove returns an int" is nothing but confusion - the return value from all driver remove methods is completely ignored. If udc->driver is still set at this point, it basically means that we skip the rest of the tear down, but the platform device will still be unbound from the driver, leaving (eg) the transceiver phy still claimed, the procfs file still registered, the interrupts still claimed, the memory region still registered, etc. If omap_udc is built as a module, the module could even be removed while all that is still registered. So, whatever way I look at this, the code in the removal path both in omap_udc and the gadget removal code higher up looks very wrong and broken to me. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up