On 2019-11-11 11:09 p.m., Vinod Koul wrote: > On 11-11-19, 10:50, Logan Gunthorpe wrote: >> >> >> On 2019-11-09 10:35 a.m., Vinod Koul wrote: >>> On 22-10-19, 15:46, Logan Gunthorpe wrote: >>>> +static irqreturn_t plx_dma_isr(int irq, void *devid) >>>> +{ >>>> + return IRQ_HANDLED; >>> >>> ?? >> >> Yes, sorry this is more of an artifact of how I chose to split the >> patches up. The ISR is filled-in in patch 4. > > lets move this code in all including isr registration in patch 4 then :) Ok, I'll rework that for the next submission. >>>> + */ >>>> + schedule_work(&plxdev->release_work); >>>> +} >>>> + >>>> +static void plx_dma_put(struct plx_dma_dev *plxdev) >>>> +{ >>>> + kref_put(&plxdev->ref, plx_dma_release); >>>> +} >>>> + >>>> +static int plx_dma_alloc_chan_resources(struct dma_chan *chan) >>>> +{ >>>> + struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan); >>>> + >>>> + kref_get(&plxdev->ref); >>> >>> why do you need to do this? >> >> This has to do with being able to probably unbind while a channel is in >> use. If we don't hold a reference to the struct plx_dma_dev between >> alloc_chan_resources() and free_chan_resources() then it will panic if a >> call back is called after plx_dma_remove(). The way I've done it, once a > > which callback? Any callback that tries to obtain the free'd plx_dma_dev structure. (So plx_dma_free_chan_resources(), plx_dma_prep_memcpy(), plx_dma_issue_pending(), plx_dma_tx_status()). >> device is removed, subsequent calls to dma_prep_memcpy() will fail (see >> ring_active). >> >> struct plx_dma_dev needs to be alive between plx_dma_probe() and >> plx_dma_remove(), and between calls to alloc_chan_resources() and >> free_chan_resources(). So we use a reference count to ensure this. > > and that is why we hold module reference so we don't go away without > cleanup No, that's wrong. The module reference will prevent the module and the functions within it from going away. It doesn't prevent the driver from being unbound which normally causes the devices' structure from being freed. Most drivers will free the structure containing the DMA engine on the remove() call, so even if the module is still around, its functions will still be called with a freed pointer. We're taking a reference on the pointer to ensure it's not freed while dmaengine users still have a reference to it. >>>> +static void plx_dma_release_work(struct work_struct *work) >>>> +{ >>>> + struct plx_dma_dev *plxdev = container_of(work, struct plx_dma_dev, >>>> + release_work); >>>> + >>>> + dma_async_device_unregister(&plxdev->dma_dev); >>>> + put_device(plxdev->dma_dev.dev); >>>> + kfree(plxdev); >>>> +} >>>> + >>>> +static void plx_dma_release(struct kref *ref) >>>> +{ >>>> + struct plx_dma_dev *plxdev = container_of(ref, struct plx_dma_dev, ref); >>>> + >>>> + /* >>>> + * The dmaengine reference counting and locking is a bit of a >>>> + * mess so we have to work around it a bit here. We might put >>>> + * the reference while the dmaengine holds the dma_list_mutex >>>> + * which means we can't call dma_async_device_unregister() directly >>>> + * here and it must be delayed. >>> >>> why is that, i have not heard any complaints about locking, can you >>> elaborate on why you need to do this? >> >> Per the above explanation, we need to call plx_dma_put() in >> plx_dma_free_chan_resources(); and plx_dma_release() is when we can call >> dma_async_device_unregister() (seeing that's when we know there are no >> longer any active channels). >> >> However, dma_chan_put() (which calls device_free_chan_resources()) holds >> the dma_list_mutex and dma_async_device_unregister() tries to take the >> dma_list_mutex so, if we call unregister inside free_chan_resources we >> would deadlock. > > yes as we are not expecting someone to unregister in > device_free_chan_resources(), that is for freeing up resources. > > You are expected to unregister in .remove! > > Can you explain me why unregister cant be done in remove? I think I am > still missing some detail for this case. Because, if the user unbinds while there's a client of the dma channel, then it panics the kernel. First there's the warning[1] I pointed out previously, then the DMA channel users will cause a use after free exception when they continue unaware that the memory they are using has been freed. For an example from a random driver: 1) owl_dma_probe() allocates it's struct owl_dma with devm_kzalloc() 2) Another driver calls dma_find_channel() and obtains a reference to one of the channels 3) Asynchronously, the user unbinds the owl_dma driver using the sysfs interface 4) owl_dma_remove() is called which calls dma_async_device_unregister() which produces a WARN_ON because a channel is in use 5) The devm stack for this driver instance unwinds and the struct owl_dma is freed 6) The client driver then calls dmaengine_prep_dma_memcpy() which calls owl_dma_prep_memcpy(). The first thing that driver does is convert the now invalid channel reference to an invalid struct owl_dma reference and shortly thereafter dereferences the now freed memory. If KASAN is enabled, the user will get a big use after free bug panic. If not, the driver will read and write memory that may be used by some other random process eventually causing other random fatal errors in the system. The best case scenario is the process that allocates the already freed memory zeros it, and thus the client driver would panic on a NULL pointer dereference. I think this is unacceptable for a driver to have happen and that's why I wrote the plx driver such that it is not possible. This is especially important for the PLX driver because it is a PCI device which can be hotplugged so users may actually be randomly trying to unbind it. Logan [1] https://elixir.bootlin.com/linux/latest/source/drivers/dma/dmaengine.c#L1119