On 2019-11-09 10:35 a.m., Vinod Koul wrote: > On 22-10-19, 15:46, Logan Gunthorpe wrote: >> +static irqreturn_t plx_dma_isr(int irq, void *devid) >> +{ >> + return IRQ_HANDLED; > > ?? Yes, sorry this is more of an artifact of how I chose to split the patches up. The ISR is filled-in in patch 4. >> + */ >> + schedule_work(&plxdev->release_work); >> +} >> + >> +static void plx_dma_put(struct plx_dma_dev *plxdev) >> +{ >> + kref_put(&plxdev->ref, plx_dma_release); >> +} >> + >> +static int plx_dma_alloc_chan_resources(struct dma_chan *chan) >> +{ >> + struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan); >> + >> + kref_get(&plxdev->ref); > > why do you need to do this? This has to do with being able to probably unbind while a channel is in use. If we don't hold a reference to the struct plx_dma_dev between alloc_chan_resources() and free_chan_resources() then it will panic if a call back is called after plx_dma_remove(). The way I've done it, once a device is removed, subsequent calls to dma_prep_memcpy() will fail (see ring_active). struct plx_dma_dev needs to be alive between plx_dma_probe() and plx_dma_remove(), and between calls to alloc_chan_resources() and free_chan_resources(). So we use a reference count to ensure this. >> +} >> + >> +static void plx_dma_release_work(struct work_struct *work) >> +{ >> + struct plx_dma_dev *plxdev = container_of(work, struct plx_dma_dev, >> + release_work); >> + >> + dma_async_device_unregister(&plxdev->dma_dev); >> + put_device(plxdev->dma_dev.dev); >> + kfree(plxdev); >> +} >> + >> +static void plx_dma_release(struct kref *ref) >> +{ >> + struct plx_dma_dev *plxdev = container_of(ref, struct plx_dma_dev, ref); >> + >> + /* >> + * The dmaengine reference counting and locking is a bit of a >> + * mess so we have to work around it a bit here. We might put >> + * the reference while the dmaengine holds the dma_list_mutex >> + * which means we can't call dma_async_device_unregister() directly >> + * here and it must be delayed. > > why is that, i have not heard any complaints about locking, can you > elaborate on why you need to do this? Per the above explanation, we need to call plx_dma_put() in plx_dma_free_chan_resources(); and plx_dma_release() is when we can call dma_async_device_unregister() (seeing that's when we know there are no longer any active channels). However, dma_chan_put() (which calls device_free_chan_resources()) holds the dma_list_mutex and dma_async_device_unregister() tries to take the dma_list_mutex so, if we call unregister inside free_chan_resources we would deadlock. I suspect that most dmaengine drivers will WARN[1] and then panic if they are unbound while in use. The locking, reference counting and structure of the API didn't seem to consider it and required extra reference counts and workarounds to make it work correctly at all. This is the mess I'm referring to and would require some significant restructuring to fix generally. Logan [1] https://elixir.bootlin.com/linux/latest/source/drivers/dma/dmaengine.c#L1119