On 11-11-19, 10:50, Logan Gunthorpe wrote:
> 
> 
> On 2019-11-09 10:35 a.m., Vinod Koul wrote:
> > On 22-10-19, 15:46, Logan Gunthorpe wrote:
> >> +static irqreturn_t plx_dma_isr(int irq, void *devid)
> >> +{
> >> +	return IRQ_HANDLED;
> > 
> > ??
> 
> Yes, sorry this is more of an artifact of how I chose to split the
> patches up. The ISR is filled-in in patch 4.

Let's move all of this code, including the ISR registration, into
patch 4 then :)

> >> +	 */
> >> +	schedule_work(&plxdev->release_work);
> >> +}
> >> +
> >> +static void plx_dma_put(struct plx_dma_dev *plxdev)
> >> +{
> >> +	kref_put(&plxdev->ref, plx_dma_release);
> >> +}
> >> +
> >> +static int plx_dma_alloc_chan_resources(struct dma_chan *chan)
> >> +{
> >> +	struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan);
> >> +
> >> +	kref_get(&plxdev->ref);
> > 
> > why do you need to do this?
> 
> This has to do with being able to probably unbind while a channel is in
> use. If we don't hold a reference to the struct plx_dma_dev between
> alloc_chan_resources() and free_chan_resources() then it will panic if a
> call back is called after plx_dma_remove(). The way I've done it, once a

Which callback?

> device is removed, subsequent calls to dma_prep_memcpy() will fail (see
> ring_active).
> 
> struct plx_dma_dev needs to be alive between plx_dma_probe() and
> plx_dma_remove(), and between calls to alloc_chan_resources() and
> free_chan_resources(). So we use a reference count to ensure this.

And that is why we hold a module reference, so we don't go away without
cleanup.

> >> +static void plx_dma_release_work(struct work_struct *work)
> >> +{
> >> +	struct plx_dma_dev *plxdev = container_of(work, struct plx_dma_dev,
> >> +						  release_work);
> >> +
> >> +	dma_async_device_unregister(&plxdev->dma_dev);
> >> +	put_device(plxdev->dma_dev.dev);
> >> +	kfree(plxdev);
> >> +}
> >> +
> >> +static void plx_dma_release(struct kref *ref)
> >> +{
> >> +	struct plx_dma_dev *plxdev = container_of(ref, struct plx_dma_dev, ref);
> >> +
> >> +	/*
> >> +	 * The dmaengine reference counting and locking is a bit of a
> >> +	 * mess so we have to work around it a bit here. We might put
> >> +	 * the reference while the dmaengine holds the dma_list_mutex
> >> +	 * which means we can't call dma_async_device_unregister() directly
> >> +	 * here and it must be delayed.
> > 
> > why is that, i have not heard any complaints about locking, can you
> > elaborate on why you need to do this?
> 
> Per the above explanation, we need to call plx_dma_put() in
> plx_dma_free_chan_resources(); and plx_dma_release() is when we can call
> dma_async_device_unregister() (seeing that's when we know there are no
> longer any active channels).
> 
> However, dma_chan_put() (which calls device_free_chan_resources()) holds
> the dma_list_mutex and dma_async_device_unregister() tries to take the
> dma_list_mutex so, if we call unregister inside free_chan_resources we
> would deadlock.

Yes, as we are not expecting someone to unregister in
device_free_chan_resources(); that is for freeing up resources. You are
expected to unregister in .remove!

Can you explain to me why unregister can't be done in remove? I think I
am still missing some detail for this case.

-- 
~Vinod