On Fri, May 12, 2017 at 02:49:38PM +0200, Niklas Söderlund wrote: > On 2017-04-07 14:33:47 +0300, Laurent Pinchart wrote: > > Hi Geert, > > > > On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote: > > > On Wed, Apr 5, 2017 at 11:14 AM, Niklas Söderlund wrote: > > > > On 2017-04-05 08:55:31 +0530, Vinod Koul wrote: > > > >> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote: > > > >>> On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote: > > > >>>> On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote: > > > >>>>> On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund wrote: > > > >>>>>> This fixes a race condition where the channel resources could be > > > >>>>>> freed before the ISR had finished running resulting in a NULL > > > >>>>>> pointer reference from the ISR. > > > >>>>>> > > > >>>>>> [ 167.148934] Unable to handle kernel NULL pointer dereference > > > >>>>>> at virtual address 00000000 > > > >>>>>> [ 167.157051] pgd = ffff80003c641000 > > > >>>>>> [ 167.160449] [00000000] *pgd=000000007c507003, > > > >>>>>> *pud=000000007c4ff003, *pmd=0000000000000000 > > > >>>>>> [ 167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP > > > >>>>>> [ 167.174289] Modules linked in: > > > >>>>>> [ 167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted > > > >>>>>> 4.11.0-rc1-00001-g8d92afddc2f6633a #73 > > > >>>>>> [ 167.186131] Hardware name: Renesas Salvator-X board based on > > > >>>>>> r8a7795 (DT) > > > >>>>>> [ 167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000 > > > >>>>>> [ 167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400 > > > >>>>>> [ 167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400 > > > >>>>> > > > >>>>> Do you have a test case to trigger this? > > > >>>> > > > >>>> Yes I have a testcase, it's rather complex and involves both a kernel > > > >>>> module and a userspaces application to stress the rcar-dmac. I'm > > > >>>> checking if I can share this publicly or not, please hold :-) > > > >>> > > > >>> I have now received feedback that I'm unfortunately not allowed to > > > >>> share the test case :-( > > > >>> > > > >>> The big picture in how to trigger this problem is that you start a DMA > > > >>> transfer like this: > > > >>> > > > >>> struct dma_async_tx_descriptor *tx = ...; > > > >>> > > > >>> ... > > > >>> > > > >>> tx->tx_submit(tx); > > > >>> > > > >>> And then you directly call dma_release_channel() on this channel > > > >>> without making sure the completion callback ran or anything. Now if you > > > >>> are unlucky the ISR have not finished running for the DMA when > > > >>> dma_release_channel() starts to clean up resources. The synchronisation > > > >>> point in the dma_release_channel() call path fixes this. > > > >> > > > >> Well the API expectation would be you abort the txn before calling > > > >> release. So the expected order should be: > > > >> > > > >> dmaengine_terminate_all(); > > > >> dma_release_channel(); > > > > > > > > Agree this is the correct way and in this case patch 3/3 in this series > > > > could be dropped. Then device_synchronize() would added to rcar-dmac, > > > > dmaengine_terminate_all() would turn of the IRQ and > > > > dma_release_channel() would ensure that device_synchronize() is called > > > > prior to calling rcar-dmac device_free_chan_resources(). > > > > > > > >> Terminate should then stop the channel, ie abort the pending > > > >> descriptors.. > > > > > > > > However for reasons unknown to me the rcar-dmac > > > > device_free_chan_resources() implementation implements logic to turn of > > > > IRQs before it frees the resources. And it's because of this patch 3/3 > > > > is needed so that it can be sure no ISR is running before it frees > > > > resources. > > > > > > > > I don't know how to best proceed here. I agree it feels a bit odd that > > > > device_free_chan_resources() is dealing with the IRQs as such things > > > > should be done before it's called. But on the other hand that code has > > > > been part of the driver since it was added upstream. I feel a bit > > > > uncomfortable just removing that part from the > > > > device_free_chan_resources() since the driver have been in use with it > > > > for such a long time. > > > > > > > > How would you prefer I try and resolve this? > > > > > > Perhaps Laurent knows why it was implemented this way? > > > > That was nearly 3 years ago, and I can hardly remember reasons related to code > > I wrote 3 months ago :-) > > > > I might just have been overcautious, guarding against conditions that should > > not happen if the caller behaves correctly. The situation might have changed > > since the driver was written. It might also be just a case of cargo-cult > > programming, as the shdma_free_chan_resources() has very similar code. > > Since the driver today have this behavior would it not be best to first > make sure it functions as expected and then as a second step see if we > can remove it all together? > > Vinod would you be strongly opposed to picking up this series as is? If there are no objections then I don't mind picking, please do rebase on -rc1 and resend -- ~Vinod