Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Geert,

On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> On Wed, Apr 5, 2017 at 11:14 AM, Niklas Söderlund wrote:
> > On 2017-04-05 08:55:31 +0530, Vinod Koul wrote:
> >> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
> >>> On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> >>>> On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> >>>>> On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund wrote:
> >>>>>> This fixes a race condition where the channel resources could be
> >>>>>> freed before the ISR had finished running resulting in a NULL
> >>>>>> pointer reference from the ISR.
> >>>>>> 
> >>>>>> [  167.148934] Unable to handle kernel NULL pointer dereference
> >>>>>> at virtual address 00000000
> >>>>>> [  167.157051] pgd = ffff80003c641000
> >>>>>> [  167.160449] [00000000] *pgd=000000007c507003,
> >>>>>> *pud=000000007c4ff003, *pmd=0000000000000000
> >>>>>> [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> >>>>>> [  167.174289] Modules linked in:
> >>>>>> [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted
> >>>>>> 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> >>>>>> [  167.186131] Hardware name: Renesas Salvator-X board based on
> >>>>>> r8a7795 (DT)
> >>>>>> [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> >>>>>> [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> >>>>>> [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> >>>>> 
> >>>>> Do you have a test case to trigger this?
> >>>> 
> >>>> Yes I have a testcase, it's rather complex and involves both a kernel
> >>>> module and a userspaces application to stress the rcar-dmac. I'm
> >>>> checking if I can share this publicly or not, please hold :-)
> >>> 
> >>> I have now received feedback that I'm unfortunately not allowed to
> >>> share the test case :-(
> >>> 
> >>> The big picture in how to trigger this problem is that you start a DMA
> >>> transfer like this:
> >>> 
> >>> struct dma_async_tx_descriptor *tx = ...;
> >>> 
> >>> ...
> >>> 
> >>> tx->tx_submit(tx);
> >>> 
> >>> And then you directly call dma_release_channel() on this channel
> >>> without making sure the completion callback ran or anything. Now if you
> >>> are unlucky the ISR have not finished running for the DMA when
> >>> dma_release_channel() starts to clean up resources. The synchronisation
> >>> point in the dma_release_channel() call path fixes this.
> >> 
> >> Well the API expectation would be you abort the txn before calling
> >> release. So the expected order should be:
> >> 
> >> dmaengine_terminate_all();
> >> dma_release_channel();
> > 
> > Agree this is the correct way and in this case patch 3/3 in this series
> > could be dropped. Then device_synchronize() would added to rcar-dmac,
> > dmaengine_terminate_all() would turn of the IRQ and
> > dma_release_channel() would ensure that device_synchronize() is called
> > prior to calling rcar-dmac device_free_chan_resources().
> > 
> >> Terminate should then stop the channel, ie abort the pending
> >> descriptors..
> > 
> > However for reasons unknown to me the rcar-dmac
> > device_free_chan_resources() implementation implements logic to turn of
> > IRQs before it frees the resources. And it's because of this patch 3/3
> > is needed so that it can be sure no ISR is running before it frees
> > resources.
> > 
> > I don't know how to best proceed here. I agree it feels a bit odd that
> > device_free_chan_resources() is dealing with the IRQs as such things
> > should be done before it's called. But on the other hand that code has
> > been part of the driver since it was added upstream. I feel a bit
> > uncomfortable just removing that part from the
> > device_free_chan_resources() since the driver have been in use with it
> > for such a long time.
> > 
> > How would you prefer I try and resolve this?
> 
> Perhaps Laurent knows why it was implemented this way?

That was nearly 3 years ago, and I can hardly remember reasons related to code 
I wrote 3 months ago :-)

I might just have been overcautious, guarding against conditions that should 
not happen if the caller behaves correctly. The situation might have changed 
since the driver was written. It might also be just a case of cargo-cult 
programming, as the shdma_free_chan_resources() has very similar code.

Given that freeing channel resources when the channel isn't idle can cause an 
oops, I think we should guard against that. This should probably be 
implemented in the dma-engine core, to make sure we catch the issue in as many 
drivers as possible.

-- 
Regards,

Laurent Pinchart

--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux PCI]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux