Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2017-04-07 14:33:47 +0300, Laurent Pinchart wrote:
> Hi Geert,
> 
> On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> > On Wed, Apr 5, 2017 at 11:14 AM, Niklas Söderlund wrote:
> > > On 2017-04-05 08:55:31 +0530, Vinod Koul wrote:
> > >> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
> > >>> On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> > >>>> On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> > >>>>> On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund wrote:
> > >>>>>> This fixes a race condition where the channel resources could be
> > >>>>>> freed before the ISR had finished running resulting in a NULL
> > >>>>>> pointer reference from the ISR.
> > >>>>>> 
> > >>>>>> [  167.148934] Unable to handle kernel NULL pointer dereference
> > >>>>>> at virtual address 00000000
> > >>>>>> [  167.157051] pgd = ffff80003c641000
> > >>>>>> [  167.160449] [00000000] *pgd=000000007c507003,
> > >>>>>> *pud=000000007c4ff003, *pmd=0000000000000000
> > >>>>>> [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > >>>>>> [  167.174289] Modules linked in:
> > >>>>>> [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted
> > >>>>>> 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> > >>>>>> [  167.186131] Hardware name: Renesas Salvator-X board based on
> > >>>>>> r8a7795 (DT)
> > >>>>>> [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> > >>>>>> [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> > >>>>>> [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> > >>>>> 
> > >>>>> Do you have a test case to trigger this?
> > >>>> 
> > >>>> Yes I have a testcase, it's rather complex and involves both a kernel
> > >>>> module and a userspaces application to stress the rcar-dmac. I'm
> > >>>> checking if I can share this publicly or not, please hold :-)
> > >>> 
> > >>> I have now received feedback that I'm unfortunately not allowed to
> > >>> share the test case :-(
> > >>> 
> > >>> The big picture in how to trigger this problem is that you start a DMA
> > >>> transfer like this:
> > >>> 
> > >>> struct dma_async_tx_descriptor *tx = ...;
> > >>> 
> > >>> ...
> > >>> 
> > >>> tx->tx_submit(tx);
> > >>> 
> > >>> And then you directly call dma_release_channel() on this channel
> > >>> without making sure the completion callback ran or anything. Now if you
> > >>> are unlucky the ISR have not finished running for the DMA when
> > >>> dma_release_channel() starts to clean up resources. The synchronisation
> > >>> point in the dma_release_channel() call path fixes this.
> > >> 
> > >> Well the API expectation would be you abort the txn before calling
> > >> release. So the expected order should be:
> > >> 
> > >> dmaengine_terminate_all();
> > >> dma_release_channel();
> > > 
> > > Agree this is the correct way and in this case patch 3/3 in this series
> > > could be dropped. Then device_synchronize() would added to rcar-dmac,
> > > dmaengine_terminate_all() would turn of the IRQ and
> > > dma_release_channel() would ensure that device_synchronize() is called
> > > prior to calling rcar-dmac device_free_chan_resources().
> > > 
> > >> Terminate should then stop the channel, ie abort the pending
> > >> descriptors..
> > > 
> > > However for reasons unknown to me the rcar-dmac
> > > device_free_chan_resources() implementation implements logic to turn of
> > > IRQs before it frees the resources. And it's because of this patch 3/3
> > > is needed so that it can be sure no ISR is running before it frees
> > > resources.
> > > 
> > > I don't know how to best proceed here. I agree it feels a bit odd that
> > > device_free_chan_resources() is dealing with the IRQs as such things
> > > should be done before it's called. But on the other hand that code has
> > > been part of the driver since it was added upstream. I feel a bit
> > > uncomfortable just removing that part from the
> > > device_free_chan_resources() since the driver have been in use with it
> > > for such a long time.
> > > 
> > > How would you prefer I try and resolve this?
> > 
> > Perhaps Laurent knows why it was implemented this way?
> 
> That was nearly 3 years ago, and I can hardly remember reasons related to code 
> I wrote 3 months ago :-)
> 
> I might just have been overcautious, guarding against conditions that should 
> not happen if the caller behaves correctly. The situation might have changed 
> since the driver was written. It might also be just a case of cargo-cult 
> programming, as the shdma_free_chan_resources() has very similar code.

Since the driver today have this behavior would it not be best to first 
make sure it functions as expected and then as a second step see if we 
can remove it all together?

Vinod would you be strongly opposed to picking up this series as is?

> 
> Given that freeing channel resources when the channel isn't idle can cause an 
> oops, I think we should guard against that. This should probably be 
> implemented in the dma-engine core, to make sure we catch the issue in as many 
> drivers as possible.
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 

-- 
Regards,
Niklas Söderlund



[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux