Re: [PATCH 3/3] dmaengine: rcar-dmac: wait for ISR to finish before freeing resources

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 12, 2017 at 02:49:38PM +0200, Niklas Söderlund wrote:
> On 2017-04-07 14:33:47 +0300, Laurent Pinchart wrote:
> > Hi Geert,
> > 
> > On Wednesday 05 Apr 2017 12:40:11 Geert Uytterhoeven wrote:
> > > On Wed, Apr 5, 2017 at 11:14 AM, Niklas Söderlund wrote:
> > > > On 2017-04-05 08:55:31 +0530, Vinod Koul wrote:
> > > >> On Thu, Mar 30, 2017 at 09:38:39AM +0200, Niklas Söderlund wrote:
> > > >>> On 2017-03-29 15:30:42 +0200, Niklas Söderlund wrote:
> > > >>>> On 2017-03-29 14:31:33 +0200, Geert Uytterhoeven wrote:
> > > >>>>> On Wed, Mar 29, 2017 at 12:40 AM, Niklas Söderlund wrote:
> > > >>>>>> This fixes a race condition where the channel resources could be
> > > >>>>>> freed before the ISR had finished running resulting in a NULL
> > > >>>>>> pointer reference from the ISR.
> > > >>>>>> 
> > > >>>>>> [  167.148934] Unable to handle kernel NULL pointer dereference
> > > >>>>>> at virtual address 00000000
> > > >>>>>> [  167.157051] pgd = ffff80003c641000
> > > >>>>>> [  167.160449] [00000000] *pgd=000000007c507003,
> > > >>>>>> *pud=000000007c4ff003, *pmd=0000000000000000
> > > >>>>>> [  167.168719] Internal error: Oops: 96000046 [#1] PREEMPT SMP
> > > >>>>>> [  167.174289] Modules linked in:
> > > >>>>>> [  167.177348] CPU: 3 PID: 10547 Comm: dma_ioctl Not tainted
> > > >>>>>> 4.11.0-rc1-00001-g8d92afddc2f6633a #73
> > > >>>>>> [  167.186131] Hardware name: Renesas Salvator-X board based on
> > > >>>>>> r8a7795 (DT)
> > > >>>>>> [  167.192917] task: ffff80003a411a00 task.stack: ffff80003bcd4000
> > > >>>>>> [  167.198850] PC is at rcar_dmac_chan_prep_sg+0xe0/0x400
> > > >>>>>> [  167.203985] LR is at rcar_dmac_chan_prep_sg+0x48/0x400
> > > >>>>> 
> > > >>>>> Do you have a test case to trigger this?
> > > >>>> 
> > > >>>> Yes I have a testcase, it's rather complex and involves both a kernel
> > > >>>> module and a userspaces application to stress the rcar-dmac. I'm
> > > >>>> checking if I can share this publicly or not, please hold :-)
> > > >>> 
> > > >>> I have now received feedback that I'm unfortunately not allowed to
> > > >>> share the test case :-(
> > > >>> 
> > > >>> The big picture in how to trigger this problem is that you start a DMA
> > > >>> transfer like this:
> > > >>> 
> > > >>> struct dma_async_tx_descriptor *tx = ...;
> > > >>> 
> > > >>> ...
> > > >>> 
> > > >>> tx->tx_submit(tx);
> > > >>> 
> > > >>> And then you directly call dma_release_channel() on this channel
> > > >>> without making sure the completion callback ran or anything. Now if you
> > > >>> are unlucky the ISR have not finished running for the DMA when
> > > >>> dma_release_channel() starts to clean up resources. The synchronisation
> > > >>> point in the dma_release_channel() call path fixes this.
> > > >> 
> > > >> Well the API expectation would be you abort the txn before calling
> > > >> release. So the expected order should be:
> > > >> 
> > > >> dmaengine_terminate_all();
> > > >> dma_release_channel();
> > > > 
> > > > Agree this is the correct way and in this case patch 3/3 in this series
> > > > could be dropped. Then device_synchronize() would added to rcar-dmac,
> > > > dmaengine_terminate_all() would turn of the IRQ and
> > > > dma_release_channel() would ensure that device_synchronize() is called
> > > > prior to calling rcar-dmac device_free_chan_resources().
> > > > 
> > > >> Terminate should then stop the channel, ie abort the pending
> > > >> descriptors..
> > > > 
> > > > However for reasons unknown to me the rcar-dmac
> > > > device_free_chan_resources() implementation implements logic to turn of
> > > > IRQs before it frees the resources. And it's because of this patch 3/3
> > > > is needed so that it can be sure no ISR is running before it frees
> > > > resources.
> > > > 
> > > > I don't know how to best proceed here. I agree it feels a bit odd that
> > > > device_free_chan_resources() is dealing with the IRQs as such things
> > > > should be done before it's called. But on the other hand that code has
> > > > been part of the driver since it was added upstream. I feel a bit
> > > > uncomfortable just removing that part from the
> > > > device_free_chan_resources() since the driver have been in use with it
> > > > for such a long time.
> > > > 
> > > > How would you prefer I try and resolve this?
> > > 
> > > Perhaps Laurent knows why it was implemented this way?
> > 
> > That was nearly 3 years ago, and I can hardly remember reasons related to code 
> > I wrote 3 months ago :-)
> > 
> > I might just have been overcautious, guarding against conditions that should 
> > not happen if the caller behaves correctly. The situation might have changed 
> > since the driver was written. It might also be just a case of cargo-cult 
> > programming, as the shdma_free_chan_resources() has very similar code.
> 
> Since the driver today have this behavior would it not be best to first 
> make sure it functions as expected and then as a second step see if we 
> can remove it all together?
> 
> Vinod would you be strongly opposed to picking up this series as is?

If there are no objections then I don't mind picking, please do rebase on
-rc1 and resend

-- 
~Vinod



[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux