Re: rcar-dmac.c: race condition regarding cookie handling?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Geert,

On 29.01.2024 10:57, Geert Uytterhoeven wrote:
Hi Dirk,

CC Kees (for the wrap-around in dma_cookie_assign() not handled in [A])

On Wed, Nov 22, 2023 at 8:02 AM Behme Dirk (CM/ESO2)
<dirk.behme@xxxxxxxxxxxx> wrote:
using a rcar-dmac.c on RCar3 being quite similar to the recent mainline
one [1] we got a BUG_ON() being hit [2].

Thanks for your report!

Many thanks for looking into this :)

The side channel info told me this is rcar-3.9.11 based on v4.14.75?

Yes correct.

This is

static inline void dma_cookie_complete(struct dma_async_tx_descriptor *tx)
{
         BUG_ON(tx->cookie < DMA_MIN_COOKIE);
          ...

Note that dma_cookie_complete() also resets tx->cookie = 0 at the end
of the function, which is also < 1.

from dmaengine.h. I think DMA_MIN_COOKIE is 1, so it seems that cookie
becomes < 1.

Looking at rcar-dmac.c, there is one place where cookie is set to a
negative value [3]

desc->async_tx.cookie = -EBUSY;

And it looks like this is not protected by a spin_lock (?).

Could it be that we hit a race condition here?

rcar_dmac_chan_prep_sg() indeed does:

     desc = rcar_dmac_desc_get(chan);
     if (!desc)
             return NULL;

     desc->async_tx.flags = dma_flags;
     desc->async_tx.cookie = -EBUSY;

However, rcar_dmac_desc_get() does hold the spinlock while removing the
descriptor from the free list (chan->desc.free).  So at the time its
cookie is set to -EBUSY, the descriptor is no longer part of any list.
In case of an error, the descriptor is returned to the free list.
In case of success, it is returned to the caller.

When the crash happens, rcar_dmac_isr_channel_thread() is processing
descriptors on the completed list (chan->desc.done).

How do descriptors end up on the completed list?
   - rcar_dmac_tx_submit() fills in a proper cookie (from
     dma_cookie_assign()), and adds the descriptor to the pending list
     (chan->desc.pending), while holding the spinlock,
   - rcar_dmac_issue_pending() moves descriptors on the pending list
     to the active list (chan->desc.active), and, if idle, starts
     the first one, all while holding the spinlock.
   - rcar_dmac_isr_transfer_end() moves the current descriptor (which
     is on the active list) to the completed list, while holding the spinlock
     (by its caller, rcar_dmac_isr_channel()).

None of this looks suspicious to me...

Do you know if any calls to dmaengine_terminate_all() were done before?

Was the system running for a very long time?


Hmm, the trace I have contains boot time stamps (dropped initially) like

[  153.394731] kernel BUG at drivers/dma/sh/../dmaengine.h:54!

I think this "153" implies 153s after boot, i.e. ~2.5s after system start. In case there is no wrap around here too.

Best regards

Dirk


dma_cookie_assign() relies on 2-complement signed wrap-around:

         cookie = chan->cookie + 1;
         if (cookie < DMA_MIN_COOKIE)
                 cookie = DMA_MIN_COOKIE;

but given the kernel is compiled with -fno-strict-overflow (which
implies -fwrapv) that should work.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/sh/rcar-dmac.c

[2]

kernel BUG at drivers/dma/dmaengine.h:54!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Hardware name: Custom board board based on r8a77990 (DT)
pc : rcar_dmac_isr_channel_thread+0xc0/0x190
lr : rcar_dmac_isr_channel_thread+0x84/0x190
sp : ffff000008b8bd30 pstate : a00001c5
x29: ffff000008b8bd30 x28: ffffe026daed4298
x27: dead000000000200 x26: dead000000000100
x25: ffff158988c28000 x24: ffffe026fb558cc8
x23: 0000000000000000 x22: ffffe026daed4254
x21: ffff1589821bd000 x20: ffffe026daed41a8
x19: ffffe026daed4288 x18: 0000fffff59a0c6a
x17: 0000ffff94c70f88 x16: ffff158981f3a17c
x15: 0000000000000000 x14: 0000000000000400
x13: 0000000000000400 x12: 0000000000000001
x11: ffffe026fab47000 x10: 0000000000000a10
x9 : ffff000008b8bd10 x8 : ffffe026fe0f1870
x7 : 0000000000010000 x6 : 00000000588b0000
x5 : ffffe026fe5d1410 x4 : 0000000000000000
x3 : 0000000000000002 x2 : 0000000000000000
x1 : 0000000000000000 x0 : 00000000e6f90060

Unfortunately this does not show the actual cookie value found.

Call trace:
   rcar_dmac_isr_channel_thread+0xc0/0x190
   irq_thread_fn+0x28/0x6c
   irq_thread+0x134/0x198
   kthread+0x120/0x130
   ret_from_fork+0x10/0x18
Code: f9407293 b85a8260 7100001f 5400004c (d4210000)
---[ end trace 03ab56fc988cadbc ]---
Kernel panic - not syncing: Fatal exception

[3]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/sh/rcar-dmac.c#n951

[A] https://lore.kernel.org/all/20240122235208.work.748-kees@xxxxxxxxxx/

Gr{oetje,eeting}s,

                         Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                 -- Linus Torvalds






[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux