Re: rcar-dmac.c: race condition regarding cookie handling?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dirk,

CC Kees (for the wrap-around in dma_cookie_assign() not handled in [A])

On Wed, Nov 22, 2023 at 8:02 AM Behme Dirk (CM/ESO2)
<dirk.behme@xxxxxxxxxxxx> wrote:
> using a rcar-dmac.c on RCar3 being quite similar to the recent mainline
> one [1] we got a BUG_ON() being hit [2].

Thanks for your report!
The side channel info told me this is rcar-3.9.11 based on v4.14.75?

> This is
>
> static inline void dma_cookie_complete(struct dma_async_tx_descriptor *tx)
> {
>         BUG_ON(tx->cookie < DMA_MIN_COOKIE);
>          ...

Note that dma_cookie_complete() also resets tx->cookie = 0 at the end
of the function, which is also < 1.

> from dmaengine.h. I think DMA_MIN_COOKIE is 1, so it seems that cookie
> becomes < 1.
>
> Looking at rcar-dmac.c, there is one place where cookie is set to a
> negative value [3]
>
> desc->async_tx.cookie = -EBUSY;
>
> And it looks like this is not protected by a spin_lock (?).
>
> Could it be that we hit a race condition here?

rcar_dmac_chan_prep_sg() indeed does:

    desc = rcar_dmac_desc_get(chan);
    if (!desc)
            return NULL;

    desc->async_tx.flags = dma_flags;
    desc->async_tx.cookie = -EBUSY;

However, rcar_dmac_desc_get() does hold the spinlock while removing the
descriptor from the free list (chan->desc.free).  So at the time its
cookie is set to -EBUSY, the descriptor is no longer part of any list.
In case of an error, the descriptor is returned to the free list.
In case of success, it is returned to the caller.

When the crash happens, rcar_dmac_isr_channel_thread() is processing
descriptors on the completed list (chan->desc.done).

How do descriptors end up on the completed list?
  - rcar_dmac_tx_submit() fills in a proper cookie (from
    dma_cookie_assign()), and adds the descriptor to the pending list
    (chan->desc.pending), while holding the spinlock,
  - rcar_dmac_issue_pending() moves descriptors on the pending list
    to the active list (chan->desc.active), and, if idle, starts
    the first one, all while holding the spinlock.
  - rcar_dmac_isr_transfer_end() moves the current descriptor (which
    is on the active list) to the completed list, while holding the spinlock
    (by its caller, rcar_dmac_isr_channel()).

None of this looks suspicious to me...

Do you know if any calls to dmaengine_terminate_all() were done before?

Was the system running for a very long time?
dma_cookie_assign() relies on 2-complement signed wrap-around:

        cookie = chan->cookie + 1;
        if (cookie < DMA_MIN_COOKIE)
                cookie = DMA_MIN_COOKIE;

but given the kernel is compiled with -fno-strict-overflow (which
implies -fwrapv) that should work.

> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/sh/rcar-dmac.c
>
> [2]
>
> kernel BUG at drivers/dma/dmaengine.h:54!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> Hardware name: Custom board board based on r8a77990 (DT)
> pc : rcar_dmac_isr_channel_thread+0xc0/0x190
> lr : rcar_dmac_isr_channel_thread+0x84/0x190
> sp : ffff000008b8bd30 pstate : a00001c5
> x29: ffff000008b8bd30 x28: ffffe026daed4298
> x27: dead000000000200 x26: dead000000000100
> x25: ffff158988c28000 x24: ffffe026fb558cc8
> x23: 0000000000000000 x22: ffffe026daed4254
> x21: ffff1589821bd000 x20: ffffe026daed41a8
> x19: ffffe026daed4288 x18: 0000fffff59a0c6a
> x17: 0000ffff94c70f88 x16: ffff158981f3a17c
> x15: 0000000000000000 x14: 0000000000000400
> x13: 0000000000000400 x12: 0000000000000001
> x11: ffffe026fab47000 x10: 0000000000000a10
> x9 : ffff000008b8bd10 x8 : ffffe026fe0f1870
> x7 : 0000000000010000 x6 : 00000000588b0000
> x5 : ffffe026fe5d1410 x4 : 0000000000000000
> x3 : 0000000000000002 x2 : 0000000000000000
> x1 : 0000000000000000 x0 : 00000000e6f90060

Unfortunately this does not show the actual cookie value found.

> Call trace:
>   rcar_dmac_isr_channel_thread+0xc0/0x190
>   irq_thread_fn+0x28/0x6c
>   irq_thread+0x134/0x198
>   kthread+0x120/0x130
>   ret_from_fork+0x10/0x18
> Code: f9407293 b85a8260 7100001f 5400004c (d4210000)
> ---[ end trace 03ab56fc988cadbc ]---
> Kernel panic - not syncing: Fatal exception
>
> [3]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/sh/rcar-dmac.c#n951

[A] https://lore.kernel.org/all/20240122235208.work.748-kees@xxxxxxxxxx/

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds





[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux