RE: [5.10 CAN BUG report] kernel dump about echo skb

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



@Marc Kleine-Budde, @Oleksij Rempel, any comments? Thanks a lot.

Best Regards,
Joakim Zhang

> -----Original Message-----
> From: Joakim Zhang <qiangqing.zhang@xxxxxxx>
> Sent: 2021年1月21日 16:38
> To: Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx>; Oleksij Rempel
> <o.rempel@xxxxxxxxxxxxxx>; Oliver Hartkopp <socketcan@xxxxxxxxxxxx>
> Cc: kernel@xxxxxxxxxxxxxx; linux-can@xxxxxxxxxxxxxxx
> Subject: [5.10 CAN BUG report] kernel dump about echo skb
> 
> 
> Hi Marc, Oleksij, Oliver,
> 
> Our auto test scripts reports below kernel dump, after revert Oleksij Rempel
> patch "286228d382ba can: can_create_echo_skb(): fix echo skb generation:
> always use skb_clone()", this issue disappeared.
> What strange is that, commit message intends to fix this issue. Report it here
> to see if anyone meets the same issue or can give some suggestions. I will
> continue to dig how to reproduce it manually, now I don't know how to do it.
> Thanks.
> 
> 1. Kernel dump log:
> [  101.688327] ------------[ cut here ]------------ [  101.692968] refcount_t:
> addition on 0; use-after-free.
> [  101.698210] WARNING: CPU: 0 PID: 0 at lib/refcount.c:25
> refcount_warn_saturat
> e+0x98/0x140
> [  101.706386] Modules linked in:
> [  101.709446] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 5.10.4-100715-gc60abf9c 2110-dirty #376 [  101.717881] Hardware name:
> NXP i.MX8MPlus EVK board (DT) [  101.723193] pstate: 40000005 (nZcv daif
> -PAN -UAO -TCO BTYPE=--) [  101.729200] pc :
> refcount_warn_saturate+0x98/0x140 [  101.733990] lr :
> refcount_warn_saturate+0x98/0x140 [  101.738778] sp : ffff800011bcbcd0
> [  101.742090] x29: ffff800011bcbcd0 x28: ffff0000c16b7000 [  101.747405]
> x27: 0000000000000000 x26: ffff0000c1718000 [  101.752718] x25:
> 00000000048c0000 x24: ffff0000c1e54c00 [  101.758031] x23:
> 0000000000000000 x22: ffff0000c1718900 [  101.763345] x21:
> ffff0000c1718000 x20: ffff0000c1f00800 [  101.768661] x19: ffff0000c1985e00
> x18: 0000000000000010 [  101.773974] x17: 0000000000000000 x16:
> 0000000000000000 [  101.779287] x15: 0000000000000229 x14:
> ffff8000119d2cb8 [  101.784600] x13: 00000000ffffffea x12: ffff800011a51b60
> [  101.789914] x11: 0000000000000003 x10: ffff800011a39b78 [  101.795227]
> x9 : 0000000000017fe8 x8 : ffff800011a39b20 [  101.800543] x7 :
> c0000000ffffefff x6 : 0000000000000001 [  101.805858] x5 :
> 0000000000000000 x4 : ffff00017f36f900 [  101.811174] x3 : ffff00017f376830
> x2 : ffff00017f36f900 [  101.816487] x1 : 5ead5e088841e800 x0 :
> 0000000000000000 [  101.821800] Call trace:
> [  101.824248]  refcount_warn_saturate+0x98/0x140 [  101.828694]
> can_put_echo_skb+0x154/0x190 [  101.832704]
> flexcan_start_xmit+0x188/0x240 [  101.836888]
> dev_hard_start_xmit+0xe0/0x1a8 [  101.841072]
> sch_direct_xmit+0x11c/0x338 [  101.844995]  __qdisc_run+0x118/0x5d8
> [  101.848570]  net_tx_action+0x110/0x1a8 [  101.852319]
> efi_header_end+0x120/0x264 [  101.856156]  irq_exit+0xb8/0xd8
> [  101.859298]  __handle_domain_irq+0x64/0xb8 [  101.863396]
> gic_handle_irq+0x54/0x118 [  101.867145]  el1_irq+0xc0/0x180
> [  101.870288]  cpuidle_enter_state+0x130/0x2e8 [  101.874560]
> cpuidle_enter+0x34/0x48 [  101.878136]  call_cpuidle+0x18/0x38
> [  101.881627]  do_idle+0x1f4/0x258 [  101.884854]
> cpu_startup_entry+0x20/0x68 [  101.888778]  rest_init+0xd4/0xe0
> [  101.892008]  arch_call_rest_init+0xc/0x14 [  101.896019]
> start_kernel+0x470/0x4a4 [  101.899681] ---[ end trace
> 8de9661669c150a6 ]--- [  101.908725] ------------[ cut here ]------------
> [  101.913379] refcount_t: underflow; use-after-free.
> [  101.918360] WARNING: CPU: 0 PID: 9 at lib/refcount.c:28
> refcount_warn_saturat
> e+0xec/0x140
> [  101.926544] Modules linked in:
> [  101.929605] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G        W
> 5.10.
> 4-100715-gc60abf9c2110-dirty #376
> [  101.939603] Hardware name: NXP i.MX8MPlus EVK board (DT)
> [  101.944914] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> [  101.950921] pc : refcount_warn_saturate+0xec/0x140 [  101.955709] lr :
> refcount_warn_saturate+0xec/0x140 [  101.960497] sp : ffff800011c73b00
> [  101.963809] x29: ffff800011c73b00 x28: 000000000000003d [  101.969123]
> x27: ffff00017f37a880 x26: ffff800011c73d48 [  101.974435] x25:
> 000000000000012c x24: ffff800011c73d58 [  101.979748] x23:
> ffff0000c1718000 x22: 0000000000000000 [  101.985061] x21:
> ffff800011affcc0 x20: ffff0000c13eeb80 [  101.990374] x19: ffff0000c1f00800
> x18: 0000000000000010 [  101.995686] x17: 0000000000000000 x16:
> 0000000000000000 [  102.000999] x15: 0000000000000258 x14:
> ffff0000c007f278 [  102.006311] x13: 00000000ffffffea x12: ffff800011a51b60
> [  102.011624] x11: 0000000000000003 x10: ffff800011a39b78 [  102.016936]
> x9 : 0000000000017fe8 x8 : ffff800011a39b20 [  102.022249] x7 :
> c0000000ffffefff x6 : 0000000000000001 [  102.027562] x5 :
> 0000000000000000 x4 : ffff00017f36f900 [  102.032874] x3 : ffff00017f376830
> x2 : ffff00017f36f900 [  102.038187] x1 : b6fa97510f7af300 x0 :
> 0000000000000000 [  102.043500] Call trace:
> [  102.045947]  refcount_warn_saturate+0xec/0x140 [  102.050391]
> sock_efree+0x6c/0x78 [  102.053707]  skb_release_head_state+0x40/0xb0
> [  102.058065]  skb_release_all+0x14/0x30 [  102.061813]
> kfree_skb+0x2c/0x88 [  102.065041]  can_receive+0xb4/0xf8 [  102.068442]
> can_rcv+0x3c/0xc0 [  102.071496]
> __netif_receive_skb_one_core+0x50/0x78
> [  102.076373]  __netif_receive_skb+0x14/0x60 [  102.080468]
> netif_receive_skb+0x34/0xc0 [  102.084392]
> can_rx_offload_napi_poll+0x60/0xd8
> [  102.088921]  net_rx_action+0x114/0x378 [  102.092670]
> efi_header_end+0x120/0x264 [  102.096506]  run_ksoftirqd+0x44/0x58
> [  102.100081]  smpboot_thread_fn+0x1b4/0x270 [  102.104177]
> kthread+0x138/0x158 [  102.107405]  ret_from_fork+0x10/0x18
> [  102.110979] ---[ end trace 8de9661669c150a7 ]---
> 
> 2. Patch introduces issue:
> commit 286228d382ba6320f04fa2e7c6fc8d4d92e428f4
> Author: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>
> Date:   Wed Dec 18 09:39:02 2019 +0100
> 
>     can: can_create_echo_skb(): fix echo skb generation: always use
> skb_clone()
> 
>     All user space generated SKBs are owned by a socket (unless injected into
> the
>     key via AF_PACKET). If a socket is closed, all associated skbs will be
> cleaned
>     up.
> 
>     This leads to a problem when a CAN driver calls can_put_echo_skb() on a
>     unshared SKB. If the socket is closed prior to the TX complete handler,
>     can_get_echo_skb() and the subsequent delivering of the echo SKB to all
>     registered callbacks, a SKB with a refcount of 0 is delivered.
> 
>     To avoid the problem, in can_get_echo_skb() the original SKB is now
> always
>     cloned, regardless of shared SKB or not. If the process exists it can now
>     safely discard its SKBs, without disturbing the delivery of the echo SKB.
> 
>     The problem shows up in the j1939 stack, when it clones the incoming skb,
> which
>     detects the already 0 refcount.
> 
>     We can easily reproduce this with following example:
> 
>     testj1939 -B -r can0: &
>     cansend can0 1823ff40#0123
> 
>     WARNING: CPU: 0 PID: 293 at lib/refcount.c:25
> refcount_warn_saturate+0x108/0x174
>     refcount_t: addition on 0; use-after-free.
>     Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc
> dw_hdmi_ahb_audio vcan
>     CPU: 0 PID: 293 Comm: cansend Not tainted
> 5.5.0-rc6-00376-g9e20dcb7040d #1
>     Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>     Backtrace:
>     [<c010f570>] (dump_backtrace) from [<c010f90c>]
> (show_stack+0x20/0x24)
>     [<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
>     [<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
>     [<c0127f0c>] (__warn) from [<c01283c8>]
> (warn_slowpath_fmt+0xa8/0xcc)
>     [<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>]
> (refcount_warn_saturate+0x108/0x174)
>     [<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>]
> (j1939_can_recv+0x20c/0x210)
>     [<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>]
> (can_rcv_filter+0xb4/0x268)
>     [<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
>     [<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
>     [<c0aca300>] (can_rcv) from [<c09b1fdc>]
> (__netif_receive_skb_one_core+0x64/0x88)
>     [<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>]
> (__netif_receive_skb+0x38/0x94)
>     [<c09b2038>] (__netif_receive_skb) from [<c09b2130>]
> (netif_receive_skb_internal+0x64/0xf8)
>     [<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>]
> (netif_receive_skb+0x34/0x19c)
>     [<c09b21c4>] (netif_receive_skb) from [<c0791278>]
> (can_rx_offload_napi_poll+0x58/0xb4)
> 
>     Fixes: 0ae89beb283a ("can: add destructor for self generated skbs")
>     Signed-off-by: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>
>     Link:
> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flore.kern
> el.org%2Fr%2F20200124132656.22156-1-o.rempel%40pengutronix.de&amp;d
> ata=04%7C01%7Cqiangqing.zhang%40nxp.com%7C2d128ac66b3f4e93db9a08
> d8bde80d18%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6374681
> 51639635283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIj
> oiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=QQtX%2
> FCpYRmeynsy18qnLFUMyFgwUSgzbQeSqYGwD3UM%3D&amp;reserved=0
>     Acked-by: Oliver Hartkopp <socketcan@xxxxxxxxxxxx>
>     Signed-off-by: Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx>
> 
> Best Regards,
> Joakim Zhang





[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux