> -----Original Message-----
> From: Joakim Zhang <qiangqing.zhang@xxxxxxx>
> Sent: March 1, 2021 18:57
> To: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>; mkl@xxxxxxxxxxxxxx; David S.
> Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Oliver
> Hartkopp <socketcan@xxxxxxxxxxxx>; Robin van der Gracht
> <robin@xxxxxxxxxxx>
> Cc: Andre Naujoks <nautsch2@xxxxxxxxx>; Eric Dumazet
> <edumazet@xxxxxxxxxx>; kernel@xxxxxxxxxxxxxx; linux-can@xxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH net v4 1/1] can: can_skb_set_owner(): fix ref counting if
> socket was closed before setting skb ownership
>
> > -----Original Message-----
> > From: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>
> > Sent: February 26, 2021 17:25
> > To: mkl@xxxxxxxxxxxxxx; David S. Miller <davem@xxxxxxxxxxxxx>; Jakub
> > Kicinski <kuba@xxxxxxxxxx>; Oliver Hartkopp <socketcan@xxxxxxxxxxxx>;
> > Robin van der Gracht <robin@xxxxxxxxxxx>
> > Cc: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>; Andre Naujoks
> > <nautsch2@xxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>;
> > kernel@xxxxxxxxxxxxxx; linux-can@xxxxxxxxxxxxxxx;
> > netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > Subject: [PATCH net v4 1/1] can: can_skb_set_owner(): fix ref counting
> > if socket was closed before setting skb ownership
> >
> > There are two ref count variables controlling the free()ing of a socket:
> > - struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
> > - struct sock::sk_wmem_alloc - which accounts the memory allocated by
> >   the skbs in the send path.
> >
> > In case there are still TX skbs on the fly and the socket() is closed,
> > the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
> > clones an "echo" skb, calls sock_hold() on the original socket and
> > references it.
> > This produces the following back trace:
> >
> > | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
> > | refcount_t: addition on 0; use-after-free.
> > | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
> > | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G E 5.11.0-04577-gf8ff6603c617 #203
> > | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > | Backtrace:
> > | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24)
> > |  r7:00000000 r6:600f0113 r5:00000000 r4:81441220
> > | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
> > | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114)
> > |  r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
> > | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8)
> > |  r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
> > | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134)
> > |  r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
> > | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
> > | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
> > | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230)
> > |  r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
> > | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70)
> > |  r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
> > | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318)
> > |  r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
> > | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264)
> > |  r10:834e5600 r9:00000000 r8:00000000
> > |  r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
> > | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)
> >
> > To fix this problem, only set skb ownership to sockets which have
> > still a ref count > 0.
> >
> > Cc: Oliver Hartkopp <socketcan@xxxxxxxxxxxx>
> > Cc: Andre Naujoks <nautsch2@xxxxxxxxx>
> > Suggested-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Fixes: 0ae89beb283a ("can: add destructor for self generated skbs")
> > Signed-off-by: Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx>
>
> I will give out a test result tomorrow when the board is available. 😊

I also ran into this issue in the past, and this patch indeed fixes it. Thanks, Oleksij Rempel.

Tested-by: Joakim Zhang <qiangqing.zhang@xxxxxxx>

Best Regards,
Joakim Zhang

> Best Regards,
> Joakim Zhang
> > ---
> >  include/linux/can/skb.h | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
> > index 685f34cfba20..d82018cc0d0b 100644
> > --- a/include/linux/can/skb.h
> > +++ b/include/linux/can/skb.h
> > @@ -65,8 +65,12 @@ static inline void can_skb_reserve(struct sk_buff *skb)
> >
> >  static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk)
> >  {
> > -	if (sk) {
> > -		sock_hold(sk);
> > +	/*
> > +	 * If the socket has already been closed by user space, the refcount may
> > +	 * already be 0 (and the socket will be freed after the last TX skb has
> > +	 * been freed). So only increase socket refcount if the refcount is > 0.
> > +	 */
> > +	if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) {
> >  		skb->destructor = sock_efree;
> >  		skb->sk = sk;
> >  	}
> > --
> > 2.29.2
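
For readers who want to try the "increment only if not zero" idea from the
patch outside the kernel, here is a rough userspace sketch using C11 atomics.
It is not the kernel's refcount_t implementation: struct fake_sock,
struct fake_skb, ref_inc_not_zero() and fake_skb_set_owner() are hypothetical
names made up for this illustration, and the saturation checks done by the
real refcount_t are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; only the refcount matters here. */
struct fake_sock {
	atomic_uint refcnt;
};

/* Hypothetical stand-in for struct sk_buff; only the owner pointer matters. */
struct fake_skb {
	struct fake_sock *sk;
};

/*
 * Userspace analogue of the inc-not-zero pattern: take a reference only
 * while the count is still > 0. Returns true if a reference was taken.
 */
static bool ref_inc_not_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Mirrors the shape of the patched can_skb_set_owner(): a socket whose
 * refcount already dropped to 0 is never attached to the skb.
 */
static void fake_skb_set_owner(struct fake_skb *skb, struct fake_sock *sk)
{
	if (sk && ref_inc_not_zero(&sk->refcnt))
		skb->sk = sk;
}
```

Calling fake_skb_set_owner() with a fake_sock whose refcnt is 1 sets the
owner and raises the count to 2. Calling it with a refcnt of 0 leaves the
skb without an owner and the count untouched, which corresponds to the
closed-socket case the patch handles.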