Re: [net 05/27] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context

Jakub Kicinski <kuba@xxxxxxxxxx> · Wed, 4 Nov 2020 08:02:37 -0800

On Wed, 4 Nov 2020 15:59:25 +0100 Oliver Hartkopp wrote:
> Hello Eric,
> 
> On 04.11.20 09:16, Eric Dumazet wrote:
> > On Wed, Nov 4, 2020 at 2:21 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:  
> >>
> >> On Tue,  3 Nov 2020 23:06:14 +0100 Marc Kleine-Budde wrote:  
> >>> From: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
> >>>
> >>> If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but
> >>> not always, the case), the 'WARN_ON(in_irq)' in
> >>> net/core/skbuff.c#skb_release_head_state() might be triggered, under network
> >>> congestion circumstances, together with the potential risk of a NULL pointer
> >>> dereference.
> >>>
> >>> The root cause of this issue is the call to kfree_skb() instead of
> >>> dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog().
> >>>
> >>> This patch prevents the skb to be freed within the call to netif_rx() by
> >>> incrementing its reference count with skb_get(). The skb is finally freed by
> >>> one of the in-irq-context safe functions: dev_consume_skb_any() or
> >>> dev_kfree_skb_any(). The "any" version is used because some drivers might call
> >>> can_get_echo_skb() in a normal context.
> >>>
> >>> The reason for this issue to occur is that initially, in the core network
> >>> stack, loopback skb were not supposed to be received in hardware IRQ context.
> >>> The CAN stack is an exeption.
> >>>
> >>> This bug was previously reported back in 2017 in [1] but the proposed patch
> >>> never got accepted.
> >>>
> >>> While [1] directly modifies net/core/dev.c, we try to propose here a
> >>> smoother modification local to CAN network stack (the assumption
> >>> behind is that only CAN devices are affected by this issue).
> >>>
> >>> [1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@xxxxxxxxxxx
> >>>
> >>> Signed-off-by: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
> >>> Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@xxxxxxxxxx
> >>> Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
> >>> Signed-off-by: Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx>  
> >>
> >> Hm... Why do we receive a skb with a socket attached?
> >>
> >> At a quick glance this is some loopback, so shouldn't we skb_orphan()
> >> in the xmit function instead?  
> > 
> > Yes this would work, this seems the safest way, loopback_xmit() is a
> > good template for this.
> >   
> >>
> >> Otherwise we should probably fix this in enqueue_to_backlog().  
> > 
> > This is dangerous.
> > 
> > If we drop packets under flood because the per-cpu backlog is full,
> > we might also be in _big_ trouble if the per-cpu
> > softnet_data.completion_queue is filling,
> > since we do not have a limit on this list.
> > 
> > What could happen is that when the memory is finally exhausted and no
> > more skb can be fed
> > to netif_rx(), a big latency spike would happen when
> > softnet_data.completion_queue
> > can finally be purged in one shot.

Thanks, that makes sense. So no to the enqueue_to_backlog() idea.

> > So skb_orphan(skb) in CAN before calling netif_rx() is better IMO.
> >   
> 
> Unfortunately you missed the answer from Vincent, why skb_orphan() does 
> not work here:
> 
> https://lore.kernel.org/linux-can/CAMZ6RqJyZTcqZcq6jEzm5LLM_MMe=dYDbwvv=Y+dBR0drWuFmw@xxxxxxxxxxxxxx/

Okay, we can take this as a quick fix but to me it seems a little
strange to be dropping what is effectively locally generated frames.

Can we use a NAPI poll model here and back pressure TX if the echo
is not keeping up?