Re: [BUG] net: stmmac: Panic observed in stmmac_napi_poll_rx()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/17/21 1:39 PM, Jon Hunter wrote:

On 14/05/2021 22:49, Michał Mirosław wrote:
On Fri, May 14, 2021 at 03:24:58PM +0100, Jon Hunter wrote:
Hello!

I have been looking into some random crashes that appear to stem from
the stmmac_napi_poll_rx() function. There are two different panics I
have observed which are ...
[...]
The bug being triggered in skbuff.h is the following ...

  void *skb_pull(struct sk_buff *skb, unsigned int len);
  static inline void *__skb_pull(struct sk_buff *skb, unsigned int len)
  {
          skb->len -= len;
          BUG_ON(skb->len < skb->data_len);
          return skb->data += len;
  }

Looking into the above panic triggered in skbuff.h, when this occurs
I have noticed that the value of skb->data_len is unusually large ...

  __skb_pull: len 1500 (14), data_len 4294967274
[...]

The big value looks suspiciously similar to (unsigned)-EINVAL.

Yes it does and at first, I thought it was being set to -EINVAL.
However, from tracing the length variables I can see that this is not
the case.

I then added some traces to stmmac_napi_poll_rx() and
stmmac_rx_buf2_len() to trace the values of various various variables
and when the problem occurs I see ...

  stmmac_napi_poll_rx: stmmac_rx: count 0, len 1518, buf1 66, buf2 1452
  stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1518
  stmmac_napi_poll_rx: stmmac_rx: count 1, len 1518, buf1 66, buf2 1452
  stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 66, plen 1536
  stmmac_napi_poll_rx: stmmac_rx: count 2, len 1602, buf1 66, buf2 1536
  stmmac_napi_poll_rx: stmmac_rx_buf2_len: len 1602, plen 1518
  stmmac_napi_poll_rx: stmmac_rx: count 2, len 1518, buf1 0, buf2 4294967212
  stmmac_napi_poll_rx: stmmac_rx: dma_buf_sz 1536, buf1 0, buf2 4294967212

And this one to (unsigned)-EILSEQ.

Yes but this simply comes from 1518-1602 = -84. So it is purely
coincidence.

Jon


I dug around this a little bit. It looks like the issue occurs when we get (pardon my terminology, I haven't dealt with networking stuff much) a split packet.

What happens is we first process the first frame, growing 'len'. buf1_len, I think, hits the "First descriptor, get split header length" case and the length is 66. buf2_len hits the rx_not_ls case and the length is 1536. In total 1602.

Then the condition 'likely(status & rx_not_ls)' passes and we goto back to 'read_again', and read the next frame. Here we eventually get to buf2_len again. stmmac_get_rx_frame_len returns 1518 for this frame which sounds reasonable, that's what we normally get for non-split frames. So what we get is 1518 - 1602 which overflows.

I can dig around a bit more but it would be nice if someone with a bit more knowledge of the hardware could comment on the above.

Thanks,
Mikko



[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux