Re: Potential ICE driver bug when returning XDP_TX with multiple channels

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 06, 2022 at 12:39:11PM +0100, Maciej Fijalkowski wrote:
> On Fri, Dec 02, 2022 at 04:01:24PM +0000, Robin Cowley wrote:
> > Hi there,
> 
> Hi thanks for another report :) again I think I have a fix for this, but
> this fix is on top of the XDP multi-buffer support that I am working on.
> 
> I can share the set with you in the current state out of this list so we
> would know if this is a way forward.
> 
> From our side I think that we didn't believe that there are any ZC XDP_TX
> users so apologies that we didn't test this properly...

Robin,

although I shared a set off the list with you that is supposed to finally
address your issues, we thought that it would be worth explaining publicly
what was going on, instead of saying "I think I have a fix for this".

Some time ago we added optimizations for both ZC Tx and XDP_TX sides.
Then, it was pointed out during upstream reviews that for ZC Tx this was
problematic, so we removed it [0].

Now, the case is that ZC Tx and XDP_TX that can happen when running XDP Rx
hook differ internally. So, when you combine the two, things can go south
as both ZC Tx and XDP_TX operate on the same XDP ring.

ZC Tx (ice_xmit_zc() in particular) as a first thing tries to clean XDP
ring. Tx logic precedes Rx in ice_napi_poll(). ZC Tx can affect
ice_clean_xdp_irq()'s due to the fact that it looks at particular spots in
the ring for done descriptors. If these were already cleaned, then this
function interprets this as there is nothing to be cleaned and in such case
driver bumps tx_busy stat and drops the frame. So you get a deadlock of
some sort due to two cleaning functions racing with each other.

So, what we need to do is to find a way how to fix this smoothly so that
we will be able to backport it. Next kernels are going to be fine, but
previous ones need some massaging.

Aaaaand we need to add test case to xskxceiver for XDP_TX :)

Thanks,
Maciej

[0]: https://lore.kernel.org/bpf/20220927164112.4011983-2-anthony.l.nguyen@xxxxxxxxx/

> 
> > 
> > We have been testing an XDP program with Intel E810 / ICE driver. The program uses AF_XDP in Zero Copy mode. In some conditions, the program logic decides to return XDP_TX action rather than XDP_REDIRECT, so packets are transmitted back out the interface they arrived on. (In this condition we also swap the MAC addresses in Ethernet header within the BPF program).
> > 
> > We recently identified and submitted a bug report onto XDP-newbies, as we were seeing memory type errors. [1] The patch supplied by Maciej fixed our initial problem of an incorrect XDP memory type. However, upon further testing on Kernel 6.1-rc7 with Maciej’s patch applied we have noticed some further strange behaviour with larger packet rates and more channels.
> > 
> > It seems that when we use multiple channels with dedicated F/C queues, some of the channels stall and do not transmit any of the packets back out of the interface.
> > 
> > We have been running a slightly modified xdpsock based program as a test harness on the test_zero_copy_tx branch [2] as per the previous reports. (Again, we’ve tested with the version of the program that enables multiple F/C queues per XSK).
> > 
> > This XDP program now has some percpu stats added into it, so we can have visibility on what actions we are returning from the XDP prog. We observe that while the BPF program does receive every packet and return XDP_TX properly, some of the channels experience a TX failure, leading to packet loss.
> > 
> > This led us to do a little bit of digging and we noticed that at a certain point, when a channel stopped transmitting packets back out the NIC, the tx_busy counter is being incremented. We looked into the ICE driver code and we believe this is originating from ice_xmit_xdp_ring() at ice_txrx_lib.c:288.
> > 
> > We progressed to enable tracing on XDP events; we can see an xdp_exception trace for every failed transmit operation. We added two additional tracepoints in the XDP_TX code path within the ice_run_xdp_zc() function of ice_xsk.c file – one on successful TX, one on failed TX – with both highlighting xdp_ring->q_index and rx_ring->q_index values. Our rationale was to try and identify if there was a pattern to which channels were failing.
> > 
> > We were able to establish with reasonable degree of certainty:
> > 
> > A) Tracepoints following XDP_TX failure always had a TX/XDP ring q_index of 15 or less.
> > B) Tracepoints following XDP_TX success always had a TX/XDP ring q_index of 16 or above.
> > C) Both points above are true irrespective of the number of channels we had configured (1, 2, 4, 8)
> > D) Tx/Rx ring sizes did not seem to alter behaviour – we tested with tx/rx rings of 1024, 2048, 4096.
> > 
> > We think this might be a bug within the ICE driver, but we’ve only picked it up following resolution of the previous issue [1].
> > 
> > Our test rig happens to be a 16 core (dual socket) so we think that might have some bearing on the above. Due to the test machine being NUMA we have avoided testing with channel quantities above 8, but we’d be happy to if it would add value.
> > 
> > We were testing this on the kernel version 6.1.0-rc7+ with the patch supplied previously by Maciej.
> > 
> > Everything above was seen using Intel E810’s with the parameters [3] below.
> > 
> > Any guidance would be appreciated.
> > 
> > Thanks
> > Robin
> > 
> > [1] https://marc.info/?l=xdp-newbies&m=166973074709850&w=2
> > 
> > [2] https://github.com/OpenSource-THG/xdpsock-sample/tree/test_zero_copy_tx
> > 
> > [3] Ethtool Parameters:
> > 
> > 
> > # ethtool -i ice0
> > driver: ice
> > version: 6.1.0-rc7+
> > firmware-version: 2.50 0x800077a8 1.2960.0
> > expansion-rom-version:
> > bus-info: 0000:03:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > 
> > # lspci -s 03:00.0
> > 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
> > 
> > # ethtool -g ice0
> > Ring parameters for ice0:
> > Pre-set maximums:
> > RX: 8160
> > RX Mini: n/a
> > RX Jumbo: n/a
> > TX: 8160
> > Current hardware settings:
> > RX: 4096
> > RX Mini: n/a
> > RX Jumbo: n/a
> > TX: 4096
> > 
> > # ethtool -l ice0
> > Channel parameters for ice0:
> > Pre-set maximums:
> > RX: 16
> > TX: 16
> > Other: 1
> > Combined: 16
> > Current hardware settings:
> > RX: 0
> > TX: 0
> > Other: 1
> > Combined: 4
> > Robin Cowley
> > Software Engineer
> > The Hut Group<http://www.thehutgroup.com/>
> > 
> > Tel:
> > Email: Robin.Cowley@xxxxxxxxxxxxxxx<mailto:Robin.Cowley@xxxxxxxxxxxxxxx>
> > 
> > For the purposes of this email, the "company" means The Hut Group Limited, a company registered in England and Wales (company number 6539496) whose registered office is at Fifth Floor, Voyager House, Chicago Avenue, Manchester Airport, M90 3DQ and/or any of its respective subsidiaries.
> > 
> > Confidentiality Notice
> > This e-mail is confidential and intended for the use of the named recipient only. If you are not the intended recipient please notify us by telephone immediately on +44(0)1606 811888 or return it to us by e-mail. Please then delete it from your system and note that any use, dissemination, forwarding, printing or copying is strictly prohibited. Any views or opinions are solely those of the author and do not necessarily represent those of the company.
> > 
> > Encryptions and Viruses
> > Please note that this e-mail and any attachments have not been encrypted. They may therefore be liable to be compromised. Please also note that it is your responsibility to scan this e-mail and any attachments for viruses. We do not, to the extent permitted by law, accept any liability (whether in contract, negligence or otherwise) for any virus infection and/or external compromise of security and/or confidentiality in relation to transmissions sent by e-mail.
> > 
> > Monitoring
> > Activity and use of the company's systems is monitored to secure its effective use and operation and for other lawful business purposes. Communications using these systems will also be monitored and may be recorded to secure effective use and operation and for other lawful business purposes.
> > 
> > hgvyjuv



[Index of Archives]     [Linux Networking Development]     [Fedora Linux Users]     [Linux SCTP]     [DCCP]     [Gimp]     [Yosemite Campsites]

  Powered by Linux