Federico Parola <federico.parola@xxxxxxxxx> wrote:

>On 17/03/22 15:59, Jay Vosburgh wrote:
>> Federico Parola <federico.parola@xxxxxxxxx> wrote:
>>
>>> Hello everybody,
>>> I'm experiencing a strange problem when running an AF_XDP application
>>> with busy poll enabled on an Intel XL710 NIC (i40e driver).
>>> The problem can be replicated by running the xdpsock kernel sample in
>>> rx or l2fwd mode.
>>> The first packet I send to the machine is correctly received by the
>>> application.  After this, packets are only received in batches of 8.
>>> If I send 7 packets the application sees nothing, while the 8th one
>>> triggers the reception of all 8 packets.
>>> With busy poll disabled everything works fine and packets are
>>> received immediately as they are sent.
>>>
>>> I tried changing kernels (5.12, 5.14 and 5.16) but all present the
>>> same problem.
>>> I also tried another NIC, an Intel X540 with the ixgbe driver, and
>>> the problem isn't there, so I guess it is NIC/driver related.
>>>
>>> I tried monitoring ethtool statistics.  When sending between 1 and 7
>>> packets, these counters are increased:
>>> stat:   64 (  64) <= port.rx_bytes /sec
>>> stat:    1 (   1) <= port.rx_size_64 /sec
>>> stat:    1 (   1) <= port.rx_unicast /sec
>>> stat:    1 (   1) <= rx_unicast /sec
>>>
>>> While the 8th one triggers these updates:
>>> stat:   64 (  64) <= port.rx_bytes /sec
>>> stat:    1 (   1) <= port.rx_size_64 /sec
>>> stat:    1 (   1) <= port.rx_unicast /sec
>>> stat:  477 ( 477) <= rx-0.bytes /sec
>>> stat:    8 (   8) <= rx-0.packets /sec
>>> stat:  477 ( 477) <= rx_bytes /sec
>>> stat:    8 (   8) <= rx_packets /sec
>>> stat:    1 (   1) <= rx_unicast /sec
>>>
>>> As far as I understand, the first set of counters are hardware
>>> counters, which makes me think that packets are kept in the NIC and
>>> not even sent to memory.
>>>
>>> Does anyone have any suggestion on what could be causing this
>>> problem?  Does enabling busy poll set some flag on the NIC?
>>
>> 	We observed similar "batching" behavior on i40e devices late
>> last year in ordinary use (not XDP, but using SR-IOV VFs).  We
>> instrumented the drivers at the send and receive sides, and determined
>> that it appeared to be a behavior of the receiving device itself,
>> i.e., packets 1 - 7 would be held indefinitely (as I recall, with no
>> interrupt or update of the RX ring pointers) until packet 8 arrived,
>> at which point all 8 were delivered simultaneously.
>>
>> 	The issue was evidently in the firmware, and was resolved after
>> a firmware upgrade.
>
>Hi Jay,
>I just updated the firmware to the latest version (v8.50 from v8.30)
>but unfortunately the problem is still there.
>However, I'm experiencing the problem only when using AF_XDP in busy
>poll mode; all other modes (standard AF_XDP and normal packet
>reception) work just fine.
>Maybe the two problems are correlated in some way.

	I don't have the firmware release versions for the issue we
looked into (it was at a customer of ours), but the fixed firmware was
provided by HPE/Intel via a support ticket.  Not all devices exhibited
the problem, even with identical firmware; the end solution was to
either apply the vendor's new firmware or replace the cards showing the
problem.

	I also don't recall whether the fix was in the firmware, the
NVM image, or both (at probe time, the i40e driver prints the firmware,
API and NVM versions into the kernel dmesg log).

	We saw packet "batches" of both 4 and 8.  Nevertheless, given
the very specific "batching packets into groups of N" behavior, it
seems unlikely to be an unrelated problem.
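	In case it helps anyone reproduce this or rule things out: the
busy poll mode in question is the per-socket variant enabled from
userspace (what xdpsock does with -B), not anything configured on the
NIC directly.  A minimal sketch of that setup in C follows; the
function name and the timeout/budget values are illustrative only, and
the fallback defines cover older userspace headers:

#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL	69	/* added in kernel 5.11 */
#define SO_BUSY_POLL_BUDGET	70
#endif

/* Enable preferred busy polling on an already-bound AF_XDP socket.
 * Returns 0 on success, -1 on the first failing setsockopt(). */
static int enable_busy_poll(int xsk_fd)
{
	int val;

	val = 1;	/* prefer busy polling over interrupt-driven napi */
	if (setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
		       &val, sizeof(val)) < 0)
		return -1;

	val = 20;	/* busy poll timeout, in microseconds */
	if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL,
		       &val, sizeof(val)) < 0)
		return -1;

	val = 8;	/* max packets handled per busy poll invocation */
	if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
		       &val, sizeof(val)) < 0)
		return -1;

	return 0;
}

With this enabled, the application drives the driver's napi receive
processing from its own poll()/recvfrom() calls rather than waiting
on interrupts.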
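	Also, for comparing versions across machines without digging
through dmesg: the same driver/firmware information is exposed via the
ETHTOOL_GDRVINFO ioctl (it's what `ethtool -i <ifname>` prints).  A
small standalone example, with the interface name taken from argv[1]:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Print driver and firmware versions for an interface, i.e. the same
 * information as `ethtool -i <ifname>`. */
int main(int argc, char **argv)
{
	struct ethtool_drvinfo drvinfo = { .cmd = ETHTOOL_GDRVINFO };
	struct ifreq ifr = { 0 };
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <ifname>\n", argv[0]);
		return 1;
	}

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	strncpy(ifr.ifr_name, argv[1], IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&drvinfo;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("SIOCETHTOOL");
		close(fd);
		return 1;
	}

	/* On i40e, fw_version encodes the firmware/NVM versions that
	 * the driver also logs at probe time. */
	printf("driver: %s version: %s firmware: %s\n",
	       drvinfo.driver, drvinfo.version, drvinfo.fw_version);
	close(fd);
	return 0;
}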
	-J

---
	-Jay Vosburgh, jay.vosburgh@xxxxxxxxxxxxx