Re: Dropped packets mapping IRQs for adjusted queue counts on i40e

Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> · Wed, 5 May 2021 23:21:57 +0200

On Wed, May 05, 2021 at 01:01:28PM -0700, Jesse Brandeburg wrote:
> Zvi Effron wrote:
> 
> > On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@xxxxxxxxxxxxx> wrote:
> > > I'm suspecting it's something with how XDP_REDIRECT is implemented in
> > > the i40e driver, but I don't know if this is a) cross driver behavior,
> > > b) expected behavior, or c) a bug.
> > I think I've found the issue, and it appears to be specific to i40e
> > (and maybe other drivers, too, but not XDP itself).
> > 
> > When performing the XDP xmit, i40e uses the smp_processor_id() to
> > select the tx queue (see
> > https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846).
> > I'm not 100% clear on how the CPU is selected (since we don't use
> > cores 0 and 1), we end up on a core whose id is higher than any
> > available queue.
> > 
> > I'm going to try to modify our IRQ mappings to test this.
> > 
> > If I'm correct, this feels like a bug to me, since it requires a user
> > to understand low level driver details to do IRQ remapping, which is a
> > bit higher level. But if it's intended, we'll just have to figure out
> > how to work around this. (Unfortunately, using split tx and rx queues
> > is not possible with i40e, so that easy solution is unavailable.)
> > 
> > --Zvi

Hey Zvi, sorry for the lack of assistance, there has been statutory free
time in Poland and today i'm in the birthday mode, but we managed to
discuss the issue with Magnus and we feel like we could have a solution
for that, more below.

> 
> 
> It seems like for Intel drivers, igc, ixgbe, i40e, ice all have
> this problem.
> 
> Notably, igb, fixes it like I would expect.

igb is correct but I think that we would like to avoid the introduction of
locking for higher speed NICs in XDP data path.

We talked with Magnus that for i40e and ice that have lots of HW
resources, we could always create the xdp_rings array of num_online_cpus()
size and use smp_processor_id() for accesses, regardless of the user's
changes to queue count.

This way the smp_processor_id() provides the serialization by itself as
we're under napi on a given cpu, so there's no need for locking
introduction - there is a per-cpu XDP ring provided. If we would stick to
the approach where you adjust the size of xdp_rings down to the shrinked
Rx queue count and use a smp_processor_id() % vsi->num_queue_pairs formula
then we could have a resource contention. Say that you did on a 16 core
system:
$ ethtool -L eth0 combined 2

and then mapped the q0 to cpu1 and q1 to cpu 11. Both queues will grab the
xdp_rings[1], so we would have to introduce the locking.

Proposed approach would just result with more Tx queues packed onto Tx
ring container of queue vector.

Thoughts? Any concerns? Should we have a 'fallback' mode if we would be
out of queues?

Currently I have a patch for ice (which is trivial) that does the thing
described above and I'm in the middle of testing it. I feel like this is
also addressing Jakub's input on the old patch that tried to resolve the
issue around that manner:

https://lore.kernel.org/netdev/20210123195219.55f6d4e1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

> 
> Let's talk about it over on intel-wired-lan and cc netdev.

Let's do this now.

> 
> Jesse