Re: [PATCH intel-next 2/2] ice: introduce XDP Tx fallback path

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Tue, 08 Jun 2021 23:43:40 +0200

Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> writes:

> On Tue, Jun 01, 2021 at 02:38:03PM +0200, Toke Høiland-Jørgensen wrote:
>> Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> writes:
>> 
>> > Under rare circumstances there might be a situation where a requirement
>> > of having a XDP Tx queue per core could not be fulfilled and some of the
>> > Tx resources would have to be shared between cores. This yields a need
>> > for placing accesses to xdp_rings array onto critical section protected
>> > by spinlock.
>> >
>> > Design of handling such scenario is to at first find out how many queues
>> > are there that XDP could use. Any number that is not less than the half
>> > of a count of cores of platform is allowed. XDP queue count < cpu count
>> > is signalled via new VSI state ICE_VSI_XDP_FALLBACK which carries the
>> > information further down to Rx rings where new ICE_TX_XDP_LOCKED is set
>> > based on the mentioned VSI state. This ring flag indicates that locking
>> > variants for getting/putting xdp_ring need to be used in fast path.
>> >
>> > For XDP_REDIRECT the impact on standard case (one XDP ring per CPU) can
>> > be reduced a bit by providing a separate ndo_xdp_xmit and swap it at
>> > configuration time. However, due to the fact that net_device_ops struct
>> > is a const, it is not possible to replace a single ndo, so for the
>> > locking variant of ndo_xdp_xmit, whole net_device_ops needs to be
>> > replayed.
>> >
>> > It has an impact on performance (1-2 %) of a non-fallback path as
>> > branches are introduced.
>> 
>> I generally feel this is the right approach, although the performance
>> impact is a bit unfortunately, obviously. Maybe it could be avoided by
>> the use of static_branch? I.e., keep a global refcount of how many
>> netdevs are using the locked path and only activate the check in the
>> fast path while that refcount is >0?
>
> This would be an ideal solution if we would be able to have it PF-scoped,
> which AFAICT is not possible as static key is per module, right?

Yeah, static_branch basically patches the kernel text when activated
(hence the low overhead), so it's a global switch...

> I checked that before the bank holiday here in Poland and indeed I was not
> observing perf drops. Only thing that is questionable is the fact that a
> single PF would affect all the others that ice driver is serving.
>
> OTOH I see that Jesper acked that work.
>
> Let me play with this a bit more as I'm in the middle of switching my HW
> lab, but I wanted to break the silence over here. I didn't manage to check
> that one fallback path will affect other PFs.
>
> Thanks Toke for that great idea :) any other opinions are more than
> welcome.

You're welcome! :)

-Toke