Jakub Kicinski <kuba@xxxxxxxxxx> writes:

> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>> >> TBH I wasn't following this thread too closely since I saw Daniel
>> >> nacked it already. I do prefer rtnl xstats, I'd just report them
>> >> in -s if they are non-zero. But doesn't sound like we have an agreement
>> >> whether they should exist or not.
>> >
>> > Right, just -s is fine, if we drop the per-channel approach.
>>
>> I agree that adding them to -s is fine (and that resolves my "no one
>> will find them" complaint as well). If it crowds the output we could also
>> default to only outputting a subset, and have the more detailed
>> statistics hidden behind a verbose switch (or even just in the JSON
>> output)?
>>
>> >> Can we think of an approach which would make cloudflare and cilium
>> >> happy? Feels like we're trying to make the slightly hypothetical
>> >> admin happy while ignoring objections of very real users.
>> >
>> > The initial idea was only to make the drivers uniform. But in general
>> > you are right, 10 drivers having something doesn't mean it's
>> > something good.
>>
>> I don't think it's accurate to call the admin use case "hypothetical".
>> We're expending a significant effort explaining to people that XDP can
>> "eat" your packets, and not having any standard statistics makes this
>> way harder. We should absolutely cater to our "early adopters", but if
>> we want XDP to see wider adoption, making it "less weird" is critical!
>
> Fair. In all honesty I said that hoping to push for a more flexible
> approach hidden entirely in BPF, and not involving driver changes.
> Assuming the XDP program has more fine-grained stats we should be able
> to extract those instead of double-counting. Hence my vague "let's work
> with apps" comment.
>
> For example, to a person familiar with the workload it'd be useful to
> know if the program returned XDP_DROP because of configured policy or
> failure to parse a packet. I don't think that sort of distinction is
> achievable at the level of standard stats.
>
> The information required by the admin is higher level. As you say, the
> primary concern there is "how many packets did XDP eat".

Right, sure, I am also totally fine with having only a somewhat
restricted subset of stats available at the interface level and making
everything else BPF-based (a rough sketch of what such in-program
counters could look like is at the end of this mail). I'm hoping we can
converge on a common understanding of what this "minimal set" should
be :)

> Speaking of which, one thing that badly needs clarification is our
> expectation around XDP packets getting counted towards the interface
> stats.

Agreed. My immediate thought is that "XDP packets are interface
packets", but that is certainly not what we do today, so I'm not sure
whether changing it at this point would break things?

>> > Maciej, I think you were talking about Cilium asking for those stats
>> > in Intel drivers? Could you maybe provide their exact usecases/needs
>> > so I'll orient myself? I certainly remember about XSK Tx packets and
>> > bytes.
>> > And speaking of XSK Tx, we have per-socket stats, isn't that enough?
>>
>> IMO, as long as the packets are accounted for in the regular XDP stats,
>> having a whole separate set of stats only for XSK is less important.
>>
>> >> Please leave the per-channel stats out. They set a precedent for
>> >> channel stats which should be an attribute of a channel. Working for
>> >> a large XDP user for a couple of years now I can tell you from my own
>> >> experience I've not once found them useful.
>> >> In fact per-queue stats are
>> >> a major PITA as they crowd the output.
>> >
>> > Oh okay. My very first iterations were without this, but then I
>> > found most of the drivers expose their XDP stats per-channel. Since
>> > I didn't plan to degrade the functionality, they went that way.
>>
>> I personally find the per-channel stats quite useful. One of the primary
>> reasons for not achieving full performance with XDP is broken
>> configuration of packet steering to CPUs, and having per-channel stats
>> is a nice way of seeing this.
>
> Right, that's about the only thing I use it for as well. "Is the load
> evenly distributed?" But that's not XDP-specific and not worth
> standardizing for, yet, IMO, because..
>
>> I can see the point about them being way too verbose in the default
>> output, though, and I do generally filter the output as well when
>> viewing them. But see my point above about only printing a subset of
>> the stats by default; per-channel stats could be JSON-only, for
>> instance?
>
> we don't even know what constitutes a channel today. And that will
> become increasingly problematic as the importance of application-specific
> queues increases (zctap etc). IMO until the ontological gaps around
> queues are filled we should leave per-queue stats in ethtool -S.

Hmm, right, I see. I suppose that as long as the XDP packets show up in
one of the interface counters in ethtool -S, it's possible to answer the
load distribution question, and any further debugging (say, XDP drops on
a certain queue due to CPU-based queue indexing on TX) can be delegated
to BPF-based tools...

-Toke
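
The sketch referred to above: roughly what "fine-grained stats live in
the BPF program" could look like, i.e. an XDP program counting its own
drops by reason in a per-CPU array map. The map name, the drop_reason
enum and the toy "policy" check are just placeholders for the example;
userspace would read the counters straight out of the map (e.g.
bpftool map dump name xdp_drop_stats), the same pattern extends to
per-queue keys via ctx->rx_queue_index, and interface-level stats would
only ever see the aggregate "packets XDP ate":

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

enum drop_reason {
	DROP_POLICY,		/* packet matched a configured drop rule */
	DROP_PARSE_ERROR,	/* packet too short / unparseable */
	DROP_REASON_MAX,
};

/* One u64 counter per drop reason, per CPU. */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, DROP_REASON_MAX);
	__type(key, __u32);
	__type(value, __u64);
} xdp_drop_stats SEC(".maps");

static __always_inline void count_drop(__u32 reason)
{
	__u64 *cnt = bpf_map_lookup_elem(&xdp_drop_stats, &reason);

	if (cnt)
		(*cnt)++;	/* per-CPU map, so no atomics needed */
}

SEC("xdp")
int xdp_stats_example(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;

	/* Parse failure: not even a complete Ethernet header. */
	if (data + sizeof(*eth) > data_end) {
		count_drop(DROP_PARSE_ERROR);
		return XDP_DROP;
	}

	/* Stand-in for a real policy lookup: drop anything that is not
	 * IPv4/IPv6, purely to have a second drop reason to count.
	 */
	if (eth->h_proto != bpf_htons(ETH_P_IP) &&
	    eth->h_proto != bpf_htons(ETH_P_IPV6)) {
		count_drop(DROP_POLICY);
		return XDP_DROP;
	}

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";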