On 08/09/2021 17.57, Jakub Kicinski wrote:
On Wed, 8 Sep 2021 18:26:35 +0300 Ilias Apalodimas wrote:
Normally I'd say put the stats in ethtool -S and the rest in debugfs
but I'm not sure if exposing pages_state_hold_cnt and
pages_state_release_cnt directly. Those are short counters, and will
very likely wrap. They are primarily meaningful for calculating
page_pool_inflight(). Given this I think their semantics may be too
confusing for an average ethtool -S user.
Putting all the information in debugfs seems like a better idea.
I can't really disagree on the aforementioned stats being confusing.
However at some point we'll want to add more useful page_pool stats (e.g the
percentage of the page/page fragments that are hitting the recycling path).
Would it still be 'ok' to have info split across ethtool and debugfs?
Possibly. We'll also see what Alex L comes up with for XDP stats. Maybe
we can arrive at a netlink API for standard things (broken record).
You said percentage - even tho I personally don't like it - there is a
small precedent of ethtool -S containing non-counter information (IOW
not monotonically increasing event counters), e.g. some vendors rammed
PCI link quality in there. So if all else fails ethtool -S should be
fine.
I agree with Ilias, that we ought-to add some page_pool stats.
*BUT* ONLY if this doesn't hurt performance!!!
We have explained before, how this is possible, e.g. by keeping consumer
vs. producer counters on separate cache-lines (internally in page_pool
struct and likely on per CPU for returning pages). Then the drivers
ethtool functions can request the page_pool to fillout a driver provided
stats area, such that the collection and aggregation of counters are not
on the fast-path.
I definitely don't want to see pages_state_hold_cnt and
pages_state_release_cnt being exposed directly. These were carefully
designed to not hurt performance. An inflight counter can be deducted by
above ethtool-driver step and presented to userspace.
Notice that while developing page_pool, I've been using tracepoints and
bpftrace scripts to inspect the behavior and internals of page_pool.
See[1] and I've even written a page leak detector[2].
In principle you could write a bpftrace tool that extract stats, the
same way. But I would only recommend doing this for devel phase, because
these tracepoints do add some overhead.
Originally I wanted to push people to use this for stats, but I've
realized that not having these stats easy available is annoying ;-)
-Jesper
[1]
https://github.com/xdp-project/xdp-project/tree/master/areas/mem/bpftrace
[2]
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_leaks02.bt