Re: [PATCH net-next 0/1] mlx5: Add netdev-genl queue stats

On Thu, May 09, 2024 at 12:40:01AM +0300, Tariq Toukan wrote:
> 
> 
> On 06/05/2024 21:04, Joe Damato wrote:
> > On Fri, May 03, 2024 at 05:34:29PM -0700, Jakub Kicinski wrote:
> > > On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
> > > > > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> > > > > index c7ac4539eafc..f5d9f3ad5b66 100644
> > > > > --- a/include/net/netdev_queues.h
> > > > > +++ b/include/net/netdev_queues.h
> > > > > @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
> > > > >    * statistics will not generally add up to the total number of events for
> > > > >    * the device. The @get_base_stats callback allows filling in the delta
> > > > >    * between events for currently live queues and overall device history.
> > > > > + * @get_base_stats can also be used to report any miscellaneous packets
> > > > > + * transferred outside of the main set of queues used by the networking stack.
> > > > >    * When the statistics for the entire device are queried, first @get_base_stats
> > > > >    * is issued to collect the delta, and then a series of per-queue callbacks.
> > > > >    * Only statistics which are set in @get_base_stats will be reported
> > > > > 
> > > > > 
> > > > > SG?
> > > > 
> > > > I think that sounds good and makes sense, yea. By that definition, then I
> > > > should leave the PTP stats as shown above. If you agree, I'll add that
> > > > to the v2.
> > > 
> > > Yup, agreed.
> > > 
> > > > I feel like I should probably wait before sending a v2 with PTP included in
> > > > get_base_stats to see if the Mellanox folks have any hints about why rtnl
> > > > != queue stats on mlx5?
> > > > 
> > > > What do you think?
> > > 
> > > Very odd, the code doesn't appear to be doing any magic :S Did you try
> > > to print what the delta in values is? Does bringing the interface up and
> > > down affect the size of it?
> > 
> > I booted the kernel which includes PTP stats in the base stats as you've
> > suggested (as shown in the diff in this thread) and I've brought the
> > interface down and back up:
> > 
> > $ sudo ip link set dev eth0 down
> > $ sudo ip link set dev eth0 up
> > 
> > Re-ran the test script, which includes some mild debugging printout I
> > added to show the delta for rx-packets (but I think all stats are off):
> > 
> >    # Exception| Exception: Qstats are lower, fetched later
> > 
> > key: rx-packets rstat: 1192281902 qstat: 1186755777
> > key: rx-packets rstat: 1192281902 qstat: 1186755781
> > 
> > So qstat is lower by (1192281902 - 1186755781) = 5,526,121
> > 
> > Not really sure why, but I'll take another look at the code this morning to
> > see if I can figure out what's going on.
> > 
> > I'm clearly doing something wrong or misunderstanding something about the
> > accounting that will seem extremely obvious in retrospect.
> 
> Hi Joe,
> 
> Thanks for your patch.
> Apologies for the late response. I was on PTO for some time.

No worries, I hope you enjoyed your PTO. I appreciate your response, time,
and energy.

> At first look the patch looks okay. The overall approach seems correct.

Sounds good to me!
 
> The off-channel queues (like PTP) do not exist by default, so they are out
> of the game unless you explicitly enable them.

I did not enable them, but if you saw the thread, it sounds like Jakub's
preference is that in the v2 I include the PTP stats in get_base_stats.

Are you OK with that?
Are there other queue stats I should include as well?

> A possible reason for this difference is the set of queues included in the sum.
> Our stats are persistent across configuration changes, so they don't reset
> when the number of channels changes, for example.
> 
> We keep stats entries for all ring indices that ever existed. Our driver
> loops and sums up the stats for all of them, while the stack loops only up
> to the current netdev->real_num_rx_queues.
> 
> Can this explain the diff here?

Yes, that was it. Sorry, I hadn't considered this case. My lab machine runs a
script to adjust the queue count shortly after booting.

I disabled that and re-ran:

  NETIF=eth0 tools/testing/selftests/drivers/net/stats.py

and all tests pass.
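
For anyone following along, here is a minimal sketch of the accounting
difference described above, written in Python to match the selftest. It is a
toy model, not mlx5 driver code: the function names and the counter values are
made up for illustration, and I'm assuming the gap comes only from rings that
existed earlier but sit above the current real_num_rx_queues.

```python
# Toy model of the rtnl vs. qstats mismatch. All names and numbers are
# hypothetical; this is not how the driver is actually structured.

def driver_total(per_ring_rx_packets):
    """Sum every ring index that ever existed (the rtnl-style total)."""
    return sum(per_ring_rx_packets)

def stack_total(per_ring_rx_packets, real_num_rx_queues, base=0):
    """Sum only the currently live queues, plus whatever the base reports."""
    return base + sum(per_ring_rx_packets[:real_num_rx_queues])

def base_for_inactive(per_ring_rx_packets, real_num_rx_queues):
    """Packets counted on rings beyond the live range."""
    return sum(per_ring_rx_packets[real_num_rx_queues:])

# Made-up counters: 8 rings existed after boot, then a script shrank to 4.
rings = [100, 120, 90, 110, 40, 35, 50, 45]
live = 4

rstat = driver_total(rings)
qstat = stack_total(rings, live)
print("delta without base:", rstat - qstat)

# Folding the inactive rings into the base closes the gap.
qstat_fixed = stack_total(rings, live, base_for_inactive(rings, live))
print("delta with base:", rstat - qstat_fixed)
```

In this model the delta is exactly the traffic counted on the rings that are
no longer live, which is the kind of delta the @get_base_stats comment quoted
earlier in the thread talks about.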
