On Sat, Feb 3, 2024 at 12:46 AM Daniel Jurgens <danielj@xxxxxxxxxx> wrote: > > > From: Jakub Kicinski <kuba@xxxxxxxxxx> > > Sent: Friday, February 2, 2024 10:01 AM > > Subject: Re: [PATCH net-next] virtio_net: Add TX stop and wake counters > > > > On Fri, 2 Feb 2024 14:52:59 +0800 Jason Xing wrote: > > > > Can you say more? I'm curious what's your use case. > > > > > > I'm not working at Nvidia, so my point of view may differ from theirs. > > > From what I can tell is that those two counters help me narrow down > > > the range if I have to diagnose/debug some issues. > > > > right, i'm asking to collect useful debugging tricks, nothing against the patch > > itself :) > > > > > 1) I sometimes notice that if some irq is held too long (say, one > > > simple case: output of printk printed to the console), those two > > > counters can reflect the issue. > > > 2) Similarly in virtio net, recently I traced such counters the > > > current kernel does not have and it turned out that one of the output > > > queues in the backend behaves badly. > > > ... > > > > > > Stop/wake queue counters may not show directly the root cause of the > > > issue, but help us 'guess' to some extent. > > > > I'm surprised you say you can detect stall-related issues with this. > > I guess virtio doesn't have BQL support, which makes it special. > > Normal HW drivers with BQL almost never stop the queue by themselves. > > I mean - if they do, and BQL is active, then the system is probably > > misconfigured (queue is too short). This is what we use at Meta to detect > > stalls in drivers with BQL: > > > > https://lore.kernel.org/all/20240131102150.728960-3-leitao@xxxxxxxxxx/ > > > > Daniel, I think this may be a good enough excuse to add per-queue stats to > > the netdev genl family, if you're up for that. LMK if you want more info, > > otherwise I guess ethtool -S is fine for now. > > For now, I would like to go with ethtool -S. But I can take on the netdev level as a background task. Are you suggesting adding them to rtnl_link_stats64? Hello Daniel, Jakub, Sorry to revive this thread. I wonder why not use this patch like mlnx driver does instead of adding statistics into the yaml file? Are we gradually using or adding more fields into the yaml file to replace the 'ethtool -S' command? Thanks, Jason