On 02/06/2024 22:22, Joe Damato wrote:
On Sun, Jun 02, 2024 at 12:14:21PM +0300, Tariq Toukan wrote:
On 29/05/2024 6:16, Joe Damato wrote:
Add functions to support the netdev-genl per queue stats API.
./cli.py --spec netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue"}'
...snip
{'ifindex': 7,
'queue-id': 62,
'queue-type': 'rx',
'rx-alloc-fail': 0,
'rx-bytes': 105965251,
'rx-packets': 179790},
{'ifindex': 7,
'queue-id': 0,
'queue-type': 'tx',
'tx-bytes': 9402665,
'tx-packets': 17551},
...snip
Also tested with the script tools/testing/selftests/drivers/net/stats.py
in several scenarios to ensure stats tallying was correct:
- on boot (default queue counts)
- adjusting queue count up or down (ethtool -L eth0 combined ...)
- adding mqprio TCs
Please test also with interface down.
OK. I'll test with the interface down.
Is there some publicly available Mellanox script I can run to test
all the different cases? That would make this much easier. Maybe
this is something to include in mlnx-tools on github?
You're testing some new functionality. We don't have something for it.
The mlnx-tools scripts, which include some Python scripts for setting
up QoS, don't seem to work on my system and output vague error
messages. I have no idea if I'm missing some kernel option, if the
device doesn't support it, or if I need some other dependency
installed.
Can you share the command you use, and the output?
I have been testing these patches on a:
Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
firmware-version: 16.29.2002 (MT_0000000013)
Signed-off-by: Joe Damato <jdamato@xxxxxxxxxx>
---
.../net/ethernet/mellanox/mlx5/core/en_main.c | 132 ++++++++++++++++++
1 file changed, 132 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ce15805ad55a..515c16a88a6c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -39,6 +39,7 @@
#include <linux/debugfs.h>
#include <linux/if_bridge.h>
#include <linux/filter.h>
+#include <net/netdev_queues.h>
#include <net/page_pool/types.h>
#include <net/pkt_sched.h>
#include <net/xdp_sock_drv.h>
@@ -5293,6 +5294,136 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
}
+static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
+ struct netdev_queue_stats_rx *stats)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5e_channel_stats *channel_stats;
+ struct mlx5e_rq_stats *xskrq_stats;
+ struct mlx5e_rq_stats *rq_stats;
+
+ if (mlx5e_is_uplink_rep(priv))
+ return;
+
+ channel_stats = priv->channel_stats[i];
+ xskrq_stats = &channel_stats->xskrq;
+ rq_stats = &channel_stats->rq;
+
+ stats->packets = rq_stats->packets + xskrq_stats->packets;
+ stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
+ stats->alloc_fail = rq_stats->buff_alloc_err +
+ xskrq_stats->buff_alloc_err;
+}
+
+static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
+ struct netdev_queue_stats_tx *stats)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5e_channel_stats *channel_stats;
+ struct mlx5e_sq_stats *sq_stats;
+ int ch_ix, tc_ix;
+
+ mutex_lock(&priv->state_lock);
+ txq_ix_to_chtc_ix(&priv->channels.params, i, &ch_ix, &tc_ix);
+ mutex_unlock(&priv->state_lock);
+
+ channel_stats = priv->channel_stats[ch_ix];
+ sq_stats = &channel_stats->sq[tc_ix];
+
+ stats->packets = sq_stats->packets;
+ stats->bytes = sq_stats->bytes;
+}
+
+static void mlx5e_get_base_stats(struct net_device *dev,
+ struct netdev_queue_stats_rx *rx,
+ struct netdev_queue_stats_tx *tx)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ int i, j;
+
+ if (!mlx5e_is_uplink_rep(priv)) {
+ rx->packets = 0;
+ rx->bytes = 0;
+ rx->alloc_fail = 0;
+
+ /* compute stats for deactivated RX queues
+ *
+ * if priv->channels.num == 0 the device is down, so compute
+ * stats for every queue.
+ *
+ * otherwise, compute only the queues which have been deactivated.
+ */
+ mutex_lock(&priv->state_lock);
+ if (priv->channels.num == 0)
+ i = 0;
This is not consistent with the above implementation of
mlx5e_get_queue_stats_rx(), which always returns the stats even if the
channel is down.
This way, you'll double count the down channels.
I think you should always start from priv->channels.params.num_channels.
OK, I'll do that.
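To illustrate the double counting described above, here is a minimal userspace sketch. It is not driver code: `chan_stats`, `queue_packets`, `base_packets` and `device_total` are hypothetical stand-ins, and it assumes a consumer that adds the base stats to the per-queue stats of queues [0, num_channels).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-channel counter standing in for the driver's RX stats. */
struct chan_stats {
	uint64_t packets;
};

/* Per-queue view: returns the counter whether or not the channel is up. */
static uint64_t queue_packets(const struct chan_stats *ch, int i)
{
	return ch[i].packets;
}

/* Base view: sums the channels in [first, stats_nch). */
static uint64_t base_packets(const struct chan_stats *ch, int first,
			     int stats_nch)
{
	uint64_t sum = 0;

	assert(first >= 0 && first <= stats_nch);
	for (int i = first; i < stats_nch; i++)
		sum += queue_packets(ch, i);
	return sum;
}

/* Assumed consumer: base stats plus every queue in [0, num_channels). */
static uint64_t device_total(const struct chan_stats *ch, int base_first,
			     int num_channels, int stats_nch)
{
	uint64_t sum = base_packets(ch, base_first, stats_nch);

	for (int i = 0; i < num_channels; i++)
		sum += queue_packets(ch, i);
	return sum;
}
```

With four channels holding 10, 20, 30 and 40 packets and num_channels == 2, starting the base loop at 0 counts the first two channels twice, while starting at num_channels counts every channel exactly once.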
+ else
+ i = priv->channels.params.num_channels;
+ mutex_unlock(&priv->state_lock);
I understand that you're following the guidelines by taking the lock here; I
just don't think this improves anything. If channels can be modified
between calls to mlx5e_get_base_stats / mlx5e_get_queue_stats_rx, then
wrapping the priv->channels access with a lock protects each individual
dereference, but does not necessarily give a consistent snapshot of the stats.
The rtnl_lock should take care of that, as the driver holds it when changing
the number of channels and updating real_num_rx/tx_queues.
That said, I would cautiously say you can drop the mutex once you have made
the requested changes above.
I still don't really like this design, so I gave this some more
thought...
I think we should come up with a new mapping array under priv, that maps
i (from real_num_tx_queues) to the matching sq_stats struct.
This array would be maintained in the channels open/close functions,
similarly to priv->txq2sq.
Then, we would not calculate the mapping per call, but just get the
proper pointer from the array. This eases the handling of htb and ptp
queues, which were missed in your txq_ix_to_chtc_ix().
This handles mapped SQs.
Now, regarding unmapped ones, they must be handled in the "base"
function call.
We'd still need to access channels->params, to:
1. read params.num_channels, to iterate from it up to priv->stats_nch, and
2. read mlx5e_get_dcb_num_tc(params), to iterate from it up to priv->max_opened_tc.
I think we can live with this without holding the mutex, given that this
runs under the rtnl lock.
We can add ASSERT_RTNL() to verify the assumption.
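A rough userspace sketch of the mapping-array idea, just to make the shape concrete. Every name here (`priv_sketch`, `txq2sq_stats`, `map_txq`, `unmap_all`, `get_tx_stats`, `MAX_TXQ`) is hypothetical and not taken from the driver, and the locking assumption (caller holds rtnl) is not modeled; in the kernel the lookup is where ASSERT_RTNL() would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TXQ 8 /* hypothetical bound, only for this sketch */

struct sq_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct priv_sketch {
	/* hypothetical array: txq index -> stats of the SQ behind it */
	struct sq_stats *txq2sq_stats[MAX_TXQ];
	unsigned int num_txq;
};

/* Would be called from the channels open path, once per SQ of any kind. */
static void map_txq(struct priv_sketch *p, unsigned int txq,
		    struct sq_stats *s)
{
	assert(txq < MAX_TXQ);
	p->txq2sq_stats[txq] = s;
	if (txq >= p->num_txq)
		p->num_txq = txq + 1;
}

/* Would be called from the channels close path. */
static void unmap_all(struct priv_sketch *p)
{
	for (unsigned int i = 0; i < MAX_TXQ; i++)
		p->txq2sq_stats[i] = NULL;
	p->num_txq = 0;
}

/* Per-queue lookup: no index arithmetic, just follow the pointer. */
static void get_tx_stats(const struct priv_sketch *p, unsigned int txq,
			 uint64_t *packets, uint64_t *bytes)
{
	const struct sq_stats *s;

	assert(txq < p->num_txq);
	s = p->txq2sq_stats[txq];
	*packets = s ? s->packets : 0;
	*bytes = s ? s->bytes : 0;
}
```

Because the open path registers each SQ explicitly, queues that do not follow the channel/TC layout (such as the htb and ptp ones mentioned above) would be covered by the same lookup.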
OK, that makes sense to me.
So then I assume I can drop the mutex in mlx5e_get_queue_stats_tx
above, as well, for the same reasons?
Does this mean then that you are in favor of the implementation for
tx stats provided in this RFC and that I've implemented option 1 as
you described in the previous thread correctly?
Yes, but I wasn't happy enough with the design.
Thanks for your contribution.
+
+ for (; i < priv->stats_nch; i++) {
+ struct netdev_queue_stats_rx rx_i = {0};
+
+ mlx5e_get_queue_stats_rx(dev, i, &rx_i);
+
+ rx->packets += rx_i.packets;
+ rx->bytes += rx_i.bytes;
+ rx->alloc_fail += rx_i.alloc_fail;
+ }
+
+ if (priv->rx_ptp_opened) {
+ struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
+
+ rx->packets += rq_stats->packets;
+ rx->bytes += rq_stats->bytes;
+ }
+ }
+
+ tx->packets = 0;
+ tx->bytes = 0;
+
+ mutex_lock(&priv->state_lock);
+ for (i = 0; i < priv->stats_nch; i++) {
+ struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+
+ /* while iterating through all channels [0, stats_nch), there
+ * are two cases to handle:
+ *
+ * 1. the channel is available, so sum only the unavailable TCs
+ * [mlx5e_get_dcb_num_tc, max_opened_tc).
+ *
+ * 2. the channel is unavailable, so sum all TCs [0, max_opened_tc).
+ */
I wonder why not call the local var 'tc'?
OK.
+ if (i < priv->channels.params.num_channels) {
+ j = mlx5e_get_dcb_num_tc(&priv->channels.params);
+ } else {
+ j = 0;
+ }
Remove the braces, or use the ternary operator.
I'll remove the braces; I didn't run checkpatch.pl on this RFC
(which catches this), but I should have.
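For reference, the ternary form of that branch could look like the standalone sketch below. `first_base_tc` is a hypothetical helper name, and the plain int parameters stand in for priv->channels.params.num_channels and the result of mlx5e_get_dcb_num_tc().

```c
#include <assert.h>

/*
 * First TC index to fold into the base stats for channel ch:
 * an available channel contributes only its unavailable TCs
 * [num_tc, max_opened_tc); an unavailable channel contributes all TCs.
 */
static int first_base_tc(int ch, int num_channels, int num_tc)
{
	assert(ch >= 0 && num_tc >= 0);
	return ch < num_channels ? num_tc : 0;
}
```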
+
+ for (; j < priv->max_opened_tc; j++) {
+ struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[j];
+
+ tx->packets += sq_stats->packets;
+ tx->bytes += sq_stats->bytes;
+ }
+ }
+ mutex_unlock(&priv->state_lock);
+
Same comment regarding dropping the mutex.
OK.