Re: [PATCH v1 1/2] vdpa: Add support for querying vendor statistics

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 3/13/2022 8:26 AM, Eli Cohen wrote:
On 3/8/2022 9:07 PM, Eli Cohen wrote:
-----Original Message-----
From: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
Sent: Wednesday, March 9, 2022 5:33 AM
To: Eli Cohen <elic@xxxxxxxxxx>
Cc: mst@xxxxxxxxxx; jasowang@xxxxxxxxxx; virtualization@lists.linux-
foundation.org; eperezma@xxxxxxxxxx; amorenoz@xxxxxxxxxx;
lvivier@xxxxxxxxxx; sgarzare@xxxxxxxxxx; Parav Pandit <parav@xxxxxxxxxx>
Subject: Re: [PATCH v1 1/2] vdpa: Add support for querying vendor statistics



On 3/8/2022 6:13 AM, Eli Cohen wrote:
-----Original Message-----
From: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
Sent: Tuesday, March 8, 2022 8:16 AM
To: Eli Cohen <elic@xxxxxxxxxx>
Cc: mst@xxxxxxxxxx; jasowang@xxxxxxxxxx; virtualization@lists.linux-
foundation.org; eperezma@xxxxxxxxxx; amorenoz@xxxxxxxxxx;
lvivier@xxxxxxxxxx; sgarzare@xxxxxxxxxx; Parav Pandit
<parav@xxxxxxxxxx>
Subject: Re: [PATCH v1 1/2] vdpa: Add support for querying vendor
statistics



On 3/6/2022 11:57 PM, Eli Cohen wrote:
-----Original Message-----
From: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
Sent: Saturday, March 5, 2022 12:34 AM
To: Eli Cohen <elic@xxxxxxxxxx>
Cc: mst@xxxxxxxxxx; jasowang@xxxxxxxxxx;
virtualization@lists.linux- foundation.org; eperezma@xxxxxxxxxx;
amorenoz@xxxxxxxxxx; lvivier@xxxxxxxxxx; sgarzare@xxxxxxxxxx;
Parav
Pandit <parav@xxxxxxxxxx>
Subject: Re: [PATCH v1 1/2] vdpa: Add support for querying vendor
statistics

Sorry, I somehow missed this after my break. Please see comments in
line.
On 2/16/2022 10:46 PM, Eli Cohen wrote:
On Wed, Feb 16, 2022 at 10:49:26AM -0800, Si-Wei Liu wrote:
On 2/16/2022 12:00 AM, Eli Cohen wrote:
Allows to read vendor statistics of a vdpa device. The specific
statistics data is received by the upstream driver in the form
of an (attribute name, attribute value) pairs.

An example of statistics for mlx5_vdpa device are:

received_desc - number of descriptors received by the virtqueue
completed_desc - number of descriptors completed by the
virtqueue

A descriptor using indirect buffers is still counted as 1. In
addition, N chained descriptors are counted correctly N times as
one
would expect.
A new callback was added to vdpa_config_ops which provides the
means for the vdpa driver to return statistics results.

The interface allows for reading all the supported virtqueues,
including the control virtqueue if it exists.

Below are some examples taken from mlx5_vdpa which are
introduced in the following patch:

1. Read statistics for the virtqueue at index 1

$ vdpa dev vstats show vdpa-a qidx 1
vdpa-a:
queue_type tx queue_index 1 received_desc 3844836
completed_desc
3844836

2. Read statistics for the virtqueue at index 32 $ vdpa dev
vstats show vdpa-a qidx 32
vdpa-a:
queue_type control_vq queue_index 32 received_desc 62
completed_desc
62

3. Read statisitics for the virtqueue at index 0 with json
output $ vdpa -j dev vstats show vdpa-a qidx 0
{"vstats":{"vdpa-a":{

"queue_type":"rx","queue_index":0,"name":"received_desc","value":41
77
76,\
       "name":"completed_desc","value":417548}}}

4. Read statistics for the virtqueue at index 0 with preety json
output $ vdpa -jp dev vstats show vdpa-a qidx 0 {
          "vstats": {
              "vdpa-a": {

                  "queue_type": "rx",
I wonder where this info can be inferred? I don't see relevant
change in the patch series that helps gather the
VDPA_ATTR_DEV_QUEUE_TYPE?
Is this an arbitrary string defined by the vendor as well? If so,
how does the user expect to consume it?
The queue tupe is deduced from the index and whether we have a
virtqueue. Even numbers are rx, odd numbers are tx and if there is
CVQ, the last one is CVQ.
OK, then VDPA_ATTR_DEV_QUEUE_TYPE attribute introduced in this
patch might not be useful at all?
Right, will remove.

And how do you determine in the vdpa tool if CVQ is negotiated or
not?
I make a netlink call to get the same information as " vdpa dev config
show"
retrieves. I use the negotiated features to determine if a CVQ is
available. If it is, the number of VQs equals the control VQ index.
So there are two netlink calls under the hood.
The lock vdpa_dev_mutex won't hold across the two separate netlink
calls, and it may end up with inconsistent state - theoretically
things could happen like that the first call gets CVQ negotiated, but
the later call for
get_vendor_vq_stats() on the cvq might get -EINVAL due to device
reset. Can the negotiated status and stat query be done within one single
netlink call?
I see your concern.
The only reason I do the extra call is to know if we have a control VQ
and what index it is, just to print a descriptive string telling if it's a either rx,
tx or control VQ.
So the cure can be simple. Let's have a new attribute that returns the
type of virtqueue.
I am not sure I follow the cure. Wouldn't it be possible to get both negotiated
status and the queue stat in vdpa_nl_cmd_dev_stats_get_doit() under the
same vdpa_dev_mutex lock?
Yes we can, but I suggested to get only the type of the queue as a new attribute.
The kernel will do the digest and decide per a given VQ if it's rx, tx or control and
return the result in that new attribute.
The rx, tx and control queue type is net specific, while the vdpa core
is currently agnostic to the vdpa class.

And I am not even sure if it is a must to display
the queue type - it doesn't seem the output includes the vdpa class info, which
makes it hard for script to parse the this field in generic way.
I don't get you. You say you don't think you need the queue type and at the same
time you're concerned lack of information makes it hard for scripts.
BTW, class info is something you can get for the device through "vdpa dev show"
so your know the class of your device.
Stepping back, may I ask if there's a case that queue type specific stat
may be defined by vendor, such that deciphering of certain vendor stat
would need type specific knowledge? So far the received_desc and
completed_desc stats offered through the mlx5_vdpa patch look to be
general ones and not associated with any queue type in particular. Is
there some future stat in your mind that needs specific knowledge of
queue type and vdpa class?
No, the only reason for displaying the queue type is to help users
know kind of queue they're looking at.

I'd prefer the vstat output to be self-contained and self-descriptive.
You may argue the class of vdpa never changes in "vdpa dev show" after
creation. This is true, however the queue type is not - say you got a
control queue for qindex 2, but the next moment you may get a rx queue
with the same qindex.
I don't think this is possible unless you destroyed the device and re-created it.
What operation do you think could cause that?
Say you got a vdpa net device created with 4 data queue pairs and a control vq. On boot some guest firmware may support just F_CTRL_VQ but not F_MQ, then the index for the control vq in guest ends up with 2, as in this case there's only a single queue pair enabled for rx (index 0) and tx (index 1). From the host driver (e.g. mlx5_vdpa) perspective, the control vq is the last vq following 8 data vqs of all 4 pairs, hence got the 8th index in the rank. Since F_MQ is not negotiated and only 1 data queue pair enabled, in such event only host qindex 0,1 and 8 have vendor stats available, and the rest of qindex would get invalid/empty stat.

Later on say boot continues towards loading the Linux virtio driver, then guest could successfully negotiate both F_CTRL_VQ and F_MQ features. In this case, all 8 data virtqueues are fully enabled, the index for the control vq ends up as 8, following tightly after all the 4 data queue pairs. Only until both features are negotiated, the guest and host are able to see consistent view in identifying the control vq. Since F_MQ is negotiated, all host queues, indexed from 0 through 8, should have vendor stats available.

That's why I said the guest qindex is ephemeral and hard to predict subjected to negotiated features, but host qindex is reliable and more eligible for command line identification purpose.


Particularly you seem want to tie this with queue
index in the guest view, which is quite dynamic for host admin or script
running on the host to follow.
For rx and tx queues, some index may become invalid if the user changed
the number of queues with ethtool -L but I don't think this is an issue.
This is irrelevant. Ethtool -L may only change the effective number of vqs in use (even if ending with one single queue pair), but would never flip feature negotiation around F_MQ.

    I think Jason did not like the idea of communicating the kind of VQ
from kernel to userspace but under these circumstances, maybe he would
approve.
Jason?

What worried me is that the queue index being dynamic and depended on
negotiation status would make host admin user quite hard to follow.
The guest may or may not advertise F_MQ and/or F_CTRL_VQ across
various phases, e.g.
firmware (UEFI), boot loader (grub) till OS driver is up and running,
which can be agnostic to host admin. For most of the part it's not
easy to script and predict the queue index which can change from time
to time. Can we define the order of host predictable queue index,
which is independent from any guest negotiated state?
Here I think we can just use the plain queue index in the host view - say if vdpa
net has 4 pairs of data vqs and 1 control vq, user may use qindex 8 across the
board to identify the control vq, regardless if the F_MQ feature is negotiated
or not in guest.
Right, but the idea that a userspace tool should provide useful information to the
user so it does not need to do complex logic to infer that from bare data.
The host side qindex and qtype would never change regardless of guest
feature negotiation, by nature it reflects the real construct and object
in the hardware. I don't feel it's a simple task for host users to
figure out the correct guest side qindex for the control queue -  it's
always racy for one to check some other vdpa command output if the vstat
output is not self-contained.
So what are you actually proposing? Display received and completed descriptors
per queue index without further interpretation?

I'd suggest using a more stable queue id i.e. the host queue index to represent the qidx (which seems to be what you're doing now?), and displaying both the host qindex (queue_index_device in the example below), as well as the guest's (queue_index_driver as below) in the output:

$ vdpa -jp dev vstats show vdpa-a qidx 8
{
    "vstats": {
        "vdpa-a": {
            "queue_stats": [{
                "queue_index_device": 8,
                "queue_index_driver": 2,
                "queue_type": "control_vq",
                "stat_name": [ "received_desc","completed_desc" ],
                "stat_value": [ 417776,417775 ],
            }]
        }
    }
}

Optionally, user may use guest queue index gqidx, which is kind of an ephemeral ID and F_MQ negotiation depended, to query the stat on a specific guest queue:

$ vdpa -jp dev vstats show vdpa-a gqidx 2
{
    "vstats": {
        "vdpa-a": {
            "queue_stats": [{
                "queue_index_device": 8,
                "queue_index_driver": 2,
                "queue_type": "control_vq",
                "stat_name": [ "received_desc","completed_desc" ],
                "stat_value": [ 417776,417775 ],
            }]
        }
    }
}

Thanks,
-Siwei


Thanks,
-Siwei

Regards,
-Siwei

Looks to me there are still some loose end I don't quite yet
understand.


                  "queue_index": 0,
I think this can be removed since the command is for a specific index.

                  "name": "received_desc",
                  "value": 417776,
                  "name": "completed_desc",
                  "value": 417548
Not for this kernel patch, but IMHO it's the best to put the name
& value pairs in an array instead of flat entries in json's
hash/dictionary. The hash entries can be re-ordered deliberately
by external json parsing tool, ending up with inconsistent stat values.
This comment is missed for some reason. Please change the example
in the log if you agree to address it in vdpa tool. Or justify why
keeping the order for json hash/dictionary is fine.
Sorry for skipping this comment.
Do you mean to present the information like:
"received_desc": 417776,
"completed_desc": 417548,
I mean the following presentation:

$ vdpa -jp dev vstats show vdpa-a qidx 0 {
        "vstats": {
            "vdpa-a": {
                "queue_stats": [{
                    "queue_index": 0,
                    "queue_type": "rx",
                    "stat_name": [ "received_desc","completed_desc" ],
                    "stat_value": [ 417776,417548 ],
                }]
            }
        }
}

I think Parav had similar suggestion, too.

Thanks,
-Siwei

Thanks,
-Siwei

Thanks,
-Siwei
              }
          }
}

Signed-off-by: Eli Cohen <elic@xxxxxxxxxx>
---
       drivers/vdpa/vdpa.c       | 129
++++++++++++++++++++++++++++++++++++++
       include/linux/vdpa.h      |   5 ++
       include/uapi/linux/vdpa.h |   7 +++
       3 files changed, 141 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
9846c9de4bfa..d0ff671baf88 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -909,6 +909,74 @@ vdpa_dev_config_fill(struct vdpa_device
*vdev,
struct sk_buff *msg, u32 portid,
       	return err;
       }
+static int vdpa_fill_stats_rec(struct vdpa_device *vdev, struct
+sk_buff
*msg,
+			       struct genl_info *info, u32 index) {
+	int err;
+
+	if (nla_put_u32(msg, VDPA_ATTR_DEV_QUEUE_INDEX, index))
+		return -EMSGSIZE;
+
+	err = vdev->config->get_vendor_vq_stats(vdev, index, msg,
+info-
extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int vendor_stats_fill(struct vdpa_device *vdev, struct
+sk_buff
*msg,
+			     struct genl_info *info, u32 index) {
+	int err;
+
+	if (!vdev->config->get_vendor_vq_stats)
+		return -EOPNOTSUPP;
+
+	err = vdpa_fill_stats_rec(vdev, msg, info, index);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int vdpa_dev_vendor_stats_fill(struct vdpa_device *vdev,
+				      struct sk_buff *msg,
+				      struct genl_info *info, u32 index) {
+	u32 device_id;
+	void *hdr;
+	int err;
+	u32 portid = info->snd_portid;
+	u32 seq = info->snd_seq;
+	u32 flags = 0;
+
+	hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags,
+			  VDPA_CMD_DEV_VSTATS_GET);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_string(msg, VDPA_ATTR_DEV_NAME,
dev_name(&vdev-
dev))) {
+		err = -EMSGSIZE;
+		goto undo_msg;
+	}
+
+	device_id = vdev->config->get_device_id(vdev);
+	if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) {
+		err = -EMSGSIZE;
+		goto undo_msg;
+	}
+
+	err = vendor_stats_fill(vdev, msg, info, index);
+
+	genlmsg_end(msg, hdr);
+
+	return err;
+
+undo_msg:
+	genlmsg_cancel(msg, hdr);
+	return err;
+}
+
       static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff
*skb, struct
genl_info *info)
       {
       	struct vdpa_device *vdev;
@@ -990,6 +1058,60 @@
vdpa_nl_cmd_dev_config_get_dumpit(struct
sk_buff *msg, struct netlink_callback *
       	return msg->len;
       }
+static int vdpa_nl_cmd_dev_stats_get_doit(struct sk_buff *skb,
+					  struct genl_info *info)
+{
+	struct vdpa_device *vdev;
+	struct sk_buff *msg;
+	const char *devname;
+	struct device *dev;
+	u32 index;
+	int err;
+
+	if (!info->attrs[VDPA_ATTR_DEV_NAME])
+		return -EINVAL;
+
+	if (!info->attrs[VDPA_ATTR_DEV_QUEUE_INDEX])
+		return -EINVAL;
+
+	devname = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]);
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	index = nla_get_u32(info-
attrs[VDPA_ATTR_DEV_QUEUE_INDEX]);
+	mutex_lock(&vdpa_dev_mutex);
+	dev = bus_find_device(&vdpa_bus, NULL, devname,
vdpa_name_match);
+	if (!dev) {
+		NL_SET_ERR_MSG_MOD(info->extack, "device not
found");
+		err = -ENODEV;
+		goto dev_err;
+	}
+	vdev = container_of(dev, struct vdpa_device, dev);
+	if (!vdev->mdev) {
+		NL_SET_ERR_MSG_MOD(info->extack, "unmanaged
vdpa
device");
+		err = -EINVAL;
+		goto mdev_err;
+	}
+	err = vdpa_dev_vendor_stats_fill(vdev, msg, info, index);
+	if (!err)
+		err = genlmsg_reply(msg, info);
+
+	put_device(dev);
+	mutex_unlock(&vdpa_dev_mutex);
+
+	if (err)
+		nlmsg_free(msg);
+
+	return err;
+
+mdev_err:
+	put_device(dev);
+dev_err:
+	mutex_unlock(&vdpa_dev_mutex);
+	return err;
+}
+
       static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1]
= {
       	[VDPA_ATTR_MGMTDEV_BUS_NAME] = { .type =
NLA_NUL_STRING },
       	[VDPA_ATTR_MGMTDEV_DEV_NAME] = { .type = NLA_STRING
}, @@ -
997,6
+1119,7 @@ static const struct nla_policy
vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
       	[VDPA_ATTR_DEV_NET_CFG_MACADDR] =
NLA_POLICY_ETH_ADDR,
       	/* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
       	[VDPA_ATTR_DEV_NET_CFG_MTU] =
NLA_POLICY_MIN(NLA_U16, 68),
+	[VDPA_ATTR_DEV_QUEUE_INDEX] =
NLA_POLICY_RANGE(NLA_U32, 0,
65535),
       };
       static const struct genl_ops vdpa_nl_ops[] = { @@ -1030,6
+1153,12 @@ static const struct genl_ops vdpa_nl_ops[] = {
       		.doit = vdpa_nl_cmd_dev_config_get_doit,
       		.dumpit = vdpa_nl_cmd_dev_config_get_dumpit,
       	},
+	{
+		.cmd = VDPA_CMD_DEV_VSTATS_GET,
+		.validate = GENL_DONT_VALIDATE_STRICT |
GENL_DONT_VALIDATE_DUMP,
+		.doit = vdpa_nl_cmd_dev_stats_get_doit,
+		.flags = GENL_ADMIN_PERM,
+	},
       };
       static struct genl_family vdpa_nl_family __ro_after_init =
{ diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index
2de442ececae..274203845cfc 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -275,6 +275,9 @@ struct vdpa_config_ops {
       			    const struct vdpa_vq_state *state);
       	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
       			    struct vdpa_vq_state *state);
+	int (*get_vendor_vq_stats)(struct vdpa_device *vdev, u16 idx,
+				   struct sk_buff *msg,
+				   struct netlink_ext_ack *extack);
       	struct vdpa_notification_area
       	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
       	/* vq irq is not expected to be changed once DRIVER_OK is
set */ @@ -466,4 +469,6 @@ struct vdpa_mgmt_dev {
       int vdpa_mgmtdev_register(struct vdpa_mgmt_dev *mdev);
       void vdpa_mgmtdev_unregister(struct vdpa_mgmt_dev *mdev);
+#define VDPA_INVAL_QUEUE_INDEX 0xffff
+
       #endif /* _LINUX_VDPA_H */
diff --git a/include/uapi/linux/vdpa.h
b/include/uapi/linux/vdpa.h index 1061d8d2d09d..c5f229a41dc2
100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -18,6 +18,7 @@ enum vdpa_command {
       	VDPA_CMD_DEV_DEL,
       	VDPA_CMD_DEV_GET,		/* can dump */
       	VDPA_CMD_DEV_CONFIG_GET,	/* can dump */
+	VDPA_CMD_DEV_VSTATS_GET,
       };
       enum vdpa_attr {
@@ -46,6 +47,12 @@ enum vdpa_attr {
       	VDPA_ATTR_DEV_NEGOTIATED_FEATURES,	/* u64 */
       	VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,		/*
u32 */
       	VDPA_ATTR_DEV_SUPPORTED_FEATURES,	/* u64 */
+
+	VDPA_ATTR_DEV_QUEUE_INDEX,              /* u16 */
+	VDPA_ATTR_DEV_QUEUE_TYPE,               /* string */
+	VDPA_ATTR_DEV_VENDOR_ATTR_NAME,		/*
string */
+	VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,        /* u64 */
+
       	/* new attributes must be added above here */
       	VDPA_ATTR_MAX,
       };

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization




[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux