Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev

"Zhu, Lingshan" <lingshan.zhu@xxxxxxxxx> · Wed, 17 Aug 2022 10:14:26 +0800

On 8/17/2022 7:14 AM, Si-Wei Liu wrote:


On 8/16/2022 2:08 AM, Zhu, Lingshan wrote:


On 8/16/2022 3:58 PM, Si-Wei Liu wrote:


On 8/15/2022 6:58 PM, Zhu, Lingshan wrote:


On 8/16/2022 7:32 AM, Si-Wei Liu wrote:


On 8/15/2022 2:26 AM, Zhu Lingshan wrote:
Some fields of virtio-net device config space are
conditional on the feature bits, the spec says:

"The mac address field always exists
(though is only valid if VIRTIO_NET_F_MAC is set)"

"max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ
or VIRTIO_NET_F_RSS is set"

"mtu only exists if VIRTIO_NET_F_MTU is set"

so we should read MTU, MAC and MQ in the device config
space only when these feature bits are offered.

For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are
not set, the virtio device should have
one queue pair as default value, so when userspace querying queue 
pair numbers,
it should return mq=1 than zero.

For MTU, if VIRTIO_NET_F_MTU is not set, we should not read
MTU from the device config sapce.
RFC894 <A Standard for the Transmission of IP Datagrams over 
Ethernet Networks>
says:"The minimum length of the data field of a packet sent over an
Ethernet is 1500 octets, thus the maximum length of an IP datagram
sent over an Ethernet is 1500 octets.  Implementations are 
encouraged
to support full-length packets"
Noted there's a typo in the above "The *maximum* length of the 
data field of a packet sent over an Ethernet is 1500 octets ..." 
and the RFC was written 1984.
the spec RFC894 says it is 1500, see <a 
href="https://urldefense.com/v3/__https://www.rfc-editor.org/rfc/rfc894.txt__;!!ACWV5N9M2RV99hQ!MdgxZjw5sp5Qz-GKfwT1IWcw_L4Jo1-UekuJPFz1UrG3YuqirKz7P9ksdJFh1vB6zHJ7z8Q04fpT0-9jWXCtlWM$";>https://urldefense.com/v3/__https://www.rfc-editor.org/rfc/rfc894.txt__;!!ACWV5N9M2RV99hQ!KVwfun0b1Q59Ajp6O7JrB-BuEBSLyQ9e95oGq1cVG_sQIPDL0whI5frx1EGoQFznmm67RsEeJTrUdfYrmZPRFaM$ 
</a>

Apparently that is no longer true with the introduction of Jumbo 
size frame later in the 2000s. I'm not sure what is the point of 
mention this ancient RFC. It doesn't say default MTU of any 
Ethernet NIC/switch should be 1500 in either  case.
This could be a larger number for sure, we are trying to find out 
the min value for Ethernet here, to support 1500 octets, MTU should 
be 1500 at least, so I assume 1500 should be the default value for MTU


virtio spec says:"The virtio network device is a virtual ethernet 
card",
Right,
so the default MTU value should be 1500 for virtio-net.
... but it doesn't say the default is 1500. At least, not in 
explicit way. Why it can't be 1492 or even lower? In practice, if 
the network backend has a MTU higher than 1500, there's nothing 
wrong for guest to configure default MTU more than 1500.
same as above


For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set,
the configuration space mac entry indicates the “physical” address
of the network card, otherwise the driver would typically
generate a random local MAC address." So there is no
default MAC address if VIRTIO_NET_F_MAC not set.

This commits introduces functions vdpa_dev_net_mtu_config_fill()
and vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
It also fixes vdpa_dev_net_mq_config_fill() to report correct
MQ when _F_MQ is not present.

These functions should check devices features than driver
features, and struct vdpa_device is not needed as a parameter

The test & userspace tool output:

Feature bit VIRTIO_NET_F_MTU, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ
and VIRTIO_NET_F_MAC can be mask out by hardcode.

However, it is challenging to "disable" the related fields
in the HW device config space, so let's just assume the values
are meaningless if the feature bits are not set.

Before this change, when feature bits for RSS, MQ, MTU and MAC
are not set, iproute2 output:
$vdpa vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false 
mtu 1500
   negotiated_features

without this commit, function vdpa_dev_net_config_fill()
reads all config space fields unconditionally, so let's
assume the MAC and MTU are meaningless, and it checks
MQ with driver_features, so we don't see max_vq_pairs.

After applying this commit, when feature bits for
MQ, RSS, MAC and MTU are not set,iproute2 output:
$vdpa dev config show vdpa0
vdpa0: link up link_announce false max_vq_pairs 1 mtu 1500
   negotiated_features

As explained above:
Here is no MAC, because VIRTIO_NET_F_MAC is not set,
and there is no default value for MAC. It shows
max_vq_paris = 1 because even without MQ feature,
a functional virtio-net must have one queue pair.
mtu = 1500 is the default value as ethernet
required.

This commit also add supplementary comments for
__virtio16_to_cpu(true, xxx) operations in
vdpa_dev_net_config_fill() and vdpa_fill_stats_rec()

Signed-off-by: Zhu Lingshan <lingshan.zhu@xxxxxxxxx>
---
  drivers/vdpa/vdpa.c | 60 
+++++++++++++++++++++++++++++++++++----------
  1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index efb55a06e961..a74660b98979 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -801,19 +801,44 @@ static int 
vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct 
netlink_callba
      return msg->len;
  }
  -static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
-                       struct sk_buff *msg, u64 features,
+static int vdpa_dev_net_mq_config_fill(struct sk_buff *msg, u64 
features,
                         const struct virtio_net_config *config)
  {
      u16 val_u16;
  -    if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
-        return 0;
+    if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0 &&
+        (features & BIT_ULL(VIRTIO_NET_F_RSS)) == 0)
+        val_u16 = 1;
+    else
+        val_u16 = __virtio16_to_cpu(true, 
config->max_virtqueue_pairs);
  -    val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
      return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, 
val_u16);
  }
  +static int vdpa_dev_net_mtu_config_fill(struct sk_buff *msg, 
u64 features,
+                    const struct virtio_net_config *config)
+{
+    u16 val_u16;
+
+    if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
+        val_u16 = 1500;
As said, there's no virtio spec defined value for MTU. Please 
leave this field out if feature VIRTIO_NET_F_MTU is not negotiated.
same as above
+    else
+        val_u16 = __virtio16_to_cpu(true, config->mtu);
+
+    return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16);
+}
+
+static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 
features,
+                    const struct virtio_net_config *config)
+{
+    if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) == 0)
+        return 0;
+    else
+        return  nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
+                sizeof(config->mac), config->mac);
+}
+
+
  static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, 
struct sk_buff *msg)
  {
      struct virtio_net_config config = {};
@@ -822,18 +847,16 @@ static int vdpa_dev_net_config_fill(struct 
vdpa_device *vdev, struct sk_buff *ms
        vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
  -    if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, 
sizeof(config.mac),
-            config.mac))
-        return -EMSGSIZE;
+    /*
+     * Assume little endian for now, userspace can tweak this for
+     * legacy guest support.
You can leave it as a TODO for kernel (vdpa core limitation), but 
AFAIK there's nothing userspace needs to do to infer the 
endianness. IMHO it's the kernel's job to provide an abstraction 
rather than rely on userspace guessing it.
we have discussed it in another thread, and this comment is 
suggested by MST.
Can you provide the context or link? It shouldn't work like this, 
otherwise it is breaking uABI. E.g. how will a legacy/BE supporting 
kernel/device be backward compatible with older vdpa tool (which has 
knowledge of this endianness implication/assumption from day one)?
https://urldefense.com/v3/__https://www.spinics.net/lists/netdev/msg837114.html__;!!ACWV5N9M2RV99hQ!KVwfun0b1Q59Ajp6O7JrB-BuEBSLyQ9e95oGq1cVG_sQIPDL0whI5frx1EGoQFznmm67RsEeJTrUdfYrGq7Vwjk$ 

The challenge is that the status filed is virtio16, not le16, so 
le16_to_cpu(xxx) is wrong anyway. However we can not tell whether it 
is a LE or BE device from struct vdpa_device, so for most cases, we 
assume it is LE, and leave this comment.
While the fix is fine, the comment is misleading in giving readers 
false hope. This is in vdpa_dev_net_config_fill() the vdpa tool query 
path, instead of calls from the VMM dealing with vhost/virtio plumbing 
specifics. I think what's missing today in vdpa core is the detection 
of guest type (legacy, transitional, or modern) regarding endianness 
through F_VERSION_1 and legacy interface access, the latter of which 
would need some assistance from VMM for sure. However, the presence of 
information via the vdpa tool query is totally orthogonal. I don't get 
a good reason for why it has to couple with endianness. How vdpa tool 
users space is supposed to tweak it? I don't get it...
Yes it is a little messy, and we can not check _F_VERSION_1 because of 
transitional devices, so maybe this is the best we can do for now

-Siwei



Thanks

-Siwei


+     */
+    val_u16 = __virtio16_to_cpu(true, config.status);
        val_u16 = __virtio16_to_cpu(true, config.status);
      if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
          return -EMSGSIZE;
  -    val_u16 = __virtio16_to_cpu(true, config.mtu);
-    if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
-        return -EMSGSIZE;
-
      features_driver = vdev->config->get_driver_features(vdev);
      if (nla_put_u64_64bit(msg, 
VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
                    VDPA_ATTR_PAD))
@@ -846,7 +869,13 @@ static int vdpa_dev_net_config_fill(struct 
vdpa_device *vdev, struct sk_buff *ms
                    VDPA_ATTR_PAD))
          return -EMSGSIZE;
  -    return vdpa_dev_net_mq_config_fill(vdev, msg, 
features_driver, &config);
+    if (vdpa_dev_net_mac_config_fill(msg, features_device, 
&config))
+        return -EMSGSIZE;
+
+    if (vdpa_dev_net_mtu_config_fill(msg, features_device, 
&config))
+        return -EMSGSIZE;
+
+    return vdpa_dev_net_mq_config_fill(msg, features_device, 
&config);
  }
    static int
@@ -914,6 +943,11 @@ static int vdpa_fill_stats_rec(struct 
vdpa_device *vdev, struct sk_buff *msg,
      }
      vdpa_get_config_unlocked(vdev, 0, &config, sizeof(config));
  +    /*
+     * Assume little endian for now, userspace can tweak this for
+     * legacy guest support.
+     */
+
Ditto.
same as above

Thanks

Thanks,
-Siwei
max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
      if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, max_vqp))
          return -EMSGSIZE;