Re: [PATCH 3/4] virsh-domain: update attach-interface to support type=hostdev

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
Adding this feature will allow users to easily attach a hostdev network
interface using PCI passthrough.

The interface can be attached using --type=hostdev and PCI address or
network device name as --source.  This command also allows you to tell,
whether the interface should be managed and to choose a assignment
driver.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=997561

I think the PCI address version of this command is fairly straightforward, so it can and should go in. But I've been thinking more about the "netdev name" version since our exchange in the BZ, and even had a very long treatise prepared defending my position from there, but then I decided to look at it all again from the beginning and came to the conclusion that we are *both* wrong :-)

The main aim here is convenience, and while I still have the position that if we're going to make it more convenient, we should make it more convenient at the XML level so that all consumers of libvirt can take advantage without needing a ton of extra code, I also have realized that specifying the netdev name is not very convenient anyway.

Why?

* Keep in mind that <interface type='hostdev'> will only work with SRIOV VFs (because you can't set the MAC address of a normal netdev from the host and have that MAC address persist through the guest driver's device init).

* So any device that you assign in this way is a VF.

* A user will know which PF ("Physical Function", but from their point of view it's "the physical port used for the connection") they want a VF from; they don't care which VF (they're all functionally equivalent), and anyway the standard utilities don't even tell you the netdev names of the VFs associated with a particular PF. So it's not that simple to learn the netdev name of the VF you want to use:

  * virsh nodedev-list --tree doesn't group the VFs under their PF
    (because its hierarchy is according to the PCI bus layout, which has
    all the PFs and VFs at the same level)

  * virsh nodedev-dumpxml for the PF device shows the VFs'
    PCI addresses, NOT their netdev names.

   <capability type='virt_functions'>
      <address domain='0x0000' bus='0x02' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x02' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x02' slot='0x10' function='0x4'/>
      <address domain='0x0000' bus='0x02' slot='0x10' function='0x6'/>
      <address domain='0x0000' bus='0x02' slot='0x11' function='0x0'/>
      <address domain='0x0000' bus='0x02' slot='0x11' function='0x2'/>
      <address domain='0x0000' bus='0x02' slot='0x11' function='0x4'/>

(Note the leading "0x" on these values :-P)

  * "ip link show" lists the VFs directly under their PF,
    but shows the VF#, not the netdev name

    11: p4p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq
              state UP mode DEFAULT group default qlen 1000
              link/ether 90:e2:ba:02:22:01 brd ff:ff:ff:ff:ff:ff
       vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
       vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto

So if there's not a simple way to get a list of the netdev names of the VFs for a PF, then what convenience is gained by allowing use of netdev name? What exactly was I thinking when I wrote comment 1 in that BZ???

What *could* be useful would be the ability to specify the PF name and VF# of the device you want to assign, e.g.: "<source pf='p4p2' vf='3'/>", since that's info you can easily get from "ip link show". Anymore, though, I don't even know how useful *that* is, since you can already get the PCI addresses of the VFs from the output of virsh nodedev-dumpxml (in almost exactly the format you need - just add "type='pci'".

(I'm now even wondering if I misunderstood what the original reporter was asking for, but unfortunately it's not possible to ask him, because his account has been closed :-( Thinking back to what he said - there is a *different* place where it's common to know the netdev name and want to translate it into a PCI address - when you are assigning a *non-SRIOV* ethernet device using plain <hostdev> (not <interface type='hostdev'>, which only works for SRIOV VFs). In this case you probably know the netdev name but not the PCI address. So for this case a translation from netdev name to PCI address would make sense, and here I would agree that the translation should happen in virsh rather than the generic <hostdev> XML understanding what a netdev name is (contrary to <interface type='hostdev'>, where it is well established that the device you're assigning is a network device, and there are already "netdev-y" config attributes, up until now <hostdev> has not contained any config items specific to a particular type of PCI device, and I don't think it should have any added)).

TL;DR of my opinion:

1) this patch minus the netdev-->PCI address translation is good

2) we don't need the netdev-->PCI translation for <interface type='hostdev'>;
   it's just extra code for (in my newly revised opinion) no gain.

3) a netdev-->PCI translation in virsh that creates a plain
   <hostdev> might be useful (hmm, maybe keep (2) as you have it,
   with an additional check for

4) adding <source pf='ethX' vf='n'/> support to <interface type='hostdev'>
   might be useful.

How much sense does any of that make?



Signed-off-by: Pavel Hrdina <phrdina@xxxxxxxxxx>
---
  tools/virsh-domain.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 86 insertions(+), 4 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index e8503ec..b124441 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -56,6 +56,7 @@
  #include "virtime.h"
  #include "virtypedparam.h"
  #include "virxml.h"
+#include "virsh-nodedev.h"
/* Gnulib doesn't guarantee SA_SIGINFO support. */
  #ifndef SA_SIGINFO
@@ -866,6 +867,14 @@ static const vshCmdOptDef opts_attach_interface[] = {
       .type = VSH_OT_BOOL,
       .help = N_("print XML document rather than attach the interface")
      },
+    {.name = "managed",
+     .type = VSH_OT_BOOL,
+     .help = N_("set the interface to be managed by libvirt")

Maybe a more descriptive help - "have libvirt automatically manage detach/attach of device from host driver" (or something like that; I know that's too long).

+    },
+    {.name = "driver",
+     .type = VSH_OT_STRING,
+     .help = N_("set driver for hostdev interface, default is 'kvm'")

Actually the default behavior depends on what is available on the host - if only one of legacy-kvm and vfio is available, then that is the one used, but if both are available, then vfio is used (so vfio is closer to being "default" than kvm). The kvm method of assigning devices is deprecated, and it has been completely disabled in some distros (definitely RHEL7 and CentOS7, not sure about Fedora). These days manually specifying which driver to use is mostly pointless; I vaguely recall that it was initially added because when VFIO was first added some were nervous about not having a simple way to fallback to old behavior if something went wrong while using VFIO; those days are long gone though, and I can't think of a situation where anyone would want/need to specify which driver to use (i.e. I'm not sure we even need to expose this option in virsh; it may confuse more than help).

+    },
      {.name = NULL}
  };
@@ -919,7 +928,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
      virDomainPtr dom = NULL;
      const char *mac = NULL, *target = NULL, *script = NULL,
                 *type = NULL, *source = NULL, *model = NULL,
-               *inboundStr = NULL, *outboundStr = NULL;
+               *inboundStr = NULL, *outboundStr = NULL, *driver = NULL;
      virNetDevBandwidthRate inbound, outbound;
      virDomainNetType typ;
      int ret;
@@ -931,6 +940,7 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
      bool config = vshCommandOptBool(cmd, "config");
      bool live = vshCommandOptBool(cmd, "live");
      bool persistent = vshCommandOptBool(cmd, "persistent");
+    bool managed = vshCommandOptBool(cmd, "managed");
VSH_EXCLUSIVE_OPTIONS_VAR(persistent, current); @@ -949,7 +959,8 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
          vshCommandOptStringReq(ctl, cmd, "script", &script) < 0 ||
          vshCommandOptStringReq(ctl, cmd, "model", &model) < 0 ||
          vshCommandOptStringReq(ctl, cmd, "inbound", &inboundStr) < 0 ||
-        vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0)
+        vshCommandOptStringReq(ctl, cmd, "outbound", &outboundStr) < 0 ||
+        vshCommandOptStringReq(ctl, cmd, "driver", &driver) < 0)
          goto cleanup;
/* check interface type */
@@ -982,8 +993,23 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
          }
      }
+ if (typ != VIR_DOMAIN_NET_TYPE_HOSTDEV) {
+        if (managed) {
+            vshError(ctl, _("--managed is usable only with --type=hostdev"));
+            goto cleanup;
+        }
+        if (driver) {
+            vshError(ctl, _("--driver is usable only with --type=hostdev"));
+            goto cleanup;
+        }
+    }
+
      /* Make XML of interface */
-    virBufferAsprintf(&buf, "<interface type='%s'>\n", type);
+    virBufferAsprintf(&buf, "<interface type='%s'", type);
+    if (managed)
+        virBufferAddLit(&buf, " managed='yes'>\n");
+    else
+        virBufferAddLit(&buf, ">\n");
      virBufferAdjustIndent(&buf, 2);
switch (typ) {
@@ -995,6 +1021,63 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
      case VIR_DOMAIN_NET_TYPE_DIRECT:
          virBufferAsprintf(&buf, "<source dev='%s'/>\n", source);
          break;
+    case VIR_DOMAIN_NET_TYPE_HOSTDEV:
+    {
+        struct PCIAddress pciAddr = {0, 0, 0, 0};
+
+        if (str2PCIAddress(source, &pciAddr) < 0) {
+            const char *caps[] = {"net"};
+            char *tmpName = NULL;
+            virshNodeDeviceListPtr list = NULL;
+            virNodeDevicePtr netDev = NULL;
+            size_t i;
+
+            list = virshNodeDeviceListCollect(ctl, (char **)caps, 1,
+                                              VIR_CONNECT_LIST_NODE_DEVICES_CAP_NET);
+
+            if (!list) {
+                vshError(ctl, _("cannot list network devices"));
+                goto cleanup;
+            }
+
+            if (virAsprintf(&tmpName, "net_%s", source) < 0)
+                goto cleanup;
+
+            for (i = 0; i < list->ndevices; i++) {
+                if (STREQLEN(tmpName, virNodeDeviceGetName(list->devices[i]),
+                             strlen(tmpName)))
+                    netDev = list->devices[i];
+            }
+            VIR_FREE(tmpName);
+
+            if (!netDev) {
+                vshError(ctl, _("network interface '%s' doesn't exist"),
+                         source);
+                goto cleanup;
+            }
+
+            if (str2PCIAddress(virNodeDeviceGetParent(netDev)+4, &pciAddr) < 0) {
+                virshNodeDeviceListFree(list);
+                vshError(ctl, _("cannot parse pci address for network "
+                                "interface '%s'"), source);
+                goto cleanup;
+            }
+
+            virshNodeDeviceListFree(list);
+        }
+        if (driver)
+            virBufferAsprintf(&buf, "<driver name='%s'/>\n", driver);
+
+        virBufferAddLit(&buf, "<source>\n");
+        virBufferAdjustIndent(&buf, 2);
+        virBufferAsprintf(&buf, "<address type='pci' domain='0x%.4x'"
+                          " bus='0x%.2x' slot='0x%.2x' function='0x%.1x'/>\n",
+                          pciAddr.domain, pciAddr.bus,
+                          pciAddr.slot, pciAddr.function);
+        virBufferAdjustIndent(&buf, -2);
+        virBufferAddLit(&buf, "</source>\n");
+        break;
+    }
case VIR_DOMAIN_NET_TYPE_USER:
      case VIR_DOMAIN_NET_TYPE_ETHERNET:
@@ -1004,7 +1087,6 @@ cmdAttachInterface(vshControl *ctl, const vshCmd *cmd)
      case VIR_DOMAIN_NET_TYPE_MCAST:
      case VIR_DOMAIN_NET_TYPE_UDP:
      case VIR_DOMAIN_NET_TYPE_INTERNAL:
-    case VIR_DOMAIN_NET_TYPE_HOSTDEV:
      case VIR_DOMAIN_NET_TYPE_LAST:
          vshError(ctl, _("No support for %s in command 'attach-interface'"),
                   type);

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]