On 06/27/2012 09:03 AM, Osier Yang wrote:
> On 06/27/2012 04:02, Laine Stump wrote:
>> (NB: I'm Cc'ing Osier on this email, as he's quite knowledgeable about
>> the PCI passthrough device allocation tracking code. You should probably
>> move this discussion to the mailing list sooner rather than later
>> though, as a public discussion of the design will give you a better
>> chance of your first revision getting successfully past review :-))
>>
>> On 06/26/2012 07:23 AM, Shradha Shah wrote:
>>> Laine,
>>>
>>> I have submitted my v2 patches for forward mode='hostdev' and am
>>> planning to work on the in-use tracker for network and
>>> pci-passthrough devices.
>>>
>>> I am unable to wrap my head around how I should be implementing this
>>> functionality. I am unable to decide at what level I should be
>>> implementing this (network, domain or qemu).
>>>
>>> May I ask for your guidance in order to implement this functionality?
>>>
>>
>> Yes, but I'm currently on vacation (in Turkey) so I won't have much time
>> to respond until July 9 when I return.
>>
>> In the meantime, I think the right way to do this is by integrating with
>> the code in the qemu driver that keeps track of which PCI devices are in
>> use. This already happens at the very basic level of "if the device
>> allocated by the network driver is in use, the attempt to assign the
>> device will fail"; instead, the network driver should be able to ask
>> qemu if the device it wants to allocate to the guest is already in use
>> (and reserve it, in one atomic operation).
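
As a minimal sketch of the "check and reserve in one atomic operation" idea quoted above, assuming a hypothetical fixed-size ownership table rather than anything that exists in libvirt: each slot holds 0 while the device is free, or the nonzero id of whoever reserved it, and a C11 compare-and-swap makes the check and the reservation a single step. The names sketchOwner, sketchReserve and sketchRelease are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NDEVS 4          /* hypothetical pool size */

/* One slot per pooled device: 0 means free, anything else is the id of
 * the caller that reserved it. Static storage starts out zeroed. */
static atomic_int sketchOwner[SKETCH_NDEVS];

/* Check and reserve in one step. Returns 0 on success, -1 if the slot
 * was already reserved by anyone (including the caller itself). */
int sketchReserve(size_t idx, int ownerId)
{
    int expected = 0;

    assert(idx < SKETCH_NDEVS && ownerId != 0);
    return atomic_compare_exchange_strong(&sketchOwner[idx], &expected,
                                          ownerId) ? 0 : -1;
}

/* Release only succeeds for the id that currently holds the slot. */
int sketchRelease(size_t idx, int ownerId)
{
    int expected = ownerId;

    assert(idx < SKETCH_NDEVS && ownerId != 0);
    return atomic_compare_exchange_strong(&sketchOwner[idx], &expected,
                                          0) ? 0 : -1;
}
```

With something along these lines, a second caller asking for the same slot is refused at reservation time instead of finding out later, at attach time, that the device is taken.
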
>
> Hi, Shradha, Laine,
>
> I have not read your patches for "forward=hostdev" carefully, so I am
> not sure if I can give the right direction, but let me try:
>
> It looks like what you will do is just reserve the vf or pf from the host,
> and when the vf/pf is attached to a domain or used in other ways, you
> want it to be marked as in-use, am I correct?

Correct. Currently the network driver picks a device from its pool and
returns it to qemu without any idea whether that device is already used in
some other way. By the time we get back to qemu and learn that the device
is already used, the best we can do is fail, which is "less than ideal" :-)

> If so, it should not be hard to do. For each PCI device we have a
> field named "used_by", which stores the name of the domain that uses it,
> and in the qemu driver we have two lists, "activePciHostdevs" and
> "inactivePciHostdevs", of pciDeviceList type.
>
> "activePciHostdevs" holds the PCI devices which are in use by any of
> the qemu domains, and "inactivePciHostdevs" holds the PCI devices
> detached from the host and not used by any domain. Basically, the purpose
> of "inactivePciHostdevs" is to resolve the problem of PCI device
> resetting when two PCI devices share the same bus. See commit 6be610bf
> for more details.
>
> So that means updating the "used_by" field of the PCI device,
> "activePciHostdevs", and "inactivePciHostdevs" all happen
> while attaching the interface to a domain, detaching it from the
> domain, starting the domain, or shutting the domain down.
>
> E.g., attaching the interface to a domain (assuming the attachment
> succeeded) needs to:
>
> 1) Set "used_by" to the domain name.
> 2) Insert the device into the "activePciHostdevs" list.
> 3) Remove the device from the "inactivePciHostdevs" list if it was
>    there.
>
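
The three attach steps listed above, and their reverse on detach, can be sketched as follows. This is a simplified illustration only: SketchPciDev and SketchPciDevList are hypothetical stand-ins, not libvirt's pciDevice or pciDeviceList, and putting a detached device back onto the inactive list is an assumption about what "the opposite" means, not something the thread confirms.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEVS 8       /* hypothetical list capacity */

typedef struct {
    unsigned int domain, bus, slot, function;
    const char *used_by;        /* name of the guest using it, or NULL */
} SketchPciDev;

typedef struct {
    SketchPciDev *devs[SKETCH_MAX_DEVS];
    size_t count;
} SketchPciDevList;

static int sketchListFind(const SketchPciDevList *list, const SketchPciDev *dev)
{
    size_t i;

    for (i = 0; i < list->count; i++) {
        if (list->devs[i] == dev)
            return (int)i;
    }
    return -1;
}

/* Returns -1 if the list is full or already contains the device. */
int sketchListAdd(SketchPciDevList *list, SketchPciDev *dev)
{
    if (list->count == SKETCH_MAX_DEVS || sketchListFind(list, dev) >= 0)
        return -1;
    list->devs[list->count++] = dev;
    return 0;
}

/* Removing a device that is not on the list is a no-op. */
void sketchListRemove(SketchPciDevList *list, const SketchPciDev *dev)
{
    int idx = sketchListFind(list, dev);

    if (idx >= 0)
        list->devs[idx] = list->devs[--list->count];
}

/* Steps 1-3 from the list above; fails if the device is already active. */
int sketchMarkAttached(SketchPciDevList *active, SketchPciDevList *inactive,
                       SketchPciDev *dev, const char *guest)
{
    assert(active && inactive && dev && guest);
    if (sketchListAdd(active, dev) < 0)     /* step 2 */
        return -1;
    dev->used_by = guest;                   /* step 1 */
    sketchListRemove(inactive, dev);        /* step 3 */
    return 0;
}

/* The reverse, assuming a detached device returns to the inactive list. */
void sketchMarkDetached(SketchPciDevList *active, SketchPciDevList *inactive,
                        SketchPciDev *dev)
{
    assert(active && inactive && dev);
    sketchListRemove(active, dev);
    dev->used_by = NULL;
    (void)sketchListAdd(inactive, dev);
}
```

A second sketchMarkAttached() call for a device that is already on the active list returns -1 and leaves "used_by" untouched, which is the "already in use" failure the thread is trying to detect earlier.
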
The trick is to do enough of that in networkAllocateActualDevice to
ensure that 1) the device won't be used by someone else, 2) the guest
that's grabbing the device *can* use it, and 3) "the right thing" will
happen if libvirtd is restarted sometime after the device is "reserved"
but before the guest is started.

> The process of detaching is just the opposite of the above. However, the
> whole process is much more complicated than the 3 listed steps.
>
> I see you introduce new members for virNetworkForwardIfDef:
>
> struct _virNetworkForwardIfDef {
> -    char *dev;      /* name of device */
> +    int type;
> +    union {
> +        virDevicePCIAddress pci; /*PCI Address of device */
> +        /* when USB devices are supported a new variable to be added here */
> +        char *dev;      /* name of device */
> +    }device;
> +    int usageCount; /* how many guest interfaces are bound to this device? */
> +};
>
> So why not use pciDevice? E.g.:

In general I think it would be a good idea to unify pciDevice,
virDevicePCIAddress, and pci_config_address as much as possible, but
pciDevice itself has a lot of fields that don't make sense in a
configuration object, and anyway currently all the other conf code
(including hostdev definitions) uses virDevicePCIAddress, and there is
already code to parse/format to/from a virDevicePCIAddress. As a matter
of fact, pciDevice is defined in pci.c, so it can't be used anywhere
else, and the API presented by pci.h uses individual components (domain,
bus, slot, function) when it needs to describe a PCI device. So for now
at least, I think virNetworkForwardIfDef should use virDevicePCIAddress,
like the other *_conf code; when the network driver needs to call the
APIs defined in pci.h, it will just give the individual fields as
separate arguments anyway.
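
A minimal sketch of that arrangement, under stated assumptions: SketchPCIAddress mirrors only the four addressing components named above (it is not the real virDevicePCIAddress), SketchForwardIfDef follows the shape of the proposed struct, and sketchPciFormatName is an invented stand-in for a pci.h-style function that takes the components as separate arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the four addressing components. */
typedef struct {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;
    unsigned int function;
} SketchPCIAddress;

enum { SKETCH_FWD_IF_NETDEV, SKETCH_FWD_IF_PCI };

/* Config object: holds only the address, no runtime device state. */
typedef struct {
    int type;                   /* one of the SKETCH_FWD_IF_* values */
    union {
        SketchPCIAddress pci;   /* PCI address of device */
        char *dev;              /* name of device */
    } device;
    int usageCount;             /* guest interfaces bound to this device */
} SketchForwardIfDef;

/* Stand-in for an API that describes a device by separate components. */
int sketchPciFormatName(unsigned int domain, unsigned int bus,
                        unsigned int slot, unsigned int function,
                        char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%04x:%02x:%02x.%x",
                    domain, bus, slot, function);
}

/* The caller unpacks the config address into individual arguments. */
int sketchForwardIfName(const SketchForwardIfDef *def, char *buf, size_t buflen)
{
    assert(def && buf && buflen > 0);
    if (def->type == SKETCH_FWD_IF_PCI)
        return sketchPciFormatName(def->device.pci.domain,
                                   def->device.pci.bus,
                                   def->device.pci.slot,
                                   def->device.pci.function,
                                   buf, buflen);
    if (!def->device.dev || strlen(def->device.dev) >= buflen)
        return -1;
    strcpy(buf, def->device.dev);
    return (int)strlen(buf);
}
```

For a PCI entry with domain 0, bus 4, slot 0x10 and function 1, sketchForwardIfName() fills the buffer with "0000:04:10.1". The config struct stays small, and nothing from the runtime device object leaks into it.
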
>
> struct _virNetworkForwardIfDef {
>     char *dev;      /* name of device */
>     int type;
>     union {
>         pciDevice pci; /*PCI Address of device */
>         /* when USB devices are supported a new variable to be added here */
>         char *dev;      /* name of device */
>     } device;
>     int usageCount; /* how many guest interfaces are bound to this device? */
> };
>
> You can add usbDevice there once it's supported. That means
> you can reuse the existing code for PCI and device management
> in the qemu driver.

Here again, I don't think usbDevice would be the proper item to have in
the config object, since it has extra fields that are unrelated to the
config. All that will really be needed is the "usb" struct that's defined
inside virDomainHostdevSubsys.

Of course, I'm not sure it will ever make sense to assign USB network
devices to guests from a pool anyway, because 1) the ones I've seen don't
have performance worth even mentioning, and 2) it's not possible (at
least in my tests) to modify the MAC address of a USB network device
prior to assigning it to a guest, so the guest would not be guaranteed a
fixed MAC address. This makes it a non-starter for every operating system
I know about. (This is why I didn't support USB in the code that
introduces "<interface type='hostdev'>")

>> Of course, once the network driver has reserved the device from qemu's
>> PCI passthrough code, it would return that device to the qemu driver
>> code that wants to attach the interface, and it would fail because it
>> would be told the device is already in use (well, yeah! *We* just marked
>> it as in-use!).
>> To make that work, I guess some sort of
>> cookie/handle/pointer would need to be passed from qemu's pci
>> passthrough code back to the network driver, and the network driver
>> would return it back to qemu's network interface attach code, which
>> would then use that special cookie/handle/pointer to attach the device
>> (saying "yeah, I know it's already in use, and here's my pass-card").
>>
>> (Talking about this makes me think that the code that keeps track of PCI
>> device allocation shouldn't really be a part of qemu, but should be a
>> separate module, so that the network driver can still function properly
>> even if the qemu driver isn't loaded.)
>
> Agreed. That should resolve the problem of data sharing between
> network and hypervisor drivers.
>
>> Another twist to this that should be considered - if any particular
>> device is in use by at least one guest for one of the macvtap modes,
>> that device also needs to be marked as in-use in libvirt's pci device
>> table - it would be disastrous if another guest decided to use that
>> device for standard PCI Passthrough.
>>
>> (Keep in mind that I wrote everything above without even once looking at
>> the code or any other reference, so you should take it with a grain of
>> salt!)

Many Thanks,
Regards,
Shradha Shah

On 06/28/2012 11:48 AM, Shradha Shah wrote:
> This is a reply from Osier Yang
>
> On 06/28/2012 11:33 AM, Shradha Shah wrote:
>> This is a reply I got from Laine Stump
>>
>> On 06/28/2012 11:19 AM, Shradha Shah wrote:
>>> This is a conversation that I started with Laine Stump for the
>>> implementation of the in-use tracker for network and pci devices.
>>>
>>> I want to make this conversation more public in order to receive
>>> everyone's view on the topic.
>>>
>>> I will also post the responses I got from Laine and Osier Yang.
>>>
>>> Many Thanks,
>>> Regards,
>>> Shradha Shah
>>>
>>> -------- Original Message --------
>>> Subject: In Use tracker for network and pci-passthrough devices
>>> Date: Tue, 26 Jun 2012 12:23:52 +0100
>>> From: Shradha Shah <sshah@xxxxxxxxxxxxxx>
>>> To: Laine Stump <laine@xxxxxxxxx>
>>>
>>> Laine,
>>>
>>> I have submitted my v2 patches for forward mode='hostdev' and am
>>> planning to work on the in-use tracker for network and
>>> pci-passthrough devices.
>>>
>>> I am unable to wrap my head around how I should be implementing this
>>> functionality. I am unable to decide at what level I should be
>>> implementing this (network, domain or qemu).
>>>
>>> May I ask for your guidance in order to implement this functionality?

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list