On Mon, Nov 19, 2012 at 05:30:11PM +0800, Osier Yang wrote:
> Hi,
>
> This proposal tries to find a solution for migrating a
> domain that uses a LUN behind a vHBA as a disk device (QEMU
> emulated disk only at this stage), plus other NPIV
> improvements that are unrelated to migration. I haven't been
> lucky enough to get an environment to test whether these ideas work,
> but I'd like to hear early on whether anyone has good ideas/suggestions.
>
> 1) Persistent vHBA support
>
> This is a useful feature that has been missing for a long time. Suppose
> one has created a vHBA and done the masking/zoning, and everything works
> as expected. However, after a system reboot, all of it is
> simply lost. To get things back, the user has to
> find out the previous WWNN & WWPN and create the vHBA again.
>
> On the other hand, persistent vHBA support is actually required
> for a domain that uses a LUN behind a vHBA. Otherwise the domain
> could fail to start after a system reboot.
>
> To support persistent vHBAs, new APIs like virNodeDeviceDefineXML
> and virNodeDeviceUndefine are required. It's also useful to introduce
> "autostart" for vHBAs, so that a vHBA can be started automatically
> after a system reboot.
>
> Proposed APIs:
>
>   virNodeDevicePtr
>   virNodeDeviceDefineXML(virConnectPtr conn,
>                          const char *xml,
>                          unsigned int flags);
>
>   int
>   virNodeDeviceUndefine(virConnectPtr conn,
>                         virNodeDevicePtr dev,
>                         unsigned int flags);
>
>   int
>   virNodeDeviceSetAutostart(virNodeDevicePtr dev,
>                             int autostart,
>                             unsigned int flags);
>
>   int
>   virNodeDeviceGetAutostart(virNodeDevicePtr dev,
>                             int *autostart,
>                             unsigned int flags);

I don't much like this approach. IMHO, this should all be done
via the virStoragePool APIs instead. Adding define/undefine/autostart
to virNodeDevice really just duplicates the storage pool
functionality.

> 2) Associate vHBA with domain XML
>
> There are two ways to attach a LUN to a domain: as a QEMU emulated
> device, or via passthrough.
> Since LUN passthrough is not yet supported in
> libvirt, let's focus on the emulated LUN at this stage.
>
> New attributes "wwnn" and "wwpn" are introduced to identify the
> LUN behind the vHBA, e.g.
>
>   <disk type='block' device='disk'>
>     <driver name='qemu' type='raw'/>
>     <source wwnn="2001001b32a9da4e" wwpn="2101001b32a90004"/>

If you change the schema of the <source> element, then you must also
create a new type='XXX' attribute value to identify it, not just reuse
type='block'.

>     <target dev='vda' bus='virtio'/>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
>              function='0x0'/>
>   </disk>
>
> Before the domain starts, we have to check whether a LUN is
> assigned to the vHBA, and error out if not.
>
> Using the stable path of the LUN also works, e.g.
>
>   <source dev="/dev/disk/by-path/pci-0000\:00\:07.0-scsi-0\:0\:0\:0"/>
>
> But the disadvantage is that users have to figure out the stable
> path themselves, and we have to check every stable path to
> see whether it's behind a vHBA in the migration "Begin" stage. Or should
> we add a new XML attribute to the "source" element to indicate that it's
> behind a vHBA, such as:
>
>   <source dev="disk-by-path" model="vport"/>

I don't much like the idea of mapping vHBAs to <disk> elements, because
you have a cardinality mismatch. A <disk> is the equivalent of a single
LUN, but a vHBA is something that provides multiple LUNs. If you want to
directly associate a vHBA with a virtual guest, then this is really in
the realm of SCSI HBA passthrough, not <disk> devices.

If you want something mapped to the <disk> device, then the approach
should be to map it to a storage pool volume - something we've long
talked about as broadly useful for all storage types, not just NPIV.

> 3) Migration with vHBA
>
> One possible solution for migration with vHBA is to use a pair
> of WWNN & WWPN on the source host: one is used by the domain, the other
> is reserved for migration purposes.
> It requires the storage admin to map
> the same LUN to both vHBAs when doing the masking and zoning.
>
> One of the two vHBAs is called the "primary vHBA", the other the
> "secondary vHBA". To maintain the relationship between these two
> vHBAs, we have to introduce new XML elements for vHBAs, e.g.
>
> In the XML of the primary vHBA:
>
>   <secondary wwpn="2101001b32a90004"/>
>
> In the XML of the secondary vHBA:
>
>   <primary wwpn="2101001b32a90002"/>
>
> The primary vHBA is guaranteed not to be used by any other domain
> driven by libvirt (we do some checking earlier, before the domain
> starts). It's also guaranteed that the LUN can't be used by
> other domains, thanks to sVirt or Sanlock. So it's safe to have two
> vHBAs on the source host too.
>
> To prevent anyone from using the LUN by creating a vHBA with the same
> WWNN & WWPN on another host, we must create the secondary vHBA on the
> source host, even though it's not being used.
>
> Both the primary and secondary vHBA must be defined and marked as
> "autostart" so that the domain can be started after a system
> reboot.
>
> During migration, we have to bake a bigger cookie containing the
> secondary vHBA's info (basically its WWNN and WWPN) in the migration
> "Begin" stage, and consume it in the "Prepare" stage on the target host.
>
> In the "Begin" stage, the XML representing the secondary vHBA is
> constructed, and the secondary vHBA is destroyed on the source host,
> though not undefined.
>
> In the "Prepare" stage, a new vHBA is created (defined and started)
> on the target host with the same WWNN & WWPN as the secondary vHBA on
> the source host. Presumably the LUN then becomes visible to the target
> host automatically, and migration can be performed. After migration
> finishes on the target host, the primary vHBA on the source host is
> destroyed, not undefined.
>
> If migration fails, the new vHBA created on the target host will
> be destroyed and undefined, and both the primary and secondary
> vHBA on the source host will be started, so that the domain can
> be resumed.
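To make the primary/secondary bookkeeping quoted above easier to follow, here is a hypothetical sketch in C (not libvirt code) that models only the defined/active state of the three vHBAs through "Begin", "Prepare", the finish step and the failure rollback. The type and function names, and returning a pointer as a stand-in for the migration cookie, are illustrative assumptions; the final success step (transferring the primary to the target) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one vHBA on one host; not a libvirt type. */
typedef struct {
    char wwnn[17];  /* 16 hex digits + NUL */
    char wwpn[17];
    bool defined;   /* persistent config exists */
    bool active;    /* vHBA currently created (started) */
} vhba;

/* The three vHBAs involved in one migration attempt under the proposal. */
typedef struct {
    vhba src_primary;    /* used by the running domain */
    vhba src_secondary;  /* reserved so its WWNs cannot be reused elsewhere */
    vhba dst;            /* created on the target host in "Prepare" */
} migration;

/* "Begin": destroy (but keep defined) the secondary vHBA on the source.
 * The returned pointer stands in for the cookie carrying its WWNN/WWPN. */
static const vhba *mig_begin(migration *m)
{
    assert(m->src_primary.active && m->src_secondary.defined);
    m->src_secondary.active = false;
    return &m->src_secondary;
}

/* "Prepare": define and start a vHBA on the target with the cookie's WWNs. */
static void mig_prepare(migration *m, const vhba *cookie)
{
    memcpy(m->dst.wwnn, cookie->wwnn, sizeof m->dst.wwnn);
    memcpy(m->dst.wwpn, cookie->wwpn, sizeof m->dst.wwpn);
    m->dst.defined = true;
    m->dst.active = true;
}

/* Migration finished on the target: destroy, but do not undefine,
 * the primary vHBA on the source. */
static void mig_finish(migration *m)
{
    m->src_primary.active = false;
}

/* Failure: destroy and undefine the target vHBA, then start both source
 * vHBAs again so the domain can be resumed. */
static void mig_rollback(migration *m)
{
    m->dst.active = false;
    m->dst.defined = false;
    m->src_primary.active = true;
    m->src_secondary.active = true;
}
```

For example, after mig_begin() and mig_prepare() the target vHBA carries the secondary's WWPN (2101001b32a90004 in the proposal's XML), while mig_rollback() leaves nothing defined on the target and both source vHBAs active again.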
>
> Finally, if migration succeeds, the primary vHBA on the source host
> will be transferred to the target host as the secondary vHBA (defined),
> and both the primary and secondary vHBA on the source host will be
> undefined.

If we do the mapping of HBAs to guest domains using storage pools,
then at the guest level, migration requires zero work. It is simply
up to the management app to create the storage pool on the destination
host with the same name + UUID, but with the secondary WWNN/WWPN.

The nice thing about this is that you don't need to hardcode
details of a secondary WWNN/WWPN up-front. The management app can
just decide on those at the time it performs the migration, so 99%
of the time only a single vHBA needs to be set up on the SAN.
During migration the mgmt app can set up a second vHBA for the target
host and, once migration is complete, delete the original vHBA entirely.

> 4) Enrich HBA's XML
>
> With the current implementation it's hard to know which vHBAs were
> created from an HBA. One has to dump the XML of each (v)HBA and work
> it out from the "parent" element of the vHBAs. It would be good to
> introduce a new element for HBAs, like "vports", so that one can
> easily see which (and how many) vHBAs were created from the HBA.
>
> It would also be good to expose the maximum number of vports the HBA
> supports.
>
> Besides these, other useful information should be exposed too,
> such as the vendor name, the HBA state, the PCI address, etc.
>
> The new XML would look like:
>
>   <vports num='2' max='64'>
>     <vport name="scsi_host40" wwpn="2101001b32a90004"/>
>     <vport name="scsi_host40" wwpn="2101001b32a90005"/>
>   </vports>
>   <online/>
>   <vendor>QLogic</vendor>
>   <address type="pci" domain="0" bus="0" slot="5" function="0"/>
>
> "online", "vendor" and "address" make sense for vHBAs too.

I'm trying to remember how we modelled the parent/child relationship
for SR-IOV PCI cards. NPIV is a very similar concept, so we should
ideally model the parent/child relationship in the same manner.
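The storage-pool alternative to the migration scheme can be sketched as follows. The helper below is hypothetical (not a libvirt API): it renders XML for a vHBA-backed 'scsi' pool, assuming the <adapter type='fc_host' wwnn='...' wwpn='...'/> source syntax that later libvirt releases accept for scsi pools; that syntax is not part of this discussion. The pool name and UUID are illustrative values. The management app would render the same name and UUID on both hosts, varying only the WWPN (and possibly the WWNN), and could then pass the string to virStoragePoolDefineXML() and start the pool with virStoragePoolCreate().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a libvirt API): render XML for a 'scsi' storage
 * pool whose source is an NPIV vHBA identified by WWNN/WWPN. Assumes the
 * <adapter type='fc_host' .../> syntax of later libvirt pool XML.
 * Returns snprintf()'s result: the length the full XML needs, so a value
 * >= len means the buffer was too small and the output was truncated. */
static int vhba_pool_xml(char *buf, size_t len,
                         const char *name, const char *uuid,
                         const char *wwnn, const char *wwpn)
{
    /* WWNs are 64-bit values written as 16 hex digits. */
    assert(strlen(wwnn) == 16 && strlen(wwpn) == 16);
    return snprintf(buf, len,
                    "<pool type='scsi'>\n"
                    "  <name>%s</name>\n"
                    "  <uuid>%s</uuid>\n"
                    "  <source>\n"
                    "    <adapter type='fc_host' wwnn='%s' wwpn='%s'/>\n"
                    "  </source>\n"
                    "  <target>\n"
                    "    <path>/dev/disk/by-path</path>\n"
                    "  </target>\n"
                    "</pool>\n",
                    name, uuid, wwnn, wwpn);
}
```

On the source host the app would render the pool with the original WWPN, and on the destination with a freshly chosen one; assuming the guest's <disk> refers to a volume in that pool, nothing in the guest XML has to change.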
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list