On Wed, Oct 26, 2011 at 05:51:55PM +0200, Paolo Bonzini wrote:
> Hi all,
>
> let's kick off the discussion on what changes are needed in domain
> XML for more complete SCSI support.
>
> There are three relevant topics:
>
> 1) providing channel/target/lun addresses for SCSI disks;
>
> 2) supporting LUN passthrough;
>
> 3) supporting SCSI host passthrough.
>
>
> A fourth topic is supporting NPIV.  It is a special case of SCSI
> host passthrough, and it is important that any extension to the
> domain XML can cover it.
>
>
> Enhanced addressing for SCSI devices
> ====================================
>
> This is the simplest part.  The proposal is to add a new address type
>
>    <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>
> where host selects the qdev parent device, while channel/target/lun
> are passed as qdev properties (the QEMU names are respectively
> channel, scsi-id, lun).
>
> Libvirt should check for QEMU 1.0 and, for older versions, only
> allow channel=lun=0 and 0<=target<=7.
>
>
> LUN passthrough
> ===============
>
> A SCSI block device from the host can be attached to a domain in two
> ways: as an emulated LUN with SCSI commands implemented within QEMU,
> or by passing SCSI commands down to the block device.  The former is
> handled by the existing <disk type='disk'> and <disk type='cdrom'>
> XML syntax.  The latter is not yet supported.
>
> On the QEMU side, LUN passthrough is implemented by one of the
> scsi-generic and scsi-block devices.  scsi-generic requires a
> /dev/sg device name, and can be applied to any device.  scsi-block
> is only available in QEMU 1.0 or newer, can be applied only to
> block devices (sd/sr), and has better performance.
> The choice between one and the other should be as transparent as
> possible.
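
To make the two rules quoted above concrete, here is a rough sketch of
the checks libvirt would have to perform.  This is not libvirt code: the
function names, the version tuple, and the is_block_device flag are all
made up for illustration, and the only rules encoded are the ones stated
in the proposal (old QEMU allows only channel=lun=0 and 0<=target<=7;
scsi-block needs QEMU 1.0+ and a block device).

```python
# Illustrative sketch only; names and signatures are hypothetical,
# not part of libvirt or QEMU.

QEMU_1_0 = (1, 0)


def scsi_address_allowed(qemu_version, channel, target, lun):
    """Apply the address rule from the proposal.

    qemu_version is a (major, minor) tuple.  For QEMU older than 1.0,
    only channel=lun=0 and 0<=target<=7 are accepted.  For 1.0 and
    newer this sketch only rejects negative values; any upper bounds
    are outside what the proposal states.
    """
    if qemu_version >= QEMU_1_0:
        return min(channel, target, lun) >= 0
    return channel == 0 and lun == 0 and 0 <= target <= 7


def pick_passthrough_device(qemu_version, is_block_device):
    """Choose the QEMU device for LUN passthrough.

    scsi-block requires QEMU 1.0+ and a block device (sd/sr);
    everything else falls back to scsi-generic, which would need
    the matching /dev/sg node (lookup not shown here).
    """
    if qemu_version >= QEMU_1_0 and is_block_device:
        return "scsi-block"
    return "scsi-generic"
```

With something like this, the scsi-generic vs. scsi-block choice stays
hidden from the XML, which is what "as transparent as possible" seems
to ask for.
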
>
> Currently, using a block device as the backend for a virtio disk
> implements a kind of LUN passthrough, since the guest can execute
>
> There are two possible choices here:
>
> 1) add a new <hostdev> tag.
>
>    <hostdev mode='subsystem' type='scsi'>
>      <source>
>        <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>      </source>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
> Advantages:
>
> - allows using the same XML for all SCSI devices (i.e. scsi-generic
>   vs. scsi-block is an internal detail of libvirt);
>
> Disadvantages:
>
> - does not make it clear which device is being passed through;
>
> - completely different from the syntax that virtio is using for the
>   same purpose; perhaps virtio could be covered by
>
>    <hostdev mode='subsystem' type='scsi'>
>      <source>
>        <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>      </source>
>      <target dev='vda' bus='virtio'/>
>      <address type='pci' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
> - <address> specifies the address to a <capability type='scsi'>
>   device, but the device to be passed to scsi-block is its
>   block_sdXX_* child (aside: it would be nice if the /dev/sgNN name
>   was placed somewhere in the nodedev XML for <capability type='scsi'>
>   devices);
>
> - emulated and passthrough LUNs have a completely different XML;
>
> - host numbers are not stable when hotplugging.
>
> 2) add a new <drive device='lun'> attribute.
>
>    <drive type='block' device='lun'>
>      <driver name='qemu' type='raw'/>
>      <source dev='/dev/sda'/>
>      <target dev='sda' bus='scsi'/>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </drive>
>
> Advantages:
>
> - allows using the same syntax for virtio and SCSI.  virtio could be
>   changed to accept device='lun' too;
>
> - the passed-through device is immediately visible;
>
> - stable addressing is available via /dev/disk/by-id and /dev/disk/by-path;
>
> - can easily switch a disk between emulated and passthrough modes.
>
> Disadvantages:
>
> - does not extend to scsi-generic and to host passthrough.
>
>
> 3) something between (1) and (2).  If I understand correctly
> http://www.redhat.com/archives/libvir-list/2008-July/msg00429.html
> this would use <hostdev mode='capability'>.  More on this below.
>
>
> SCSI target/host passthrough: rethinking <hostdev mode='capability'>
> ====================================================================
>
> SCSI target/host passthrough passes the entire set of LUNs attached
> to a SCSI target or host.  On the QEMU side, this is done manually
> by adding a scsi-block or scsi-generic device for each LUN.
>
> This can be realized using something like:
>
>    <hostdev mode='subsystem' type='scsi_host'>
>      <source>
>        <address type='scsi' host='...'/>
>      </source>
>      <address type='scsi' host='...'/>
>    </hostdev>
>
>    <hostdev mode='subsystem' type='scsi_target'>
>      <source>
>        <address type='scsi' host='...' bus='...' target='...'/>
>      </source>
>      <address type='scsi' host='...' bus='...' target='...'/>
>    </hostdev>
>
> However, as for LUN passthrough, the main problem is that Linux host
> indices are not stable.  Thus, in this case using <hostdev
> mode='capability'> seems like the only reasonable possibility.
>
> That said, <hostdev mode='capability'> has never been documented and
> never even implemented.  For this reason, I'm proposing to redo its
> functionality in a different way.
> The two examples given in
> http://www.redhat.com/archives/libvir-list/2008-July/msg00429.html
> were the following:
>
> >A network card by name (ie for OpenVZ)
> >
> >    <hostdev mode='capability'>
> >        <source name='eth0'/>
> >    </hostdev>
> >
> >A SCSI device by name (eg, SCSI PV passthrough), also specifying
> >the target address
> >
> >    <hostdev mode='capability' type='scsi'>
> >        <source name='sg3'/>
> >        <target address='0:0:0:0'/>
> >    </hostdev>
>
> In my proposal:
>
> 1) the "mode" attribute is dropped (more precisely, only "subsystem"
> is allowed and never printed; everything else is rejected);
>
> 2) the "type" attribute can in principle get any value that is valid
> for a nodedev capability (more or less: for example, the usb type
> maps to the usb_device capability); :(
>
> 3) the "source" element can get a "name" attribute pointing to a
> nodedev name, and a "rel" attribute that is "child" or "parent".
> "child" instructs libvirt to search for a device possessing the
> given capability, and that is a child of the named device; "parent"
> instructs libvirt to pick the parent of the indicated device.  When
> the "name" attribute is included, the element must be empty.
>
> Given this, here is how the two examples above would look:
>
> A network card for OpenVZ:
>
> - by name (has adding aliases for nodedevs ever been considered,
>   such as simply "eth0" in this case?):
>
>    <hostdev type='net'>
>      <source name='net_eth0_00_22_68_0b_dc_ac'/>
>    </hostdev>
>
> - by position:
>
>    <hostdev type='net'>
>      <source rel='child' name='pci_0000_00_19_0'/>
>    </hostdev>
>
>
> A SCSI device:
>
> - by name:
>
>    <hostdev type='scsi'>
>      <source name='scsi_0_0_0_0'/>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
> - by position (aliases would also allow specifying /dev/sda easily):
>
>    <hostdev type='scsi'>
>      <source rel='parent' name='block_sda_ST9160411AS_5TG11QWL'/>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
>
> A SCSI host:
>
> - by name:
>
>    <hostdev type='scsi_host'>
>      <source name='scsi_host0'/>
>      <address type='scsi' host='...'/>
>    </hostdev>
>
> - by position:
>
>    <hostdev type='scsi_host'>
>      <source rel='child' name='pci_0000_00_1f_2'/>
>      <address type='scsi' host='...'/>
>    </hostdev>
>
>
> NPIV support: generalizing hostdev source addresses
> ===================================================
>
> In NPIV, a virtual HBA is created using "virsh nodedev-create" and
> passed to the guest.  Such a virtual adapter does have a stable
> address, namely its WWN.  As such, it can be addressed simply by
> generalizing the kind of source address that can be passed to
> <hostdev type='scsi_host'/>:
>
>    <hostdev type='scsi_host'>
>      <source>
>        <address type='wwn' wwpn='...' wwnn='...'/>
>      </source>
>    </hostdev>
>
> (Note that this doesn't use <source name='...'/> and, as such, it
> does not rely on the ideas above).

How do you envision migration working with NPIV?

> Ideas and opinions are welcome!
>
> Paolo
>
> --
> libvir-list mailing list
> libvir-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/libvir-list