On Wed, Oct 26, 2011 at 05:51:55PM +0200, Paolo Bonzini wrote:
> Hi all,
>
> let's kick off the discussion on what changes are needed in domain
> XML for more complete SCSI support.
>
> There are three relevant topics:
>
> 1) providing channel/target/lun addresses for SCSI disks;
>
> 2) supporting LUN passthrough;
>
> 3) supporting SCSI host passthrough.
>
>
> A fourth topic is supporting NPIV.  It is a special case of SCSI
> host passthrough, and it is important that any extension to the
> domain XML can cover it.
>
>
> Enhanced addressing for SCSI devices
> ====================================
>
> This is the simplest part.  The proposal is to add a new address type
>
>    <address type='scsi' host='...'
>             bus='...' target='...' lun='...'/>
>
> where host selects the qdev parent device, while channel/target/lun
> are passed as qdev properties (the QEMU names are respectively
> channel, scsi-id, lun).

The problem here is that we already have an address type that is
used for SCSI - type='drive'. If we switch to type='scsi' then we
break backwards compatibility.

The current drive addressing scheme has controller, bus, unit.
Controller is equivalent to what you called host, bus matches,
and unit maps to lun.

Thus we only need to add a new 'target' attribute to the existing
drive addressing scheme, making sure it defaults to zero for any
existing configs. This new attr would be ignored for any IDE/Floppy
controller drives.

> LUN passthrough
> ===============
>
> A SCSI block device from the host can be attached to a domain in two
> ways: as an emulated LUN with SCSI commands implemented within QEMU,
> or by passing SCSI commands down to the block device.  The former is
> handled by the existing <disk type='disk'> and <disk type='cdrom'>
> XML syntax.  The latter is not yet supported.
>
> On the QEMU side, LUN passthrough is implemented by one of the
> scsi-generic and scsi-block devices.  Scsi-generic requires a
> /dev/sg device name, and can be applied to any device.
> scsi-block is only available in QEMU 1.0 or newer, requires a block
> device, can be applied only to block devices (sd/sr) and has better
> performance.  The choice between one and the other should be as
> transparent as possible.
>
> Currently, using a block device as the backend for a virtio disk
> implements a kind of LUN passthrough, since the guest can execute
>
> There are two possible choices here:
>
> 1) add a new <hostdev> tag.
>
>    <hostdev mode='subsystem' type='scsi'>
>      <source>
>        <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>      </source>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
> Advantages:
>
> - allows using the same XML for all SCSI devices (i.e. scsi-generic
>   vs. scsi-block is an internal detail of libvirt);
>
> Disadvantages:
>
> - does not make it clear which device is being passed through;
>
> - completely different from the syntax that virtio is using for the
>   same purpose; perhaps virtio could be covered by
>
>    <hostdev mode='subsystem' type='scsi'>
>      <source>
>        <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>      </source>
>      <target dev='vda' bus='virtio'/>
>      <address type='pci' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
> - <address> specifies the address to a <capability type='scsi'>
>   device, but the device to be passed to scsi-block is its
>   block_sdXX_* child (aside: it would be nice if the /dev/sgNN name
>   was placed somewhere in the nodedev XML for <capability type='scsi'>
>   devices);
>
> - emulated and passthrough LUNs have a completely different XML;
>
> - host numbers are not stable when hotplugging.

IMHO using <hostdev> is wrong for this. For assignment of logical
devices, we should use the corresponding logical device type. An
application reading the guest XML should be able to find all the
guest drives under <drive>, and not have to look in <hostdev>.
In addition, putting it under <hostdev> will mean it is no longer
protected by the lock managers, nor correctly labelled by the
security managers.

> 2) add a new <drive device='lun'> attribute.
>
>    <drive type='block' device='lun'>
>      <driver name='qemu' type='raw'/>
>      <source dev='/dev/sda'/>
>      <target dev='sda' bus='scsi'/>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </drive>
>
> Advantages:
>
> - allows using the same syntax for virtio and SCSI.  virtio could be
>   changed to accept device='lun' too.
>
> - the passed-through device is immediately visible
>
> - a stable addressing is available via /dev/disk/by-id and /dev/disk/by-path
>
> - can easily switch a disk between emulated and passthrough modes;
>
> Disadvantages:
>
> - does not extend to scsi-generic and to host passthrough;

IMHO this is still the best for LUN passthrough because it operates
correctly with lock managers and security managers, and logically
we are assigning drives, so the expectation is to see them in the
XML as <drive> elements.

> 3) something between (1) and (2).  If I understand correctly
> http://www.redhat.com/archives/libvir-list/2008-July/msg00429.html
> this would use <hostdev mode='capability'>.  More on this below.
>
>
> SCSI target/host passthrough: rethinking <hostdev mode='capability'>
> ====================================================================
>
> SCSI target/host passthrough passes the entire set of LUNs attached
> to a SCSI target or host.  On the QEMU side, this is done manually
> by adding a scsi_block or scsi_generic device for each LUN.
>
> This can be realized using something like:
>
>    <hostdev mode='subsystem' type='scsi_host'>
>      <source>
>        <address type='scsi' host='...'/>
>      </source>
>      <address type='scsi' host='...'/>
>    </hostdev>
>
>    <hostdev mode='subsystem' type='scsi_target'>
>      <source>
>        <address type='scsi' host='...' bus='...' target='...'/>
>      </source>
>      <address type='scsi' host='...' bus='...'
>               target='...'/>
>    </hostdev>
>
> However, as for LUN passthrough, the main problem is that Linux host
> indices are not stable.  Thus, in this case using <hostdev
> mode='capability'> seems like the only reasonable possibility.

Yes, we definitely want to avoid any reliance on host lun/bus/target
values. It is even worse for iSCSI, where the host value is more or
less guaranteed to change every time.

> That said, <hostdev mode='capability'> has never been documented and
> never even implemented.  For this reason, I'm proposing to redo its
> functionality in a different way.  The two examples given in
> http://www.redhat.com/archives/libvir-list/2008-July/msg00429.html
> were the following:
>
> >A network card by name (ie for OpenVZ)
> >
> >    <hostdev mode='capability'>
> >       <source name='eth0'/>
> >    </hostdev>
> >
> >A SCSI device by name (eg, SCSI PV passthrough), also specifying
> >the target address
> >
> >    <hostdev mode='capability' type='scsi'>
> >       <source name='sg3'/>
> >       <target address='0:0:0:0'/>
> >    </hostdev>
>
> In my proposal:
>
> 1) the "mode" attribute is dropped (more precisely, only "subsystem"
> is allowed and never printed; everything else is rejected);
>
> 2) the "type" attribute can in principle get any value that is valid
> for a nodedev capability---more or less: for example the usb type
> maps to the usb_device capability;

:(

The mode attribute was essentially being used to distinguish between
devices that were directly associated with a specific hardware device,
vs those which are logical devices on top of a hardware device. While
this is somewhat redundant, we can't drop the existing mode attribute
from the XML because some apps may be checking for its existence.

> 3) the "source" element can get a name "attribute" pointing to a
> nodedev name, and a "rel" attribute that is "child" or "parent".
> "child" instructs libvirt to search for a device possessing the
> given capability, and that is a child of the named device; "parent"
> instructs libvirt to pick the parent of the indicated device.  When
> the "name" attribute is included, the element must be empty.

The downside of automagically searching for devices based on
relationships is that it is a pretty imprecise definition. When it
comes to security labelling and access controls, imprecise things
cause us more pain. E.g. if we have one PCI device providing multiple
NICs, the source attribute doesn't uniquely identify the NIC to use.
We'd have to internally maintain a list of all child devices and what
domains they map to and then figure out an allocation policy.

> Given this, here is how the two examples above would look like:
>
> A network card for OpenVZ:
>
> - by name (has adding aliases for nodedevs ever been considered,
>   such as simply "eth0" in this case?):
>
>    <hostdev type='net'>
>      <source name='net_eth0_00_22_68_0b_dc_ac'/>
>    </hostdev>
>
> - by position:
>
>    <hostdev type='net'>
>      <source rel='child' name='pci_0000_00_19_0'/>
>    </hostdev>

Since originally proposing the <hostdev> examples for network cards,
I've switched to the opinion that this was in fact the wrong thing
to do at all. The network devices should be in the <interface>
element, so we have access to all the properties that this element
allows for.

My general view is that <hostdev> should be kept for "opaque" device
assignment where we're not caring about what capabilities the device
has. Just "blind" assignment of the PCI/USB/ISA hardware device based
on their hardware addresses. Any device assignment where we actually
need to have knowledge of the logical type of device should be under
the corresponding logical XML element.

So in fact, we should *never* implement my original proposal for
mode=capability in <hostdev>.
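
To make that split concrete, here is a rough sketch of the two cases.
The PCI address values are placeholders chosen to match the
pci_0000_00_19_0 example, and the <interface> fragment is only an
assumed shape for illustration, not a settled syntax for NIC
assignment:

```xml
<!-- Opaque assignment: the device is picked purely by its hardware
     address; nothing here says what kind of device it is.
     (Address values are illustrative placeholders.) -->
<hostdev mode='subsystem' type='pci'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x19' function='0x0'/>
  </source>
</hostdev>

<!-- Logical assignment (assumed shape): a host NIC handed to the
     guest through the network element, so the usual <interface>
     properties remain available. -->
<interface type='direct'>
  <source dev='eth0' mode='passthrough'/>
</interface>
```

An app that wants to list the guest's NICs only has to walk
<interface> in the second case, which is the point of keeping logical
devices under their logical element.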
> A SCSI device:
>
> - by name:
>
>    <hostdev type='scsi'>
>      <source name='scsi_0_0_0_0'/>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
> - by position (aliases also would allow to specify /dev/sda easily):
>
>    <hostdev type='scsi'>
>      <source rel='parent' name='block_sda_ST9160411AS_5TG11QWL'/>
>      <address type='scsi' host='...' bus='...' target='...' lun='...'/>
>    </hostdev>
>
>
> A SCSI host:
>
> - by name:
>
>    <hostdev type='scsi_host'>
>      <source name='scsi_host0'/>
>      <address type='scsi' host='...'/>
>    </hostdev>
>
> - by position:
>
>    <hostdev type='scsi_host'>
>      <source rel='child' name='pci_0000_00_1f_2'/>
>      <address type='scsi' host='...'/>
>    </hostdev>

Following my rationale about <hostdev> usage, I'd suggest that we
should in fact invent a syntax for SCSI controller passthrough via
the <controller> element, since this is where we represent all other
types of controller (IDE, SCSI, USB, VirtIO Serial, etc). Your
thoughts on how to represent the association with the specific host
device still apply of course.

> NPIV support: generalizing hostdev source addresses
> ===================================================
>
> In NPIV, a virtual HBA is created using "virsh nodedev-create" and
> passed to the guest.  Such virtual adapter does have a stable
> address, namely its WWN.  As such, it can be addressed simply by
> generalizing the kind of source address that can be passed to
> <hostdev type='scsi_host'/>:

Yep, WWNN/WWPN are clearly the desirable unique attribute to use here.

>    <hostdev type='scsi_host'>
>      <source>
>        <address type='wwn' wwpn='...' wwnn='...'/>
>      </source>
>    </hostdev>
>
> (Note that this doesn't use <source name='...'/> and, as such, it
> does not rely on the ideas above).

Any time that an element has a choice of schemas to follow, we need
an attribute to identify which schema is being followed.
So, if we allow <source> to work with WWNN or a nodedev device name,
we'd need to add a mode='wwn|nodedev' attribute to <source> to tell
apps what schema this device follows.

I like the idea of using WWNN/WWPN for NPIV controller assignment
since this is a strong unique attribute. I'm not such a fan of using
the node device names as we don't have a particularly strong guarantee
of stability for those. E.g. updating to a newer Linux release might
change the node device names. So I'd rather find some kind of other
unique attribute for the SCSI controllers to use.

In the <hostdev> XML we expose a host controller number, which is a
very lame identifier because that is also very unstable. We already
expose WWNN/WWPN in the nodedev XML where available.

I think the approach should be to figure out some kind of way to
assign another strong unique identifier to SCSI controllers, and
expose that in node dev XML. Then we can use that in the domain XML
<controller> element as an alternative to WWNN/WWPN addressing.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|