Doc: How to use NPIV in libvirt


 



Before posting this to the wiki or somewhere else, I would like to know
whether there are any suggestions, or whether I missed something.


============================================

                  How to use NPIV in libvirt

  I planned to write a document about how to use NPIV in libvirt once
more features were supported, but it looks like I can't wait until then:
I get lots of questions about it from both bugs and mails. So here we go.

  The document tries to summarize what libvirt supports for NPIV so far,
plus the TODO list. Feedback and suggestions are welcome.

1) How to find out which HBA(s) support vHBA

  For libvirt "1.0.4" or newer, you can find this out simply with:

    # virsh nodedev-list --cap vports

  "--cap vports" tells "nodedev-list" to output only the devices that
support the "vports" capability, i.e. that support vHBA.

  Also since version "1.0.4", the HBA's XML reports the maximum number of
vports the HBA supports and the current number of vports, e.g.

    # virsh nodedev-dumpxml scsi_host5
    <device>
      <name>scsi_host5</name>
      <parent>pci_0000_04_00_1</parent>
      <capability type='scsi_host'>
        <host>5</host>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da4e</wwnn>
          <wwpn>2101001b32a9da4e</wwpn>
          <fabric_wwn>2001000dec9877c1</fabric_wwn>
        </capability>
        <capability type='vport_ops'>
          <max_vports>164</max_vports>
          <vports>5</vports>
        </capability>
      </capability>
    </device>
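
  The two counters can also be read by a script. Below is a minimal
Python sketch, assuming the XML above has been captured as text (for
example from "virsh nodedev-dumpxml"). The helper name "free_vports" is
hypothetical, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Sample output of "virsh nodedev-dumpxml scsi_host5", copied from above.
NODEDEV_XML = """
<device>
  <name>scsi_host5</name>
  <parent>pci_0000_04_00_1</parent>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
      <fabric_wwn>2001000dec9877c1</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>164</max_vports>
      <vports>5</vports>
    </capability>
  </capability>
</device>
"""


def free_vports(xml_text):
    """Return how many more vports the HBA can take (hypothetical helper).

    Returns None when the XML has no "vport_ops" capability or when the
    capability carries no counters.
    """
    root = ET.fromstring(xml_text)
    cap = root.find(".//capability[@type='vport_ops']")
    if cap is None:
        return None
    max_vports = cap.findtext("max_vports")
    vports = cap.findtext("vports")
    if max_vports is None or vports is None:
        return None
    return int(max_vports) - int(vports)


print(free_vports(NODEDEV_XML))  # 164 - 5 = 159
```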

  For libvirt older than "1.0.4", it's a bit more complicated than the
above:

  First you need to find out all the HBAs, e.g.

    # virsh nodedev-list --cap scsi_host
    scsi_host0
    scsi_host1
    scsi_host2
    scsi_host3
    scsi_host4
    scsi_host5

  Then, to see whether an HBA supports vHBA, check whether its dumped
XML contains the "vport_ops" capability. E.g.

    # virsh nodedev-dumpxml scsi_host3
    <device>
      <name>scsi_host3</name>
      <parent>pci_0000_00_08_0</parent>
      <capability type='scsi_host'>
        <host>3</host>
      </capability>
    </device>

  That says "scsi_host3" doesn't support vHBA.

    # virsh nodedev-dumpxml scsi_host5
    <device>
      <name>scsi_host5</name>
      <parent>pci_0000_04_00_1</parent>
      <capability type='scsi_host'>
        <host>5</host>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da4e</wwnn>
          <wwpn>2101001b32a9da4e</wwpn>
          <fabric_wwn>2001000dec9877c1</fabric_wwn>
        </capability>
        <capability type='vport_ops' />
      </capability>
    </device>

  But "scsi_host5" supports it.
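
  The same check can be scripted over several dumps at once. This is a
minimal Python sketch, assuming each device's XML has already been
collected as text; the helper names "supports_vhba" and "vhba_capable"
are hypothetical, and the two samples are trimmed copies of the dumps
above.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two XML dumps shown above.
SCSI_HOST3 = """
<device>
  <name>scsi_host3</name>
  <capability type='scsi_host'>
    <host>3</host>
  </capability>
</device>
"""

SCSI_HOST5 = """
<device>
  <name>scsi_host5</name>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
    </capability>
    <capability type='vport_ops' />
  </capability>
</device>
"""


def supports_vhba(xml_text):
    """True if the node device XML contains a "vport_ops" capability."""
    root = ET.fromstring(xml_text)
    return root.find(".//capability[@type='vport_ops']") is not None


def vhba_capable(dumps):
    """Given {device name: XML text}, return the names that support vHBA."""
    return sorted(name for name, xml in dumps.items() if supports_vhba(xml))


print(vhba_capable({"scsi_host3": SCSI_HOST3, "scsi_host5": SCSI_HOST5}))
```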

  One might be confused by the difference between the node device naming
style in this document (e.g. scsi_host5) and the one in the RHEL6
Virtualization Guide [1] (pci_10df_fe00_scsi_host_0). That is because
libvirt has two backends for the node device driver: udev and HAL. We
prefer the udev backend over the HAL backend in the internal
implementation, and I think there is good enough reason to do so (HAL is
in maintenance mode now). I believe the udev backend is used more than the
HAL backend, but if your distribution's packager builds libvirt without
the udev backend, don't be surprised by node device names like the ones
in [1].

2) How to create a vHBA

  Pick an HBA that supports vHBA, use its "node device name" as the
"parent" of the vHBA, and specify the "wwnn" and "wwpn" in the vHBA's XML.
E.g.

    <device>
      <name>scsi_host6</name>
      <parent>scsi_host5</parent>
      <capability type='scsi_host'>
        <capability type='fc_host'>
          <wwnn>2001001b32a9da5e</wwnn>
          <wwpn>2101001b32a9da5e</wwpn>
        </capability>
      </capability>
    </device>

  Then create the vHBA with the virsh command "nodedev-create" (assuming
the above XML file is named "vhba.xml"):

    # virsh nodedev-create vhba.xml
    Node device scsi_host6 created from vhba.xml

  Since "0.9.10", libvirt generates the "wwnn" and "wwpn" automatically if
they are not specified. This means one can create the vHBA with simpler
XML like:

    <device>
      <parent>scsi_host5</parent>
      <capability type='scsi_host'>
        <capability type='fc_host'>
        </capability>
      </capability>
    </device>
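
  Both XML variants can be produced by one small script. The following
is a minimal Python sketch; the helper name "vhba_xml" is hypothetical,
and insisting on "both WWNs or neither" is a simplification made for the
sketch, not something libvirt is known to require.

```python
import xml.etree.ElementTree as ET


def vhba_xml(parent, wwnn=None, wwpn=None):
    """Build node device XML for a vHBA under `parent` (hypothetical helper).

    When wwnn/wwpn are omitted, the "fc_host" capability is left empty so
    that libvirt ("0.9.10" or newer) generates them.
    """
    if (wwnn is None) != (wwpn is None):
        # Simplification of this sketch: accept both WWNs or neither.
        raise ValueError("give both wwnn and wwpn, or neither")
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi = ET.SubElement(device, "capability", type="scsi_host")
    fc = ET.SubElement(scsi, "capability", type="fc_host")
    if wwnn is not None:
        ET.SubElement(fc, "wwnn").text = wwnn
        ET.SubElement(fc, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")


# Write the XML to a file that "virsh nodedev-create" can consume.
with open("vhba.xml", "w") as f:
    f.write(vhba_xml("scsi_host5", "2001001b32a9da5e", "2101001b32a9da5e"))
```

  The resulting "vhba.xml" can then be passed to "virsh nodedev-create"
as shown earlier.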

3) How to destroy a vHBA

  As usual, destroying something is always simpler than creating it:

    # virsh nodedev-destroy scsi_host6
    Destroyed node device 'scsi_host6'

  You might already have realized that the vHBA is removed permanently.
Don't be surprised, that's life: the node device driver doesn't support
persistent config. I won't say it's a nightmare for users who scream when
they realize the vHBA disappeared after a system reboot, but it's not
good either (imagine you got the wwnn:wwpn pair from the storage admin but
didn't record it). Fortunately, we support persistent vHBAs now, see the
next section for details.

4) How to create a persistent vHBA

  Let's first go back to the history a bit.

  Prior to libvirt "1.0.5", one could define a "scsi" type pool based on a
(v)HBA by its scsi host name (e.g. "host0" in the XML below). E.g.

    <pool type='scsi'>
      <name>poolhba0</name>
      <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
      <capacity unit='bytes'>0</capacity>
      <allocation unit='bytes'>0</allocation>
      <available unit='bytes'>0</available>
      <source>
        <adapter name='host0'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
        <permissions>
          <mode>0700</mode>
          <owner>0</owner>
          <group>0</group>
        </permissions>
      </target>
    </pool>

  Quite nice? Yeah, at least it looks so, but the problem is that the scsi
host number is *unstable* (it can change after a system reboot, a kernel
module reload, a vHBA re-creation, etc.), and thus a "scsi" type pool
based on a (v)HBA becomes unstable too. Obviously it doesn't help with the
"persistent vHBA" problem.

  To solve these problems, libvirt "1.0.5" introduced a new XML schema to
identify the (v)HBA. An example of the XML:

    <pool type='scsi'>
      <name>poolvhba0</name>
      <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
      <source>
        <adapter type='fc_host' parent='scsi_host5'
                 wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
        <permissions>
          <mode>0700</mode>
          <owner>0</owner>
          <group>0</group>
        </permissions>
      </target>
    </pool>

  This allows defining a "scsi" type pool based on either an HBA or a
vHBA. For an HBA, the "parent" attribute can be omitted. For a vHBA, if
"parent" is not specified, libvirt automatically picks the first HBA that
supports vHBA and has not yet reached its maximum number of vports.
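
  The pool XML can also be generated. Here is a minimal Python sketch,
assuming that "<uuid>" and "<permissions>" may be left out so that
libvirt fills in its defaults; the helper name "fc_host_pool_xml" is
hypothetical.

```python
import xml.etree.ElementTree as ET


def fc_host_pool_xml(name, wwnn, wwpn, parent=None):
    """Build "scsi" pool XML backed by an fc_host adapter (hypothetical).

    `parent` is optional: for a vHBA without it, libvirt is left to pick
    a suitable HBA itself.
    """
    pool = ET.Element("pool", type="scsi")
    ET.SubElement(pool, "name").text = name
    source = ET.SubElement(pool, "source")
    attrs = {"type": "fc_host", "wwnn": wwnn, "wwpn": wwpn}
    if parent is not None:
        attrs["parent"] = parent
    ET.SubElement(source, "adapter", attrs)
    target = ET.SubElement(pool, "target")
    ET.SubElement(target, "path").text = "/dev/disk/by-path"
    return ET.tostring(pool, encoding="unicode")


print(fc_host_pool_xml("poolvhba0", "20000000c9831b4b", "10000000c9831b4b",
                       parent="scsi_host5"))
```

  The output can be saved to a file and given to "virsh pool-define".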

  For a pool based on a vHBA, when the pool is started, libvirt checks
whether the specified vHBA (wwnn:wwpn) exists on the host; if it doesn't
exist yet, libvirt creates it automatically. When the pool is stopped, the
vHBA is destroyed. But since the storage driver supports persistent
config, one can easily get the vHBA with the same "wwnn:wwpn" back on the
next start (don't scream if your pool is transient).

  That's not the end if you want the vHBA to be created automatically
after a system reboot: you also need to mark the pool as "autostart":

    # virsh pool-autostart poolvhba0

  One might be curious why we don't support persistent config for the
node device driver and create persistent vHBAs there. One reason is that
it would duplicate what the storage pool does. Another reason (the
important one) is that we want to associate the libvirt storage
pool/volume with the domain (see the section on using the LUN in a guest
below).


5) How to find out the LUN's path

  If you have defined a "scsi" type pool based on the (v)HBA, it's simple
to look up which LUNs are attached to the (v)HBA with the virsh command
"vol-list", e.g.

    # virsh vol-list poolvhba0 --details
    Name        Path                                                            Type   Capacity   Allocation
    --------------------------------------------------------------------------------------------------------
    unit:0:2:0  /dev/disk/by-path/pci-0000:04:00.1-fc-0x203500a0b85ad1d7-lun-0  block  20.01 GiB  20.01 GiB

  If you have not defined a "scsi" type pool based on the (v)HBA, you can
find the LUNs of the (v)HBA either with the virsh command
"nodedev-list --tree" or by iterating over sysfs manually.

  To find the LUNs with the virsh command "nodedev-list" (irrelevant
output is omitted):

    # virsh nodedev-list --tree
    +- pci_0000_00_0d_0
    |   |
    |   +- pci_0000_04_00_0
    |   |   |
    |   |   +- scsi_host4
    |   |
    |   +- pci_0000_04_00_1
    |       |
    |       +- scsi_host5
    |           |
    |           +- scsi_host7
    |           +- scsi_target5_0_0
    |           |   |
    |           |   +- scsi_5_0_0_0
    |           |
    |           +- scsi_target5_0_1
    |           |   |
    |           |   +- scsi_5_0_1_0
    |           |
    |           +- scsi_target5_0_2
    |           |   |
    |           |   +- scsi_5_0_2_0
    |           |       |
    |           |       +- block_sdb_3600a0b80005adb0b0000ab2d4cae9254
    |           |
    |           +- scsi_target5_0_3
    |               |
    |               +- scsi_5_0_3_0

  "scsi_host5" is an HBA on my host, and it has a LUN named
"block_sdb_3600a0b80005adb0b0000ab2d4cae9254". Don't be confused by the
name: it's the naming style libvirt uses, and it is meaningful only to
libvirt. It indicates that the LUN has the short device path "/dev/sdb"
and the ID "3600a0b80005adb0b0000ab2d4cae9254":

    # ls /dev/disk/by-id/ | grep 3600a0b80005adb0b0000ab2d4cae9254
    scsi-3600a0b80005adb0b0000ab2d4cae9254

  To manually find the LUNs of a (v)HBA:

  First, you need to iterate over all the directories whose names begin
with the SCSI host number of the (v)HBA under "/sys/bus/scsi/devices".
E.g. I will look up the LUNs of the HBA with SCSI host number 5 on my host:

    # ls /sys/bus/scsi/devices/5:* -d
    /sys/bus/scsi/devices/5:0:0:0  /sys/bus/scsi/devices/5:0:1:0
    /sys/bus/scsi/devices/5:0:2:0  /sys/bus/scsi/devices/5:0:3:0

    # ls /sys/bus/scsi/devices/5\:0\:3\:0/block/sdc

  It means scsi_host5 has a LUN with device name "sdc" attached at address
"5:0:3:0".

    # ls /sys/bus/scsi/devices/5\:0\:1\:0/ | grep block
    device_blocked

  scsi_host5 doesn't have a LUN attached at address "5:0:1:0".
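
  The manual walk over sysfs can be sketched in Python as follows. It
assumes the layout shown above, i.e. that a LUN with a block device has
a "block" subdirectory named after the device; the helper name
"luns_of_host" and its "sysfs_root" parameter are hypothetical.

```python
import glob
import os


def luns_of_host(host_no, sysfs_root="/sys/bus/scsi/devices"):
    """Map each SCSI address of one host to its block device names.

    An address with no "block" subdirectory maps to an empty list,
    meaning no LUN with a block device is attached there.
    """
    result = {}
    pattern = os.path.join(sysfs_root, "%d:*" % host_no)
    for dev_dir in sorted(glob.glob(pattern)):
        block_dir = os.path.join(dev_dir, "block")
        if os.path.isdir(block_dir):
            names = sorted(os.listdir(block_dir))
        else:
            names = []
        result[os.path.basename(dev_dir)] = names
    return result


if __name__ == "__main__":
    # SCSI host number 5, as in the example above.
    for address, names in luns_of_host(5).items():
        print(address, names)
```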

  A device name like "sdc" is not stable. To find the stable path, look
for the symbolic link that points to the device name. E.g.

    # ls -l /dev/disk/by-path/
    lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0 -> ../../sda
    lrwxrwxrwx. 1 root root 10 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0-part1 -> ../../sda1
    lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0 -> ../../sdc

  So "/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0" is
the stable path of the LUN attached at address "5:0:3:0". Of course, you
can use a similar method to get the "by-id | by-uuid | by-label" stable
path.
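
  Looking up the symbolic links can be scripted too. This is a minimal
Python sketch; the helper name "stable_paths" and its "dev_root" and
"subdir" parameters are hypothetical, and the defaults assume the usual
"/dev/disk/by-path" layout shown above.

```python
import os


def stable_paths(dev_name, dev_root="/dev", subdir="by-path"):
    """Return the links under <dev_root>/disk/<subdir> that point to
    <dev_root>/<dev_name> (e.g. "sdc")."""
    target = os.path.realpath(os.path.join(dev_root, dev_name))
    by_dir = os.path.join(dev_root, "disk", subdir)
    matches = []
    for entry in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, entry)
        # Keep only symbolic links that resolve to the wanted device node.
        if os.path.islink(link) and os.path.realpath(link) == target:
            matches.append(link)
    return matches


if __name__ == "__main__":
    for path in stable_paths("sdc"):
        print(path)
```

  Passing subdir="by-id" gives the "by-id" variant in the same way.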

6) How to use the LUN in a guest

  Since libvirt "1.0.5", a storage volume can be used as a disk source via
two new attributes ("pool" and "volume") of the disk "<source>" element.
E.g.

    <disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='poolvhba0' volume='unit:0:2:0'/>
      <target dev='hda' bus='ide'/>
    </disk>

  There are lots of advantages to doing so. Since the main purpose of this
document is "how to use", I will mention only two here to persuade you to
use it. First, you don't need to look up the LUN's path yourself. Second,
assuming that you want to migrate a domain which uses a LUN attached to a
vHBA, do you want to create the vHBA manually on the target host? With the
pool, you can simply define/start a pool with the same config on the
target host.

  So, if your libvirt is "1.0.5" or newer, we recommend that you define a
"scsi" type pool based on the (v)HBA, and use the "pool/volume" names to
use the LUN as the disk source.
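
  A disk element that refers to the LUN by "pool/volume" names can be
generated as well. The following minimal Python sketch uses the
hypothetical helper name "volume_disk_xml" and hard-codes the raw qemu
driver used in the examples of this document.

```python
import xml.etree.ElementTree as ET


def volume_disk_xml(pool, volume, target_dev, bus):
    """Build a <disk type='volume'> element for a pool volume
    (hypothetical helper)."""
    disk = ET.Element("disk", type="volume", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    ET.SubElement(disk, "source", pool=pool, volume=volume)
    ET.SubElement(disk, "target", dev=target_dev, bus=bus)
    return ET.tostring(disk, encoding="unicode")


print(volume_disk_xml("poolvhba0", "unit:0:2:0", "hda", "ide"))
```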

  You can either use the LUN as a qemu emulated disk, or pass it through
to the guest.

  To use it as a qemu emulated disk, set the "device" attribute to one of
"disk", "cdrom" or "floppy". E.g.

    <disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='blk-pool0' volume='blk-pool0-vol0'/>
      <target dev='hda' bus='ide'/>
    </disk>

  Or (using the LUN's path directly, with disk type "block"):

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
      <target dev='sda' bus='scsi'/>
    </disk>

  To pass the LUN through, set the "device" attribute to "lun", e.g.

    <disk type='block' device='lun'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
      <target dev='sda' bus='scsi'/>
    </disk>

7) Future work

  * NPIV based SCSI host passthrough
    That's what users ask for: how to pass a (v)HBA through to the guest?
  * Expose vendor information, the LUN's path, and the state of the (v)HBA
    in its XML
  * Maybe a virsh command to simplify vHBA creation with options

[1] http://www.linuxtopia.org/online_books/rhel6/rhel_6_virtualization/rhel_6_virtualization_chap-Para-virtualized_Windows_Drivers_Guide-N_Port_ID_Virtualization_NPIV.html

Regards,
Osier

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



