On Thu, Jul 11, 2019 at 17:53:57 +0200, Michal Privoznik wrote:
> There is this class of PCI devices that act like disks: NVMe.
> Therefore, they are both PCI devices and disks. While we already
> have <hostdev/> (and can assign a NVMe device to a domain
> successfully) we don't have disk representation. There are three
> problems with PCI assignment in case of a NVMe device:
>
> 1) domains with <hostdev/> can't be migrated
>
> 2) NVMe device is assigned whole, there's no way to assign only a
>    namespace
>
> 3) Because hypervisors see <hostdev/> they don't put block layer
>    on top of it - users don't get all the fancy features like
>    snapshots
>
> NVMe namespaces are way of splitting one continuous NVDIMM memory

s/NVDIMM memory/NVMe device/

> into smaller ones, effectively creating smaller NVMe-s (which can
> then be partitioned, LVMed, etc.)
>
> Because of all of this the following XML was chosen to model a
> NVMe device:
>
>   <disk type='nvme' device='disk'>
>     <driver name='qemu' type='raw'/>
>     <source type='pci' managed='yes' namespace='1'>
>       <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>     </source>
>     <target dev='vda' bus='virtio'/>
>   </disk>
>
> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
> ---
>  docs/formatdomain.html.in            | 45 +++++++++++++++++++++--
>  docs/schemas/domaincommon.rng        | 32 ++++++++++++++++
>  tests/qemuxml2argvdata/disk-nvme.xml | 55 ++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+), 3 deletions(-)
>  create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
>
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index a7a6ec32a5..545578076d 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -2922,6 +2922,13 @@
>      </backingStore>
>      <target dev='vdd' bus='virtio'/>
>    </disk>
> +  <disk type='nvme' device='disk'>
> +    <driver name='qemu' type='raw'/>
> +    <source type='pci' managed='yes' namespace='1'>

The 'type' field may get a bit confusing as it is supposed to be
stored in virStorageSource->nvme->type, while virStorageSource has its
own type. Also I'm pondering whether managed='yes' belongs as a
top-level attribute under 'source' as it's possibly specific to the
'pci' setting of type.

> +      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> +    </source>
> +    <target dev='vde' bus='virtio'/>
> +  </disk>
>  </devices>
>  ...</pre>
>

[...]

> @@ -3118,6 +3126,31 @@
>          <span class="since">Since 1.0.5</span>
>        </p>
>      </dd>
> +    <dt><code>nvme</code></dt>
> +    <dd>
> +      To specify disk source for NVMe disk the <code>source</code>
> +      element has the following attributes:
> +      <dl>
> +        <dt><code>type</code></dt>
> +        <dd>The type of address specified in <code>address</code>
> +          sub-element. Currently, only <code>pci</code> value is
> +          accepted.
> +        </dd>
> +
> +        <dt><code>managed</code></dt>
> +        <dd>This attribute instructs libvirt to detach NVMe
> +          controller automatically on domain startup (<code>yes</code>)
> +          or expect the controller to be detached by system
> +          administrator (<code>no</code>).
> +        </dd>
> +
> +        <dt><code>namespace</code></dt>
> +        <dd>The namespace ID which should be assigned to the domain.
> +          According to NVMe standard, namespace numbers start from 1,
> +          including.
> +        </dd>
> +      </dl>
> +    </dd>
>    </dl>
>    With "file", "block", and "volume", one or more optional
>    sub-elements <code>seclabel</code>, <a href="#seclabel">described
> @@ -3280,11 +3313,17 @@
>          initiator IQN needed to access the source via mandatory
>          attribute <code>name</code>.
>        </dd>
> +      <dt><code>address</code></dt>
> +      <dd>For disk of type <code>nvme</code> this element
> +        specifies the PCI address of the host NVMe
> +        controller.
> +        <span class="since">Since 5.5.0</span>
> +      </dd>
>      </dl>
>
>      <p>
> -      For a "file" or "volume" disk type which represents a cdrom or floppy
> -      (the <code>device</code> attribute), it is possible to define
> +      For a "file", "volume" or "nvme" disk type which represents a cdrom or
> +      floppy (the <code>device</code> attribute), it is possible to define

You specifically forbid startup policy in the next commit, so what's
the point of documenting it here?

>        policy what to do with the disk if the source file is not accessible.
>        (NB, <code>startupPolicy</code> is not valid for "volume" disk unless
>        the specified storage volume is of "file" type). This is done by the
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 31db599ab9..f367e8f6fd 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng

[...]

> @@ -1918,6 +1919,37 @@
>      </optional>
>    </define>
>
> +  <define name="diskSourceNvme">
> +    <attribute name="type">
> +      <value>nvme</value>
> +    </attribute>
> +    <optional>
> +      <element name="source">
> +        <attribute name="type">
> +          <value>pci</value>
> +        </attribute>
> +        <attribute name="namespace">
> +          <ref name="uint32"/>
> +        </attribute>
> +        <optional>
> +          <attribute name="managed">
> +            <ref name="virYesNo"/>
> +          </attribute>
> +        </optional>
> +        <element name="address">
> +          <ref name="pciaddress"/>
> +        </element>
> +        <ref name="diskSourceCommon"/>
> +        <optional>
> +          <ref name="storageStartupPolicy"/>
> +        </optional>
> +        <optional>
> +          <ref name="encryption"/>
> +        </optional>
> +      </element>
> +    </optional>
> +  </define>
> +
>    <define name="diskTarget">
>      <data type="string">
>        <param name="pattern">(ioemu:)?(fd|hd|sd|vd|xvd|ubd)[a-zA-Z0-9_]+</param>
> diff --git a/tests/qemuxml2argvdata/disk-nvme.xml b/tests/qemuxml2argvdata/disk-nvme.xml
> new file mode 100644
> index 0000000000..0b3dbad4eb
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/disk-nvme.xml
> @@ -0,0 +1,55 @@
> +<domain type='qemu'>
> +  <name>QEMUGuest1</name>
> +  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
> +  <memory unit='KiB'>219136</memory>
> +  <currentMemory unit='KiB'>219136</currentMemory>
> +  <vcpu placement='static'>1</vcpu>
> +  <os>
> +    <type arch='i686' machine='pc'>hvm</type>
> +    <boot dev='hd'/>
> +  </os>
> +  <clock offset='utc'/>
> +  <on_poweroff>destroy</on_poweroff>
> +  <on_reboot>restart</on_reboot>
> +  <on_crash>destroy</on_crash>
> +  <devices>
> +    <emulator>/usr/bin/qemu-system-i686</emulator>
> +    <disk type='nvme' device='disk'>
> +      <driver name='qemu' type='raw'/>
> +      <source type='pci' managed='yes' namespace='1'>
> +        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> +      </source>
> +      <target dev='vda' bus='virtio'/>
> +    </disk>
> +    <disk type='nvme' device='disk'>
> +      <driver name='qemu' type='raw'/>
> +      <source type='pci' managed='yes' namespace='2'>
> +        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>

Make at least one of them use qcow2 as format.

> +      </source>
> +      <target dev='vdb' bus='virtio'/>
> +    </disk>
> +    <disk type='nvme' device='disk'>
> +      <driver name='qemu' type='raw'/>
> +      <source type='pci' managed='no' namespace='1'>
> +        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
> +      </source>
> +      <target dev='vdc' bus='virtio'/>
> +    </disk>
> +    <disk type='nvme' device='disk'>
> +      <driver name='qemu' type='raw'/>
> +      <source type='pci' managed='no' namespace='2'>
> +        <address domain='0x0001' bus='0x02' slot='0x00' function='0x0'/>
> +        <encryption format='luks'>
> +          <secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
> +        </encryption>
> +      </source>
> +      <target dev='vdd' bus='virtio'/>
> +    </disk>
> +    <controller type='usb' index='0'/>
> +    <controller type='pci' index='0' model='pci-root'/>
> +    <controller type='scsi' index='0' model='virtio-scsi'/>
> +    <input type='mouse' bus='ps2'/>
> +    <input type='keyboard' bus='ps2'/>
> +    <memballoon model='none'/>
> +  </devices>
> +</domain>

I'm also missing any form of documentation describing the caveats
(e.g. users should not pass in an NVMe disk the host is using) or any
advantages/reasons for using this.
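
To illustrate the qcow2 request made against the test file, here is a rough sketch of how the second disk could look. It is only an illustration and not part of the posted patch; the only change from the quoted XML is the driver type, and picking the namespace='2' disk for it is an arbitrary assumption:

```xml
<!-- Sketch only: second test disk from the patch with the image format
     switched from raw to qcow2. All other values are copied from the patch. -->
<disk type='nvme' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source type='pci' managed='yes' namespace='2'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>
```

Having one non-raw disk in the test data would exercise the format handling on top of the NVMe source rather than only the raw pass-through case.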
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list