On 12/2/19 9:26 AM, Michal Privoznik wrote:
> There is this class of PCI devices that act like disks: NVMe.
> Therefore, they are both PCI devices and disks. While we already
> have <hostdev/> (and can assign a NVMe device to a domain
> successfully) we don't have disk representation. There are three
> problems with PCI assignment in case of a NVMe device:
>
> 1) domains with <hostdev/> can't be migrated
>
> 2) NVMe device is assigned whole, there's no way to assign only a
>    namespace
>
> 3) Because hypervisors see <hostdev/> they don't put block layer
>    on top of it - users don't get all the fancy features like
>    snapshots
>
> NVMe namespaces are way of splitting one continuous NVDIMM memory
> into smaller ones, effectively creating smaller NVMe-s (which can
> then be partitioned, LVMed, etc.)
>
> Because of all of this the following XML was chosen to model a
> NVMe device:
>
>   <disk type='nvme' device='disk'>
>     <driver name='qemu' type='raw'/>
>     <source type='pci' managed='yes' namespace='1'>
>       <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>     </source>
>     <target dev='vda' bus='virtio'/>
>   </disk>
>
> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
> ---
>  docs/formatdomain.html.in            | 57 +++++++++++++++++++++++--
>  docs/schemas/domaincommon.rng        | 32 ++++++++++++++
>  tests/qemuxml2argvdata/disk-nvme.xml | 63 ++++++++++++++++++++++++++++
>  3 files changed, 149 insertions(+), 3 deletions(-)
>  create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
>
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 6df4a8b26e..fe871d933f 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -2944,6 +2944,13 @@
>      </backingStore>
>      <target dev='vdd' bus='virtio'/>
>    </disk>
> +  <disk type='nvme' device='disk'>
> +    <driver name='qemu' type='raw'/>
> +    <source type='pci' managed='yes' namespace='1'>
> +      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> +    </source>
> +    <target dev='vde' bus='virtio'/>
> +  </disk>
>  </devices>
>  ...</pre>
>
> @@ -2957,7 +2964,8 @@
>        Valid values are "file", "block",
>        "dir" (<span class="since">since 0.7.5</span>),
>        "network" (<span class="since">since 0.8.7</span>), or
> -      "volume" (<span class="since">since 1.0.5</span>)
> +      "volume" (<span class="since">since 1.0.5</span>), or
> +      "nvme" (<span class="since">since 5.6.0</span>)

6.0.0 or whatever version this will land in

>        and refer to the underlying source for the disk.
>        <span class="since">Since 0.0.3</span>
>        </dd>
> @@ -3140,6 +3148,43 @@
>          <span class="since">Since 1.0.5</span>
>          </p>
>          </dd>
> +      <dt><code>nvme</code></dt>
> +      <dd>
> +        To specify disk source for NVMe disk the <code>source</code>
> +        element has the following attributes:
> +        <dl>
> +          <dt><code>type</code></dt>
> +          <dd>The type of address specified in <code>address</code>
> +            sub-element. Currently, only <code>pci</code> value is
> +            accepted.
> +          </dd>
> +
> +          <dt><code>managed</code></dt>
> +          <dd>This attribute instructs libvirt to detach NVMe
> +            controller automatically on domain startup (<code>yes</code>)
> +            or expect the controller to be detached by system
> +            administrator (<code>no</code>).
> +          </dd>
> +
> +          <dt><code>namespace</code></dt>
> +          <dd>The namespace ID which should be assigned to the domain.
> +            According to NVMe standard, namespace numbers start from 1,
> +            including.
> +          </dd>
> +        </dl>
> +
> +        The difference between <code><disk type='nvme'></code>
> +        and <code><hostdev/></code> is that the latter is plain
> +        host device assignment with all its limitations (e.g. no live
> +        migration), while the former makes hypervisor to run the NVMe
> +        disk through hypervisor's block layer thus enabling all
> +        features provided by the layer (e.g. snapshots, domain
> +        migration, etc.). Moreover, since the NVMe disk is unbinded
> +        from its PCI driver, the host kernel storage stack is not
> +        involved (compared to passing say <code>/dev/nvme0n1</code> via
> +        <code><disk type='block'></code> and therefore lower
> +        latencies can be achieved.
> +      </dd>
>        </dl>
>        With "file", "block", and "volume", one or more optional
>        sub-elements <code>seclabel</code>, <a href="#seclabel">described
> @@ -3302,11 +3347,17 @@
>          initiator IQN needed to access the source via mandatory
>          attribute <code>name</code>.
>        </dd>
> +      <dt><code>address</code></dt>
> +      <dd>For disk of type <code>nvme</code> this element
> +        specifies the PCI address of the host NVMe
> +        controller.
> +        <span class="since">Since 5.6.0</span>

Same

> +      </dd>
>      </dl>
>
>      <p>
> -    For a "file" or "volume" disk type which represents a cdrom or floppy
> -    (the <code>device</code> attribute), it is possible to define
> +    For a "file" or "volume" disk type which represents a cdrom or
> +    floppy (the <code>device</code> attribute), it is possible to define

Stray change?

Also, in the test XML you need to "s/qemu-system-i686/qemu-system-i386/"
or you'll hit a weird error. And VIR_TEST_REGENERATE_OUTPUT is also
busted, see my patches elsewhere on this list.

Reviewed-by: Cole Robinson <crobinso@xxxxxxxxxx>

- Cole