On 12/9/19 11:55 PM, Cole Robinson wrote:
On 12/2/19 9:26 AM, Michal Privoznik wrote:
There is a class of PCI devices that act like disks: NVMe.
They are therefore both PCI devices and disks. While we already
have <hostdev/> (and can successfully assign an NVMe device to a
domain), we don't have a disk representation. There are three
problems with PCI assignment in the case of an NVMe device:
1) domains with <hostdev/> can't be migrated
2) the NVMe device is assigned whole, there's no way to assign only a
namespace
3) Because hypervisors see <hostdev/> they don't put a block layer
on top of it - users don't get all the fancy features like
snapshots
NVMe namespaces are a way of splitting one contiguous NVMe storage
device into smaller ones, effectively creating smaller NVMe-s (which
can then be partitioned, LVMed, etc.)
Because of all of this, the following XML was chosen to model an
NVMe device:
<disk type='nvme' device='disk'>
<driver name='qemu' type='raw'/>
<source type='pci' managed='yes' namespace='1'>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>
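To make the shape of the proposed element concrete, here is a minimal sketch of how a consumer might read it back. It uses only Python's standard `xml.etree.ElementTree`; the helper name `parse_nvme_disk` and the returned tuple layout are assumptions made for illustration, not part of libvirt or of this patch. The checks mirror what the documentation in this patch states (only `type='pci'` is accepted, namespace IDs start at 1).

```python
import xml.etree.ElementTree as ET

# The disk XML proposed in this patch, reproduced verbatim.
DISK_XML = """
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def parse_nvme_disk(xml_text):
    """Hypothetical helper: return (pci_address, namespace, managed)."""
    disk = ET.fromstring(xml_text)
    if disk.get("type") != "nvme":
        raise ValueError("not an NVMe disk")

    source = disk.find("source")
    if source is None or source.get("type") != "pci":
        raise ValueError("only type='pci' sources are accepted")

    # The patch documents that namespace IDs start from 1.
    namespace = int(source.get("namespace"))
    if namespace < 1:
        raise ValueError("NVMe namespace IDs start at 1")

    addr = source.find("address")
    if addr is None:
        raise ValueError("missing <address/> sub-element")

    # Render the address in the usual domain:bus:slot.function form.
    pci = "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(addr.get("domain"), 16),
        int(addr.get("bus"), 16),
        int(addr.get("slot"), 16),
        int(addr.get("function"), 16),
    )
    return pci, namespace, source.get("managed") == "yes"
```

Calling `parse_nvme_disk(DISK_XML)` on the element above yields the controller's PCI address, the namespace ID, and whether libvirt is asked to manage the detach.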
Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
---
docs/formatdomain.html.in | 57 +++++++++++++++++++++++--
docs/schemas/domaincommon.rng | 32 ++++++++++++++
tests/qemuxml2argvdata/disk-nvme.xml | 63 ++++++++++++++++++++++++++++
3 files changed, 149 insertions(+), 3 deletions(-)
create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 6df4a8b26e..fe871d933f 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -2944,6 +2944,13 @@
</backingStore>
<target dev='vdd' bus='virtio'/>
</disk>
+ <disk type='nvme' device='disk'>
+ <driver name='qemu' type='raw'/>
+ <source type='pci' managed='yes' namespace='1'>
+ <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
+ </source>
+ <target dev='vde' bus='virtio'/>
+ </disk>
</devices>
...</pre>
@@ -2957,7 +2964,8 @@
Valid values are "file", "block",
"dir" (<span class="since">since 0.7.5</span>),
"network" (<span class="since">since 0.8.7</span>), or
- "volume" (<span class="since">since 1.0.5</span>)
+ "volume" (<span class="since">since 1.0.5</span>), or
+ "nvme" (<span class="since">since 5.6.0</span>)
6.0.0 or whatever version this will land in
and refer to the underlying source for the disk.
<span class="since">Since 0.0.3</span>
</dd>
@@ -3140,6 +3148,43 @@
<span class="since">Since 1.0.5</span>
</p>
</dd>
+ <dt><code>nvme</code></dt>
+ <dd>
+ To specify the disk source for an NVMe disk, the <code>source</code>
+ element has the following attributes:
+ <dl>
+ <dt><code>type</code></dt>
+ <dd>The type of address specified in <code>address</code>
+ sub-element. Currently, only <code>pci</code> value is
+ accepted.
+ </dd>
+
+ <dt><code>managed</code></dt>
+ <dd>This attribute instructs libvirt to detach the NVMe
+ controller automatically on domain startup (<code>yes</code>)
+ or to expect the controller to be detached by the system
+ administrator (<code>no</code>).
+ </dd>
+
+ <dt><code>namespace</code></dt>
+ <dd>The namespace ID which should be assigned to the domain.
+ According to the NVMe standard, namespace IDs start from 1,
+ inclusive.
+ </dd>
+ </dl>
+
+ The difference between <code>&lt;disk type='nvme'&gt;</code>
+ and <code>&lt;hostdev/&gt;</code> is that the latter is plain
+ host device assignment with all its limitations (e.g. no live
+ migration), while the former makes the hypervisor run the NVMe
+ disk through its block layer, thus enabling all
+ features provided by the layer (e.g. snapshots, domain
+ migration, etc.). Moreover, since the NVMe disk is unbound
+ from its PCI driver, the host kernel storage stack is not
+ involved (compared to passing, say, <code>/dev/nvme0n1</code> via
+ <code>&lt;disk type='block'&gt;</code>) and therefore lower
+ latencies can be achieved.
+ </dd>
</dl>
With "file", "block", and "volume", one or more optional
sub-elements <code>seclabel</code>, <a href="#seclabel">described
@@ -3302,11 +3347,17 @@
initiator IQN needed to access the source via mandatory
attribute <code>name</code>.
</dd>
+ <dt><code>address</code></dt>
+ <dd>For a disk of type <code>nvme</code> this element
+ specifies the PCI address of the host NVMe
+ controller.
+ <span class="since">Since 5.6.0</span>
Same
+ </dd>
</dl>
<p>
- For a "file" or "volume" disk type which represents a cdrom or floppy
- (the <code>device</code> attribute), it is possible to define
+ For a "file" or "volume" disk type which represents a cdrom or
+ floppy (the <code>device</code> attribute), it is possible to define
Stray change?
Oh right. I realigned this area when adding the address description,
but this change does not belong here.
Also, in the test XML you need to "s/qemu-system-i686/qemu-system-i386/"
or you'll hit a weird error. And VIR_TEST_REGENERATE_OUTPUT is also
busted, see my patches elsewhere on this list.
Yeah, I've noticed Dan posted patches after these. I've fixed that
locally but never replied to this patch. Sorry.
Reviewed-by: Cole Robinson <crobinso@xxxxxxxxxx>
Thanks,
Michal
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list