Re: [PATCH v1 10/31] schemas: Introduce disk type NVMe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/16/19 2:35 PM, Peter Krempa wrote:
On Thu, Jul 11, 2019 at 17:53:57 +0200, Michal Privoznik wrote:
There is this class of PCI devices that act like disks: NVMe.
Therefore, they are both PCI devices and disks. While we already
have <hostdev/> (and can assign a NVMe device to a domain
successfully) we don't have disk representation. There are three
problems with PCI assignment in case of a NVMe device:

1) domains with <hostdev/> can't be migrated

2) NVMe device is assigned whole, there's no way to assign only a
    namespace

3) Because hypervisors see <hostdev/> they don't put block layer
    on top of it - users don't get all the fancy features like
    snapshots

NVMe namespaces are way of splitting one continuous NVDIMM memory

s/NVDIMM memory/NVMe device/

into smaller ones, effectively creating smaller NVMe-s (which can
then be partitioned, LVMed, etc.)

Because of all of this the following XML was chosen to model a
NVMe device:

   <disk type='nvme' device='disk'>
     <driver name='qemu' type='raw'/>
     <source type='pci' managed='yes' namespace='1'>
       <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
     </source>
     <target dev='vda' bus='virtio'/>
   </disk>

Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
---
  docs/formatdomain.html.in            | 45 +++++++++++++++++++++--
  docs/schemas/domaincommon.rng        | 32 ++++++++++++++++
  tests/qemuxml2argvdata/disk-nvme.xml | 55 ++++++++++++++++++++++++++++
  3 files changed, 129 insertions(+), 3 deletions(-)
  create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index a7a6ec32a5..545578076d 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -2922,6 +2922,13 @@
      &lt;/backingStore&gt;
      &lt;target dev='vdd' bus='virtio'/&gt;
    &lt;/disk&gt;
+  &lt;disk type='nvme' device='disk'&gt;
+    &lt;driver name='qemu' type='raw'/&gt;
+    &lt;source type='pci' managed='yes' namespace='1'&gt;

The 'type' filed may get confusing a bit as it is supposed to be stored
in virStorageSource->nvme->type, while virStorageSource has it's own
type.

Well, I'm trygin to mix two things here: <disk/> and <hostdev/> (well, PCI devices). There are other types of NVMe than PCIe attached ones. They are called NVMe-oF (NVMe over Fabrics), so I've figured that we want to specifically tell that NVMe we are dealing with here is the PCIe one. But I guess it can also be added later, if we ever decide to support NVMe-oF (disks without @type would have 'pci' autofilled in a post parse callback).


Also I'm pondering whether managed='yes' belongs as a top-level
attribute under 'source' as it's possibly specific to the 'pci' setting
of type.

Yeah, designing future proof XML is hard. I'm open to any suggestion.


+      &lt;address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/&gt;
+    &lt;/source&gt;
+    &lt;target dev='vde' bus='virtio'/&gt;
+  &lt;/disk&gt;
  &lt;/devices&gt;
  ...</pre>

[...]

@@ -3118,6 +3126,31 @@
                <span class="since">Since 1.0.5</span>
                </p>
                </dd>
+            <dt><code>nvme</code></dt>
+              <dd>
+              To specify disk source for NVMe disk the <code>source</code>
+              element has the following attributes:
+              <dl>
+                <dt><code>type</code></dt>
+                <dd>The type of address specified in <code>address</code>
+                sub-element. Currently, only <code>pci</code> value is
+                accepted.
+                </dd>
+
+                <dt><code>managed</code></dt>
+                <dd>This attribute instructs libvirt to detach NVMe
+                controller automatically on domain startup (<code>yes</code>)
+                or expect the controller to be detached by system
+                administrator (<code>no</code>).
+                </dd>
+
+                <dt><code>namespace</code></dt>
+                <dd>The namespace ID which should be assigned to the domain.
+                According to NVMe standard, namespace numbers start from 1,
+                including.
+                </dd>
+              </dl>
+              </dd>
            </dl>
          With "file", "block", and "volume", one or more optional
          sub-elements <code>seclabel</code>, <a href="#seclabel">described
@@ -3280,11 +3313,17 @@
              initiator IQN needed to access the source via mandatory
              attribute <code>name</code>.
            </dd>
+          <dt><code>address</code></dt>
+          <dd>For disk of type <code>nvme</code> this element
+            specifies the PCI address of the host NVMe
+            controller.
+            <span class="since">Since 5.5.0</span>
+          </dd>
          </dl>
<p>
-        For a "file" or "volume" disk type which represents a cdrom or floppy
-        (the <code>device</code> attribute), it is possible to define
+        For a "file", "volume" or "nvme" disk type which represents a cdrom or
+        floppy (the <code>device</code> attribute), it is possible to define

You specifically forbid startup policy in the next commit, so what's the
point of documenting it here?

Ah, good point. Back in the day I still wanted to make startupPolicy work, but then decided not to.



          policy what to do with the disk if the source file is not accessible.
          (NB, <code>startupPolicy</code> is not valid for "volume" disk unless
           the specified storage volume is of "file" type). This is done by the
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 31db599ab9..f367e8f6fd 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng

[...]

@@ -1918,6 +1919,37 @@
      </optional>
    </define>
+ <define name="diskSourceNvme">
+    <attribute name="type">
+      <value>nvme</value>
+    </attribute>
+    <optional>
+      <element name="source">
+        <attribute name="type">
+          <value>pci</value>
+        </attribute>
+        <attribute name="namespace">
+          <ref name="uint32"/>
+        </attribute>
+        <optional>
+          <attribute name="managed">
+            <ref name="virYesNo"/>
+          </attribute>
+        </optional>
+        <element name="address">
+          <ref name="pciaddress"/>
+        </element>
+        <ref name="diskSourceCommon"/>
+        <optional>
+          <ref name="storageStartupPolicy"/>
+        </optional>
+        <optional>
+          <ref name="encryption"/>
+        </optional>
+      </element>
+    </optional>
+  </define>
+
    <define name="diskTarget">
      <data type="string">
        <param name="pattern">(ioemu:)?(fd|hd|sd|vd|xvd|ubd)[a-zA-Z0-9_]+</param>
diff --git a/tests/qemuxml2argvdata/disk-nvme.xml b/tests/qemuxml2argvdata/disk-nvme.xml
new file mode 100644
index 0000000000..0b3dbad4eb
--- /dev/null
+++ b/tests/qemuxml2argvdata/disk-nvme.xml
@@ -0,0 +1,55 @@
+<domain type='qemu'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219136</memory>
+  <currentMemory unit='KiB'>219136</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='i686' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-i686</emulator>
+    <disk type='nvme' device='disk'>
+      <driver name='qemu' type='raw'/>
+      <source type='pci' managed='yes' namespace='1'>
+        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
+      </source>
+      <target dev='vda' bus='virtio'/>
+    </disk>
+    <disk type='nvme' device='disk'>
+      <driver name='qemu' type='raw'/>
+      <source type='pci' managed='yes' namespace='2'>
+        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>

Make at heast one of them use qcow2 as format.

Okay.


+      </source>
+      <target dev='vdb' bus='virtio'/>
+    </disk>
+    <disk type='nvme' device='disk'>
+      <driver name='qemu' type='raw'/>
+      <source type='pci' managed='no' namespace='1'>
+        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
+      </source>
+      <target dev='vdc' bus='virtio'/>
+    </disk>
+    <disk type='nvme' device='disk'>
+      <driver name='qemu' type='raw'/>
+      <source type='pci' managed='no' namespace='2'>
+        <address domain='0x0001' bus='0x02' slot='0x00' function='0x0'/>
+        <encryption format='luks'>
+          <secret type='passphrase' uuid='0a81f5b2-8403-7b23-c8d6-21ccc2f80d6f'/>
+        </encryption>
+      </source>
+      <target dev='vdd' bus='virtio'/>
+    </disk>
+    <controller type='usb' index='0'/>
+    <controller type='pci' index='0' model='pci-root'/>
+    <controller type='scsi' index='0' model='virtio-scsi'/>
+    <input type='mouse' bus='ps2'/>
+    <input type='keyboard' bus='ps2'/>
+    <memballoon model='none'/>
+  </devices>
+</domain>

I'm also missing any form of documentation describing the caveats (e.g.
users should not pass in a NVMe disk the host is using)

Since I'm using virHostdevManager to track used NVMe-s, any attempt to start a domain with a taken NVMe disk will fail. So users will see that immediately :-) But okay, I will enhance the docs.

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux