Re: [PATCH v3 10/30] schemas: Introduce disk type NVMe

On 12/2/19 3:26 PM, Michal Privoznik wrote:
There is this class of PCI devices that act like disks: NVMe.
Therefore, they are both PCI devices and disks. While we already
have <hostdev/> (and can assign an NVMe device to a domain
successfully) we don't have a disk representation. There are three
problems with PCI assignment in the case of an NVMe device:

1) domains with <hostdev/> can't be migrated

2) an NVMe device is assigned whole, there's no way to assign only a
    namespace

3) Because hypervisors see <hostdev/> they don't put a block layer
    on top of it, so users don't get all the fancy features like
    snapshots

NVMe namespaces are a way of splitting one contiguous NVDIMM memory
into smaller ones, effectively creating smaller NVMe-s (which can
then be partitioned, LVMed, etc.)

Because of all of this the following XML was chosen to model an
NVMe device:

   <disk type='nvme' device='disk'>
     <driver name='qemu' type='raw'/>
     <source type='pci' managed='yes' namespace='1'>
       <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
     </source>
     <target dev='vda' bus='virtio'/>
   </disk>
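
For illustration, here is a minimal Python sketch of how a consumer could pull the PCI address and namespace out of the XML above. The helper name parse_nvme_source is hypothetical and not part of libvirt; the sketch only assumes the element and attribute names shown in the example.

```python
import xml.etree.ElementTree as ET

# The <disk> element from the example above.
DISK_XML = """
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def parse_nvme_source(disk_xml):
    """Return (pci_address, namespace) for a <disk type='nvme'> element.

    Hypothetical helper for illustration only; not a libvirt function.
    """
    disk = ET.fromstring(disk_xml)
    if disk.get("type") != "nvme":
        raise ValueError("not an NVMe disk")
    source = disk.find("source")
    addr = source.find("address")
    # The attributes are hex strings such as '0x01'; int(..., 16) accepts the prefix.
    parts = [int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")]
    # Format as the usual sysfs-style PCI address, DDDD:BB:SS.F.
    pci_address = "{:04x}:{:02x}:{:02x}.{:x}".format(*parts)
    return pci_address, int(source.get("namespace"))


if __name__ == "__main__":
    print(parse_nvme_source(DISK_XML))
```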


Last week I discussed this on IRC with Dan and Maxim (both CC'ed), and there was a suggestion to accept a /dev/nvmeXXX path instead of a PCI address. The reasoning was that there is a tool Maxim wrote (alas, not merged into qemu/kvm yet) that acts as a standalone daemon, does the VFIO magic, and then serves the qemus connecting to it. This allows an NVMe disk to be shared between multiple qemus, which is currently not allowed due to a VFIO restriction. If we accepted /dev/nvmeXXX here, we could change the backend less invasively: we could use either qemu's -drive nvme://XXXX or the new tool.

On the other hand, /dev/nvmeXXX (even though it may be a bit more user friendly) wouldn't work if the host kernel doesn't have an NVMe driver or if the disk is already detached from the host. A PCI address, as I have it here, works in both cases.

Note that sysfs offers translations both ways [PCI address, namespace] <-> /dev/nvmeXXX so that shouldn't be a limitation.
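
A rough Python sketch of those sysfs lookups, for illustration only. It assumes the common layout where /sys/bus/pci/devices/<addr>/nvme/nvmeX/nvmeXnY/nsid exists and /sys/block/nvmeXnY/device points at the controller, whose own device link points at the PCI device. The function names and the sysfs_root parameter are hypothetical. The layout may differ on hosts using native NVMe multipath, and none of this works unless the host's NVMe driver is bound to the device.

```python
import os
from pathlib import Path


def nvme_dev_for(pci_addr, nsid, sysfs_root="/sys"):
    """Map (PCI address, namespace ID) to a /dev/nvmeXnY path, or None."""
    ctrl_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "nvme"
    if not ctrl_dir.is_dir():
        # No NVMe driver bound to this PCI device (or no such device).
        return None
    for ctrl in sorted(ctrl_dir.iterdir()):
        # Namespaces of controller nvmeX show up as nvmeXnY subdirectories.
        for ns in sorted(ctrl.glob(ctrl.name + "n*")):
            nsid_file = ns / "nsid"
            if nsid_file.is_file() and int(nsid_file.read_text()) == nsid:
                return "/dev/" + ns.name
    return None


def pci_addr_for(dev_path, sysfs_root="/sys"):
    """Map a /dev/nvmeXnY path back to (PCI address, namespace ID)."""
    name = os.path.basename(dev_path)
    blk = Path(sysfs_root) / "block" / name
    nsid = int((blk / "nsid").read_text())
    # block device -> controller -> PCI device (assumed layout, see above).
    pci_dev = (blk / "device" / "device").resolve()
    return pci_dev.name, nsid


if __name__ == "__main__":
    print(nvme_dev_for("0000:01:00.0", 1))
```

Keeping the sysfs root as a parameter is only there so the lookups can be exercised against a fake directory tree.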

Thoughts?

Michal




