Windows guests crash when using IOMMU and more than one Windows 10 VM on software RAID (ZFS or mdadm/LVM)


 



Passing a GPU through to one VM and then running a second VM will cause
both VMs to crash, if the root file systems of the VMs are on software
RAID.

OS:  Debian 8 / Proxmox 4.2
kernel:  4.X
qemu-server:   4.0-85
zfs module:  0.6.5.7-10_g27f3ec9
mdadm:  3.3.2-5+deb8u1
lvm:  2.02.116-pve2

Motherboard 1:  Asus Rampage III (latest bios)
Motherboard 2:  Gigabyte GA-EX58 (latest bios)
Chipset 1 and  2:  X58
CPU 1 and 2:  Xeon X5670
RAM 1 and 2:  24GB
GPU 1:  GeForce 660
GPU 2:  GeForce 970

The problem manifests slightly differently depending on which software
RAID is used.

Steps to reproduce:

Universal:
*Install Proxmox 4
*Select EXT4 or XFS for root FS
*Continue with sane settings for OS install
*apt-get update && apt-get dist-upgrade
*Set up IOMMU for Intel as per:
https://pve.proxmox.com/wiki/Pci_passthrough
*Set up two VMs with 8 GB RAM each
*Pass one GPU and one set of HID devices to each VM (an example
configuration sketch follows this list)
*Verify functional multi-seat
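
For reference, this is roughly the kind of IOMMU/passthrough configuration
the wiki page above describes; the PCI address (01:00), the USB
vendor:device ID and the VMID are placeholders, not the exact values from
my machines:

    # /etc/default/grub -- enable Intel VT-d, then run update-grub and reboot
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

    # /etc/modules -- load the vfio modules at boot
    vfio
    vfio_iommu_type1
    vfio_pci
    vfio_virqfd

    # /etc/pve/qemu-server/101.conf -- per-VM lines for GPU + HID passthrough
    machine: q35
    hostpci0: 01:00,x-vga=on,pcie=1
    usb0: host=046d:c52b

The second VM gets the analogous lines with the other GPU's PCI address
and its own keyboard/mouse IDs.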

ZFS:
*Set up ZFS as per:  https://pve.proxmox.com/wiki/Storage:_ZFS
*Limit the ARC to 4 GB
*Set up a pool for the VM root disks (see the sketch after this list)
*Create the VMs in the pool, same as above
*Start VM 1, then start VM 2
*The VMs will likely not crash immediately, although they might
*To reliably cause a GPU driver crash, run 3D-accelerated programs on both
*The NVIDIA driver will crash in one VM, followed shortly by the other
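
A sketch of the ZFS side, in case it matters; the pool name, dataset,
disk names and storage ID are placeholders, and 4294967296 is simply
4 GB expressed in bytes:

    # limit the ARC to 4 GB (takes effect after update-initramfs -u and a
    # reboot, or immediately via /sys/module/zfs/parameters/zfs_arc_max)
    echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf

    # mirrored pool and a dataset for the VM root disks
    zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc
    zfs create tank/vmdisks

    # /etc/pve/storage.cfg entry so Proxmox can place VM disks in the pool
    zfspool: vmpool
            pool tank/vmdisks
            content images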

MDADM/LVM:
*Set up an mdadm RAID array (see the sketch after this list)
*Create a PV/VG/LV for the VM root disks
*Create the VMs on that storage, same as above
*Start VM 1, then start VM 2
*The first VM's display will become scrambled, followed shortly by the
second VM's; there is no GPU driver crash message
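
And the mdadm/LVM equivalent, again with placeholder device and volume
group names rather than my actual ones:

    # two-disk RAID1 array
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

    # put LVM on top of it for the VM root disks
    pvcreate /dev/md0
    vgcreate vmdata /dev/md0

    # /etc/pve/storage.cfg entry; Proxmox then creates one LV per VM disk
    lvm: vmdata
            vgname vmdata
            content images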


There is a difference of degree depending on the software RAID.  In the
case of ZFS there is a good deal of variability in when the VMs will
crash.  On some occasions both VMs will run for extended periods of time
without issue, provided only one is doing anything requiring significant
3D hardware acceleration.  In the case of mdadm/LVM, simply starting a
second VM, even with no attached PCI or USB devices, will cause the first
VM to crash before the second has finished booting, and then the second
will crash as well.

So far this only happens when the VMs are Windows 10 and their disks are
on software RAID.  LXC containers or BSD-based KVM guests do not cause any
problems, although I have not tried passing hardware through to them.

One VM at a time, even with GPU passthrough, always works well, almost
surprisingly so.  The same goes for running both VMs concurrently, even
when both are performing 3D acceleration, provided the VMs' root disks are
not on software RAID.

I have not tried earlier versions of Windows, or Linux kernel versions
prior to 4.  I have not tried putting the VMs' root disks on non-RAID
storage with other disks on RAID.  I have not tried "BIOS RAID" yet,
though that is probably my next step, pending a possible response from the
list.


Thanks in advance,
Brian



