On 04/08/2014 10:47 PM, Alex Williamson wrote:
> On Tue, 2014-04-08 at 22:24 +0300, rndbit wrote:
>> On 04/08/2014 08:00 PM, Alex Williamson wrote:
>>> On Tue, 2014-04-08 at 17:01 +0300, rndbit wrote:
>>>> Hello,
>>>> I am one of those early tinkerers with VGA passthrough. This is a very exciting feature,
>>>> and as soon as I saw it I thought to myself that I wanted the "95% of bare-metal speed"
>>>> that various sources claimed. So I invested in some hardware, only to find out that
>>>> actual 3D performance is far lower. I am not sure why that is, so maybe someone
>>>> could clarify whether it is a known defect or something to be expected. Or, if this is
>>>> actually a bug and someone cares enough to tackle it, I would gladly be a
>>>> test monkey.
>>>>
>>>> Hardware:
>>>> Motherboard: SABERTOOTH 990FX R2.0 (BIOS v2104)
>>>> CPU: AMD FX(tm)-8350
>>>> GPU1 (host): GeForce GTX 550 Ti
>>>> GPU2 (guest): Radeon R9 270X
>>>> QEMU 1.7.90 (from git)
>>>>
>>>> qemu command line:
>>>>> /usr/local/bin/qemu-system-x86_64 \
>>>>> -enable-kvm -m 4096 -cpu qemu64 -machine q35,accel=kvm \
>>>>> -smp 8,sockets=1,cores=8,threads=1 \
>>>>> -drive file=/dev/sdc,if=none,id=virtio-disk0,format=raw,cache=none,aio=native \
>>>>> -device virtio-blk-pci,scsi=off,drive=virtio-disk0,id=disk0 \
>>>>> -drive file=/dev/sdd,if=none,id=virtio-disk1,format=raw,cache=none,aio=native \
>>>>> -device virtio-blk-pci,scsi=off,drive=virtio-disk1,id=disk1 \
>>>>> -device e1000,netdev=vnet0,mac=40:01:23:ff:9a:00 -netdev tap,id=vnet0 \
>>>>> -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
>>>>> -device vfio-pci,host=06:00.0,bus=root.1,addr=00.0,multifunction=on,x-vga=on \
>>>>> -device vfio-pci,host=00:13.0,bus=pcie.0 \
>>>>> -device vfio-pci,host=00:13.2,bus=pcie.0 \
>>>>> -bios /home/novist/opt/src/seabios/out/bios.bin \
>>>>> -vga none
>>>>
>>>> Kernel: 3.14 with kvm patches for 3.15 (I was hoping they would make a difference, but sadly they don't).
>>>> Additional kernel options: iommu=1 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1
>>>>
>>>>> 00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
>>>>> 00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
>>>>> 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X]
>>>>> 06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
>>>> The actual problem is that the GPU seems to be underused. For example, in the PassMark
>>>> benchmark I score only 15 FPS in the DirectX 9 complex test, while on bare metal it's
>>>> 100 FPS. That is quite a gap. With kernels prior to 3.14 it was even worse: I could get
>>>> only 10 FPS.
>>>>
>>>> Another, rather naive, test I performed was checking GPU load with Process Hacker
>>>> while playing an actual recent game (Arma 3 in this case). GPU utilization sits at about
>>>> 20-30%, with rare spikes to 40%. And of course it is laggy. Low settings make it somewhat
>>>> playable. On bare metal, though, very high settings run smoothly. It's worth noting that
>>>> the DirectX 10/11 benchmarks in PassMark perform much better than the DirectX 9 ones.
>>>>
>>>> Here are benchmark results and GPU load graphs:
>>>> GPU load: http://imgur.com/M70CnIV
>>>> Benchmark: http://imgur.com/IeYh4Zc
>>>>
>>>> As you can see, while DirectX 11 performs at a roughly acceptable rate (considering others
>>>> preach a glorious 95% of bare-metal performance), the DirectX 10 test is quite a bit slower
>>>> and DirectX 9 performs terribly.
>>>>
>>>> I'm not sure what other information I could provide. If there is any, I would be happy to
>>>> look it up. So, any idea where this mythical "95% of bare-metal performance" can be found?
>>>> Can't wait for some insight from a professional.
>>>>
>>>> P.S. I am also adding 2D benchmark results if anyone cares.
>>>> GPU load: http://imgur.com/28x5DN8
>>>> Benchmark: http://imgur.com/562PJYl
>>>
>>> Have you tried assigning the Nvidia GPU? If you doubt that good
>>> performance is possible, see these images of the same tests:
>>>
>>> http://imgur.com/a/7nELz
>>>
>>> This is with an Nvidia Quadro K4000 assigned as a secondary device,
>>> without VGA quirks, on an Intel host. It's entirely possible that VGA
>>> quirks are getting in the way in your test, and they may still do so with
>>> the GTX 550 Ti using VGA quirks. I guarantee that if the quirks are hit in
>>> any sort of hot path, performance will plummet.
>>>
>>> It's also important to tune the setup; the above tests use hv-time on
>>> the vCPU and a kernel that supports it, as well as large pages for the
>>> guest and vCPU pinning.
>>>
>>> There's potentially still work to do; the above tests show some
>>> significant hits in the 2D test, but 3D is quite good.
>>>
>>> If you'd like to help improve the situation, you can enable debugging when
>>> compiling qemu vfio: uncomment hw/misc/vfio.c:#define DEBUG_VFIO. This
>>> will produce a lot of output, especially on guest boot, but it should
>>> settle down as the drivers get loaded and direct access is used. If it
>>> doesn't settle down and the accesses go through the quirks, we may never be
>>> able to achieve good performance, since we don't own the driver. If it does
>>> settle down, you can use tools like perf and trace to analyze where time
>>> is being spent. Thanks,
>>>
>>> Alex
>>>
>>
>> I have not tried passing through the Nvidia GPU, as I have read some scary
>> tales of Nvidia GPUs being problematic. Besides, now it would be a little
>> bit pointless (unless it's only for a test), as I got a more powerful GPU for
>> use in the VM. I will definitely try figuring out where time is spent. Some
>> of the things you mentioned are all new to me and might take a bit of
>> time to figure out.
>
> IME, Nvidia actually works better than AMD.
> The Nvidia quirks mostly
> come from the reverse engineering that the nouveau project has done,
> while the AMD quirks come solely from my reverse engineering. Nvidia is
> also interested in supporting the professional-class cards in VMs,
> whereas I haven't seen any interest in that from AMD. Note that the
> professional-class cards are the Quadro, GRID, and Tesla boards, and
> they're supported in a secondary display configuration. Using x-vga=on
> with consumer GeForce or Radeon boards is not supported by
> either vendor.
>
>> I'm not 100% sure what VGA quirks are, so please forgive a possibly
>> stupid question, but are they enabled by x-vga=on? Something at the back
>> of my head is telling me that you are probably not using that flag, since
>> your card is passed through as a >secondary< device. I also tried to
>> achieve the same thing, with no luck. What I tried was basically -vga std and
>> removing the x-vga=on bit. The result was Device Manager showing a non-working
>> device. I believe the error was code 12 (not enough resources) or something very
>> similar. If you have ideas regarding this bit, I would love to hear them.
>> In the meantime I'll dig into the things you mentioned; thank you for the hints.
>
> Yes, secondary mode means providing the guest with a -vga device in
> addition to the assigned GPU, and not using the x-vga=on option. Whether
> this works often depends on the driver in the guest.
>
> The quirks that I'm talking about are fixes for hardware backdoors. For
> instance, PCI config space is often mirrored in MMIO space on
> graphics cards. Running in a virtual machine requires that we emulate
> portions of config space, so we need to trap accesses to the MMIO
> mirror. If the driver uses the mirror, or anything within PAGE_SIZE of
> the mirror, for performance-critical access, the traps required for the
> quirk will seriously affect performance.
>
> The Q35 QEMU model could also have something to do with the performance
> issues you see.
> There have been unsubstantiated claims that VMs are
> slower with Q35. At the same time, it provides a topology that looks a
> lot more like a modern system than the default machine. The numbers I
> reported were using the standard 440FX machine type. This is another
> case where the driver in the guest may be more or less accommodating of
> what we expose in the VM, so YMMV with consumer cards and different
> drivers. Thanks,
>
> Alex
>

I see. From what you were saying, I got the impression that trying the 440FX machine type would help. I actually managed to do it by simply defining the VM in libvirt. It passed through both the GPU and the USB controllers just fine. On boot, the VM properly switches to the physical GPU and the emulated one is disabled by Windows. So far so good. However, there was no performance increase whatsoever; at best it was the same.

I tried qemu with DEBUG_VFIO enabled too. After the OS is done booting, the log only gets appended with lines like these:

> vfio: vfio_bar_read(0000:00:13.2:BAR0+0x24, 4) = 0x4000
> vfio: vfio_bar_write(0000:00:13.2:BAR0+0x28, 0x3f, 4)

00:13.2 is the USB (EHCI) controller. I also ran the DX9 complex test and did not see any additional output relating to the GPU (06:00.0). I also checked the vfio output for the config I posted earlier (with hv-time added), but the result is identical to the libvirt 440FX test. I'm not sure what to make of this.

I also made a not very successful attempt to enable hugepages, but I don't think that should be the cause of the 3D slowdown, so I chose to ignore this failure for now. Tinkering with CPU pinning changed the DX9 complex result by only a few FPS either way, so it is not significant enough to be the real culprit here.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html