[Bug 107561] 4.2 breaks PCI passthrough in QEMU/KVM

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Mon, 23 Nov 2015 18:40:09 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=107561

--- Comment #8 from schefister@xxxxxxxxx ---
I captured the traces; they take more than half a GB uncompressed. They are
available, xz-compressed, at
https://drive.google.com/folderview?id=0B8ebX_WjVHnGNlN4eTEzU2xtMEk&usp=sharing

To make things clear: I have two hosts. HostB is a testing machine. VGA
passthrough with EDKII worked out of the box on 4.1 kernels. It broke with
several 4.2 versions and also with 4.3. I tried several repo versions and
untouched kernel.org versions; that was about a month ago. Then I tested again
a few days ago, and it works again with the 4.2.6-301.fc23 Fedora repo kernel.

HostA is the problematic one. I did the trace as requested, with 2G and 3G
guest RAM. It worked with 4.1 kernels, but still doesn't work with
4.2.6-301.fc23. I can make it work either by lowering the guest RAM to under
2.5 GB or with my aforementioned modification (kvm_mtrr_get_guest_memory_type
changed to always return MTRR_TYPE_WRBACK). Of course I made the traces with an
unmodified kernel, and only the 2G guest actually booted.
The exact symptom (when I say it doesn't work) is that the guest is extremely
slow. I tried booting a live Linux guest; after about 15 minutes I saw messages,
but even after two hours I still couldn't get to a shell. A Windows guest only
shows the white dots under the logo circling around very slowly. Once I got a
blue screen, and the letters came up one by one, as if the error message was
being written with a typewriter. Sometimes the guest just shuts down and qemu
terminates. The UEFI shell and UEFI setup in the guest work perfectly under all
conditions; the slowness starts when booting an OS.

I can also provide traces with a working kernel version on HostA and also on
HostB for comparison, if requested.

In the shared folders you can find MTRR, PAT, and lspci -vvv info for each
host, along with traces for HostA as requested (2GB and 3GB). One of the
members on the original Arch Linux thread suggested I put a printk in the
problematic function. The dmesg files in each folder show the arguments of
vmx_get_mt_mask and what kvm_mtrr_get_guest_memory_type returns to it. 
The added line was (just before the return statement in vmx_get_mt_mask):

printk(KERN_INFO "vmx_get_mt_mask got the following: cpu=%d, vcpu=%d, gfn=%x, "
       "MMIO=%d, cache=%x", vcpu->cpu, vcpu->vcpu_id, gfn, is_mmio, cache);

The dmesg files show that with large guest RAM, the function doesn't even get
called for vcpus other than 0. With small guest RAM, on the other hand, it is
called for all of them.
The traces are NOT from the same run as the dmesg files, as the latter were
created before your post about tracing.
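
For anyone who wants to check the per-vcpu call counts in the dmesg files
without reading them by eye, a small userspace helper along the following lines
might do. This is only a rough sketch: it assumes the log lines have exactly the
format produced by the printk quoted above, and MAX_VCPUS, struct
mt_mask_record and both function names are made up for illustration (none of
them exist in the kernel).

```c
#include <stdio.h>
#include <string.h>

#define MAX_VCPUS 64	/* assumed upper bound, chosen for this sketch only */

/* Fields printed by the debug printk in vmx_get_mt_mask. */
struct mt_mask_record {
	int cpu;
	int vcpu;
	unsigned long long gfn;
	int is_mmio;
	unsigned int cache;
};

/*
 * Parse one dmesg line. Returns 1 and fills *rec if the line carries the
 * debug message, 0 otherwise. Any prefix (e.g. a timestamp) is skipped.
 */
int parse_mt_mask_line(const char *line, struct mt_mask_record *rec)
{
	static const char tag[] = "vmx_get_mt_mask got the following:";
	const char *p = strstr(line, tag);

	if (!p)
		return 0;

	return sscanf(p + sizeof(tag) - 1,
		      " cpu=%d, vcpu=%d, gfn=%llx, MMIO=%d, cache=%x",
		      &rec->cpu, &rec->vcpu, &rec->gfn,
		      &rec->is_mmio, &rec->cache) == 5;
}

/*
 * Read a dmesg dump from 'in' and count matching lines per vcpu id.
 * The caller must zero 'counts' beforehand.
 */
void count_calls_per_vcpu(FILE *in, unsigned long counts[MAX_VCPUS])
{
	char line[512];
	struct mt_mask_record rec;

	while (fgets(line, sizeof(line), in)) {
		if (!parse_mt_mask_line(line, &rec))
			continue;
		if (rec.vcpu >= 0 && rec.vcpu < MAX_VCPUS)
			counts[rec.vcpu]++;
	}
}
```

Calling count_calls_per_vcpu() on a FILE opened from one of the dmesg files and
printing the non-zero entries of counts[] should show at a glance whether any
vcpu other than 0 ever reached the function.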

Please ask if any more info, traces or dumps are needed. I would be glad to
provide any help.
