On Tue, 2013-05-28 at 03:40 +0200, Maik Broemme wrote:
> Hi Alex,
>
> Maik Broemme <mbroemme@xxxxxxxxxxxxx> wrote:
> > Hi Alex,
> >
> > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> > >
> > > Good to hear. It looks like you have the same motherboard as my AMD
> > > test system. An HD7850 in that system runs quite reliably with the
> > > branches above although I do occasionally get VGA palette corruption.
> > >
> >
> > Good to know. I'm using a Radeon HD7870 which works fine now. I have the
> > same VGA palette corruption occasionally but only until Catalyst driver
> > is loaded. So it happens sometimes during VGA init if Windows 7 boot
> > logo is shown with very strange colors and went away if Catalyst driver
> > is loaded.
> >
> > > Are you still require -vga cirrus or do the -vga none, x-vga=on cases
> > > work now too? Thanks,
> > >
> >
> > No longer required, -vga none with x-vga=on work on your branches fine
> > now. Not sure if there was something more changed because with original
> > Fedora 3.9.2 kernel it still doesn't work.
> >
>
> Alex, I have a strange issue now with either the 'vfio-vga-reset'
> branches or with the stable 3.9.4 kernel. This is my 'lspci' output:
>
> 00:14.2 Audio device: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA) (rev 40)
> 01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 520] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
> 02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Pitcairn [Radeon HD 7800]
> 02:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
>
> The '01:00.0' is my primary device used for Linux and '02:00.0' my
> secondary for QEMU.
> Two new different problems:
>
> 1) If the 'nvidia.ko' binary driver is loaded for the first card, QEMU
> immediately get stuck after startup and hangs with:
>
> 1140 futex(0x7f0ad9b21300, FUTEX_WAIT_PRIVATE, 2, NULL
>
> I have the complete strace output if needed. After that I can only
> terminate qemu with 'kill -9' and if I start it again the following
> Oops occurs:
>
> [ 655.684121] ------------[ cut here ]------------
> [ 655.684134] WARNING: at lib/list_debug.c:29 __list_add+0x77/0xd0()
> [ 655.684151] Hardware name: GA-990FXA-UD3
> [ 655.684271] list_add corruption. next->prev should be prev (ffffffff81ca3d98), but was (null). (next=ffff88041bc3fe08).
> [ 655.684477] Modules linked in: vhost_net macvtap macvlan tun arc4 md4 nls_utf8 cifs dns_resolver fscache vfio_pci vfio_iommu_type1 vfio bridge stp llc ip6table_filter ip6_tables it87 hwmon_vid snd_hda_codec_hdmi nvidia(POF) acpi_cpufreq mperf kvm_amd snd_hda_codec_realtek kvm crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_codec microcode edac_core snd_hwdep fam15h_power snd_seq edac_mce_amd snd_seq_device k10temp r8169 sp5100_tco snd_pcm mii i2c_piix4 snd_page_alloc snd_timer i2c_core snd soundcore mxm_wmi firewire_ohci firewire_core crc_itu_t wmi
> [ 655.685451] Pid: 2097, comm: qemu-system-x86 Tainted: PF O 3.9.4-200.fc18.x86_64 #1
> [ 655.685642] Call Trace:
> [ 655.685738] [<ffffffff8105f125>] warn_slowpath_common+0x75/0xa0
> [ 655.685851] [<ffffffff8105f206>] warn_slowpath_fmt+0x46/0x50
> [ 655.685955] [<ffffffff81316ef7>] __list_add+0x77/0xd0
> [ 655.686058] [<ffffffff8108392c>] add_wait_queue+0x3c/0x60
> [ 655.686162] [<ffffffff813f241d>] vga_get+0xdd/0x190
> [ 655.686266] [<ffffffff81093e40>] ? try_to_wake_up+0x2d0/0x2d0
> [ 655.686373] [<ffffffffa01ac625>] vfio_pci_vga_rw+0xb5/0x230 [vfio_pci]
> [ 655.686481] [<ffffffffa01aa279>] vfio_pci_rw+0x39/0x80 [vfio_pci]
> [ 655.686587] [<ffffffffa01aa30c>] vfio_pci_read+0x1c/0x20 [vfio_pci]
> [ 655.686701] [<ffffffffa01a40e3>] vfio_device_fops_read+0x23/0x30 [vfio]
> [ 655.686814] [<ffffffff811a01b9>] vfs_read+0xa9/0x180
> [ 655.686915] [<ffffffff811a05ba>] sys_pread64+0x9a/0xb0
> [ 655.687018] [<ffffffff81669f59>] system_call_fastpath+0x16/0x1b
> [ 655.687123] ---[ end trace a68eabc3660237b1 ]---
>
> This is always reproducible. I know it is the binary driver and maybe
> nobody cares but it is widely used. :)

Hmm, so perhaps the first attempt called into the VGA arbiter to get the
VGA resources, and the hang is because it was never able to get them.
VFIO only uses the interruptible vga_get call, so you were able to kill
the process, but maybe the VGA arbiter didn't clean up so well. There's
not much we can do if nvidia.ko never releases the VGA resources. The
VGA arbiter could do a better job with list cleanup on interruption, but
it doesn't seem like that would help you run with nvidia.ko in the host.

> 2) If the 'nouveau.ko' driver is loaded it is even more strange. As soon
> as I start qemu all my SATA links get a hard reset and kernel freezes.
> No SysRQs are working anymore and only reboot helps. If needed I can
> look if I can get some dumps from this freeze because it writes nothing
> more to the disks.

This was after a host reboot, I hope? If you're using the
vfio-vga-reset kernel branch then a secondary bus reset happens when the
guest is started. I have seen cases on my GA-990FXA-UD3 where the bus
doesn't come back from reset (possibly due to queued I/O on the bus).
After reset we attempt to restore device config space, which hangs with
the PCI config space access lock held. This generally results in soft
lockups and a mostly unusable system. Does this sound similar to what
you're seeing?
This is the main problem preventing me from trying to push the PCI bus
reset patches upstream.

> But it is getting even more strange. I was putting the secondary card
> in another PCI slot and then it started to work with nouveau module
> loaded and passthrough ATI card to QEMU. But this worked only until I
> started X server with nouveau X driver. As soon as X is running and I
> started QEMU it hanged again in FUTEX_WAIT_PRIVATE.
>
> 3) Without loading 'nvidia.ko' or 'nouveau.ko' modules it works out of
> the box with several start/stop cycles. However I have no X in this
> case. ;)
>
> Any ideas? :)

I'd suspect that the nouveau X driver also isn't releasing the VGA
resources through the VGA arbiter. That's pretty disappointing. You
can check from userspace who owns VGA via:

    sudo head /dev/vga_arbiter

Without the VGA arbiter we have no coordination of legacy VGA resources
between various drivers, but not all drivers support it (vgacon) and
those that do apparently don't attempt to be very fair. We'll need to
look into fixing Xorg on the host if this is actually the problem.
Thanks,

Alex
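
For anyone who wants to script the /dev/vga_arbiter check mentioned above, here is a minimal Python sketch. It assumes the status line has the form described in the kernel's VGA arbiter documentation (count, PCI address, decodes, owns, locks); the function names and the sample line in the usage note are illustrative and are not taken from this thread. Reading the device node normally requires root.

```python
import re

# Assumed status-line layout, per the kernel VGA arbiter documentation:
#   count:<n>,PCI:<dddd:bb:dd.f>,decodes=<rsrc>,owns=<rsrc>,locks=<rsrc>(<io>:<mem>)
STATUS_RE = re.compile(
    r"count:(?P<count>\d+),PCI:(?P<pci>[0-9a-fA-F:.]+),"
    r"decodes=(?P<decodes>[a-z+]+),owns=(?P<owns>[a-z+]+),"
    r"locks=(?P<locks>[a-z+]+)\s*\((?P<io_cnt>\d+):(?P<mem_cnt>\d+)\)"
)


def parse_vga_arbiter_status(line):
    """Split one arbiter status line into a dict (hypothetical helper)."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("unrecognized vga_arbiter status: %r" % line)
    info = match.groupdict()
    # Convert the numeric fields; the resource fields stay as strings.
    for key in ("count", "io_cnt", "mem_cnt"):
        info[key] = int(info[key])
    return info


def read_vga_arbiter(path="/dev/vga_arbiter"):
    """Read the status of the arbiter's default target device."""
    with open(path, "rb", buffering=0) as dev:
        raw = dev.read(1024)
    return parse_vga_arbiter_status(raw.decode("ascii", "replace"))


if __name__ == "__main__":
    status = read_vga_arbiter()
    print("%(pci)s owns=%(owns)s locks=%(locks)s" % status)
```

As a usage note, an illustrative line such as "count:2,PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none(0:0)" would parse with pci set to "0000:01:00.0" and owns set to "io+mem". A device that shows owns=io+mem while the guest is starting is the one currently holding the legacy VGA resources.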