Hi everyone,

I'm having a strange problem passing an mlx4 device into a kvm guest. The device in question is:

    05:00.0 InfiniBand [0c06]: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] [15b3:673c] (rev b0)

running the latest (I believe) FW version 2.9.1000. The host system is a fairly standard dual-socket Xeon 5600 system, perhaps a tiny bit unusual in that it is a dual Tylersburg motherboard. I'm using

    QEMU emulator version 1.0 (qemu-kvm-1.0 Debian 1.0+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard

and

    Linux pure-driver3 3.1.0-1-amd64 #1 SMP Tue Jan 10 05:01:58 UTC 2012 x86_64 GNU/Linux

(the latest Debian testing versions).

The symptom of the problem is that when the mlx4_core driver starts, I get normal output like

    mlx4_core 0000:00:04.0: FW version 2.9.1000 (cmd intf rev 3), max commands 16
    mlx4_core 0000:00:04.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
    mlx4_core 0000:00:04.0: FW size 385 KB

up until the driver tries to enable interrupts, when I get a long stream of

    Completion event for bogus CQ 00000000

and then it gives up because the NOP command interrupt test fails.

Apparently what happens is that the SW2HW_EQ firmware command succeeds as far as the driver is concerned, but the EQ buffer is left as all 0s, so the driver thinks every entry is a completion event (for CQN 0).

Several things are weird here: first, the command interface including DMA from the device is definitely working since we get a reasonable-looking response for the query FW command etc, so I'm not sure what is different about the SW2HW_EQ command (it is the first thing that uses the MTT I guess, so maybe there is a problem setting that up?)
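To make the all-zero EQ symptom concrete, here is a rough Python model of how an event queue consumer could end up reporting completion events for CQN 0. It is only a sketch, not the mlx4 driver code: the entry layout (32-byte entries, event type in byte 1, CQN in bytes 4..7, ownership flag in the top bit of the last byte), the type code 0 for completions, and the names `next_sw_owned` and `poll` are all assumptions made for illustration.

```python
import struct

EQE_SIZE = 32            # assumed size of one EQ entry in bytes
EVENT_TYPE_COMP = 0x00   # assumed: completion events use type code 0
OWNER_BIT = 0x80         # assumed: ownership flag lives in the last byte


def next_sw_owned(buf, cons_index, nent):
    """Return the entry at cons_index if software owns it, else None."""
    off = (cons_index % nent) * EQE_SIZE
    eqe = buf[off:off + EQE_SIZE]
    owner_flag = bool(eqe[-1] & OWNER_BIT)
    wrap_flag = bool(cons_index & nent)  # flips on every pass over the ring
    # An entry counts as software-owned when the two flags agree.
    return eqe if owner_flag == wrap_flag else None


def poll(buf, nent, budget):
    """Consume up to `budget` entries; return a list of (type, cqn) pairs."""
    events = []
    cons_index = 0
    while len(events) < budget:
        eqe = next_sw_owned(buf, cons_index, nent)
        if eqe is None:
            break
        ev_type = eqe[1]                           # assumed type offset
        (raw_cqn,) = struct.unpack_from(">I", eqe, 4)  # assumed CQN offset
        events.append((ev_type, raw_cqn & 0xFFFFFF))
        cons_index += 1
    return events


if __name__ == "__main__":
    nent = 4  # ring size, must be a power of two in this model
    zeroed = bytes(nent * EQE_SIZE)
    for ev_type, cqn in poll(zeroed, nent, 100):
        kind = "completion" if ev_type == EVENT_TYPE_COMP else "other"
        print(f"{kind} event for CQ {cqn:08x}")
```

Under these assumptions, a buffer the device never touched is consumed as a full ring of valid entries on the first pass, each decoding as type 0 with CQN 0, whereas a buffer in which every entry had its ownership flag set would yield no events at all.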
The guest is running 2.6.39, so there is no SR-IOV support in the mlx4 driver (but I am passing the only physical function of a non-virtualized device through, so I hope that isn't needed -- the device shouldn't know it's talking to a guest at all).

Second, passing through another device on the same system:

    86:00.0 Ethernet controller [0200]: Intel Corporation 82599EB 10 Gigabit TN Network Connection [8086:151c] (rev 01)

works fine, including MSI-X interrupts, running traffic works, etc.

Finally, the craziest thing is that this setup was working a week or so ago, but there may have been BIOS, kernel and kvm updates since then (my guest image is unchanged at least ;).

Anyone have any idea what might be going on or how to debug this further? Unfortunately I don't have a PCIe analyzer handy to get a better idea of what's happening with the device...

Thanks,
Roland