On Sun, 2009-05-03 at 21:36 -0700, Nicholas A. Bellinger wrote:
> On Mon, 2009-05-04 at 10:09 +0800, Sheng Yang wrote:
> > On Monday 04 May 2009 08:53:07 Nicholas A. Bellinger wrote:
> > > On Sat, 2009-05-02 at 18:22 +0800, Sheng Yang wrote:
> > > > On Thu, Apr 30, 2009 at 01:22:54PM -0700, Nicholas A. Bellinger wrote:
> > > > > Greetings KVM folks,
> > > > >
> > > > > I am wondering if any information exists for doing SR-IOV on the new VT-d
> > > > > capable chipsets with KVM..? From what I understand the patches for
> > > > > doing this with KVM are floating around, but I have been unable to find
> > > > > any user-level docs for actually making it all go against an upstream
> > > > > v2.6.30-rc3 code..
> > > > >
> > > > > So far I have been doing IOV testing with Xen 3.3 and 3.4.0-pre, and I
> > > > > am really hoping to be able to jump to KVM for single-function and
> > > > > then multi-function SR-IOV. I know that the VM migration stuff for IOV
> > > > > in Xen is up and running, and I assume it is being worked in for KVM
> > > > > instance migration as well..? This part is less important (at least
> > > > > for me :-) than getting a stable SR-IOV setup running under the KVM
> > > > > hypervisor.. Does anyone have any pointers for this..?
> > > > >
> > > > > Any comments or suggestions are appreciated!
> > > >
> > > > Hi Nicholas
> > > >
> > > > The patches are not floating around now. As you know, SR-IOV for Linux
> > > > has been in 2.6.30, so you can use upstream KVM and qemu-kvm (or the
> > > > recently released kvm-85) with 2.6.30-rc3 as host kernel. And some time
> > > > ago, there were several SR-IOV related patches for qemu-kvm, and now they
> > > > all have been checked in.
> > > >
> > > > And for KVM, the extra document is not necessary, for you can simply
> > > > assign a VF to a guest like any other device. And how to create a VF is
> > > > specific to each device driver. So just creating a VF and then assigning
> > > > it to a KVM guest is fine.
> > >
> > > Greetings Sheng,
> > >
> > > So, I have been trying the latest kvm-85 release on a v2.6.30-rc3
> > > checkout from linux-2.6.git on a CentOS 5u3 x86_64 install on Intel
> > > IOH-5520 based dual socket Nehalem board. I have enabled DMAR and
> > > Interrupt Remapping on my KVM host using v2.6.30-rc3 and from what I can
> > > tell, the KVM_CAP_* defines from libkvm are enabled when building kvm-85
> > > after './configure --kerneldir=/usr/src/linux-2.6.git' and the PCI
> > > passthrough code is being enabled in kvm-85/qemu/hw/device-assignment.c
> > > AFAICT..
> > >
> > > From there, I use the freshly installed qemu-x86_64-system binary to
> > > start a Debian 5 x86_64 HVM (that previously had been moving network
> > > packets under Xen for PCIe passthrough). I see the MSI-X interrupt
> > > remapping working on the KVM host for the passed -pcidevice, and the
> > > MMIO mappings from the qemu build that I also saw while using
> > > Xen/qemu-dm built with PCI passthrough are there as well..
> > >
> >
> > Hi Nicholas
> >
> > > But while the KVM guest is booting, I see the following exception(s)
> > > from qemu-x86_64-system for one of the VFs for a multi-function PCIe
> > > device:
> > >
> > > BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
> >
> > This one is mostly harmless.
> >
>
> Ok, good to know.. :-)
>
> > > I try with one of the on-board e1000e ports (02:00.0) and I see the same
> > > exception along with some MSI-X exceptions from qemu-x86_64-system in
> > > KVM guest.. However, I am still able to see the e1000e and the other
> > > vxge multi-function device with lspci, but I am unable to dhcp or ping
> > > with the e1000e and VF from multi-function device fails to register the
> > > MSI-X interrupt in the guest..
> >
> > Did you see the interrupt in the guest and host side?
>
> Ok, I am restarting the e1000e test with a fresh Fedora 11 install and
> KVM host kernel 2.6.29.1-111.fc11.x86_64. After unbinding and
> attaching the e1000e single-function device at 02:00.0 to pci-stub with:
>
> echo "8086 10d3" > /sys/bus/pci/drivers/pci-stub/new_id
> echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
> echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/bind
>
> I see the following in the KVM host kernel ring buffer:
>
> e1000e 0000:02:00.0: PCI INT A disabled
> pci-stub 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> pci-stub 0000:02:00.0: irq 58 for MSI/MSI-X
>
> > I think you can try on-board e1000e for MSI-X first. And please ensure
> > the corresponding driver has been loaded correctly.
>
> <nod>..
>
> > And what do you mean by "some MSI-X exceptions"? Better with
> > the log.
>
> Ok, with the Fedora 11 installed qemu-kvm, I see the expected
> kvm_destroy_phys_mem() statements:
>
> #kvm-host qemu-kvm -m 2048 -smp 8 -pcidevice host=02:00.0 lenny64guest1-orig.img
> BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
> BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
>
> However I still see the following in the KVM guest kernel ring buffer
> running v2.6.30-rc in the HVM guest.
>
> [    5.523790] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
> [    5.524582] e1000e 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
> [    5.525710] e1000e 0000:00:05.0: setting latency timer to 64
> [    5.526048] 0000:00:05.0: 0000:00:05.0: Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
> [    5.527200] 0000:00:05.0: 0000:00:05.0: Failed to initialize MSI interrupts. Falling back to legacy interrupts.
> [    5.829988] 0000:00:05.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:e0:81:c0:90:b2
> [    5.830672] 0000:00:05.0: eth0: Intel(R) PRO/1000 Network Connection
> [    5.831240] 0000:00:05.0: eth0: MAC: 3, PHY: 8, PBA No: ffffff-0ff
>
> While doing dhcp, the e1000e throws a netdev watchdog transmit timeout..
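
The pci-stub hand-off quoted above (new_id, then unbind, then bind) can be sketched as a small helper. This is only an illustration of the same three sysfs writes in the same order; the function names `pci_stub_writes` and `apply_writes` and the `sysfs_root` parameter are made up for this sketch and are not part of any KVM or kernel tooling.

```python
import os


def pci_stub_writes(bdf, vendor_id, device_id, sysfs_root="/sys"):
    """Return the (path, value) sysfs writes for handing a device to pci-stub.

    Mirrors the three echo commands quoted above, in the same order.
    All names here are hypothetical helpers, not an existing API.
    """
    stub = os.path.join(sysfs_root, "bus", "pci", "drivers", "pci-stub")
    device = os.path.join(sysfs_root, "bus", "pci", "devices", bdf)
    return [
        # Teach pci-stub about the vendor/device ID pair.
        (os.path.join(stub, "new_id"), f"{vendor_id} {device_id}"),
        # Detach the device from whatever driver currently owns it.
        (os.path.join(device, "driver", "unbind"), bdf),
        # Attach the device to pci-stub.
        (os.path.join(stub, "bind"), bdf),
    ]


def apply_writes(writes):
    """Perform the writes; needs root and a host kernel with pci-stub."""
    for path, value in writes:
        with open(path, "w") as handle:
            handle.write(value)


if __name__ == "__main__":
    # Dry run: print the equivalent shell commands instead of writing.
    for path, value in pci_stub_writes("0000:02:00.0", "8086", "10d3"):
        print(f"echo {value!r} > {path}")
```

Keeping the plan separate from `apply_writes` makes it possible to review the paths before touching sysfs as root.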
>
> Here is what lspci -v -s 00:05.0 looks like:
>
> 00:05.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>         Subsystem: Intel Corporation Device 0000
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 10
>         Region 0: Memory at f2020000 (32-bit, non-prefetchable) [size=128K]
>         Region 2: I/O ports at c220 [size=32]
>         Region 3: Memory at f2040000 (32-bit, non-prefetchable) [size=16K]
>         Kernel driver in use: e1000e
>         Kernel modules: e1000e
>

Hi Sheng,

Btw, this is what it looks like in the KVM HVM guest running v2.6.30-rc3
after plugging in the port and dhcp occurring.. The KVM HVM does not
hard lock (cool :-), and I am still able to access it via the built-in
qemu net-device.

Here are my .config options for the v2.6.30-rc3 KVM guest running on top
of 2.6.26.6-79.fc9.x86_64 Fedora 11 Preview KVM Host. Am I missing
something in the v2.6.30-rc3 KVM guest config below for accessing an
e1000e port with SR-IOV..?

Many thanks for your most valuable of time,

--nab

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
# CONFIG_INTR_REMAP is not set
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
# CONFIG_HT_IRQ is not set
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
# CONFIG_PCCARD is not set
CONFIG_HOTPLUG_PCI=m
CONFIG_HOTPLUG_PCI_FAKE=m
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
CONFIG_HOTPLUG_PCI_CPCI=y
CONFIG_HOTPLUG_PCI_CPCI_ZT5550=m
CONFIG_HOTPLUG_PCI_CPCI_GENERIC=m
CONFIG_HOTPLUG_PCI_SHPC=m

[   17.476125] eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[   19.969922] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[   30.250140] NET: Registered protocol family 10
[   30.251561] lo: Disabled Privacy Extensions
[   32.942145] 0000:00:05.0: eth1: Detected Tx Unit Hang:
[   32.942147]   TDH <1>
[   32.942148]   TDT <4>
[   32.942149]   next_to_use <4>
[   32.942149]   next_to_clean <0>
[   32.942150] buffer_info[next_to_clean]:
[   32.942151]   time_stamp <fffef895>
[   32.942152]   next_to_watch <0>
[   32.942153]   jiffies <fffefb33>
[   32.942154]   next_to_watch.status <0>
[   34.804645] 0000:00:05.0: eth1: Detected Tx Unit Hang:
[   34.804647]   TDH <1>
[   34.804648]   TDT <4>
[   34.804649]   next_to_use <4>
[   34.804650]   next_to_clean <0>
[   34.804651] buffer_info[next_to_clean]:
[   34.804652]   time_stamp <fffef895>
[   34.804653]   next_to_watch <0>
[   34.804654]   jiffies <fffefd05>
[   34.804655]   next_to_watch.status <0>
[   36.804621] 0000:00:05.0: eth1: Detected Tx Unit Hang:
[   36.804623]   TDH <1>
[   36.804624]   TDT <4>
[   36.804625]   next_to_use <4>
[   36.804625]   next_to_clean <0>
[   36.804626] buffer_info[next_to_clean]:
[   36.804627]   time_stamp <fffef895>
[   36.804628]   next_to_watch <0>
[   36.804629]   jiffies <fffefef9>
[   36.804630]   next_to_watch.status <0>
[   38.804577] 0000:00:05.0: eth1: Detected Tx Unit Hang:
[   38.804579]   TDH <1>
[   38.804580]   TDT <4>
[   38.804581]   next_to_use <4>
[   38.804591]   next_to_clean <0>
[   38.804592] buffer_info[next_to_clean]:
[   38.804593]   time_stamp <fffef895>
[   38.804594]   next_to_watch <0>
[   38.804595]   jiffies <ffff00ed>
[   38.804596]   next_to_watch.status <0>
[   39.804214] ------------[ cut here ]------------
[   39.804827] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x11b/0x1bd()
[   39.805820] Hardware name:
[   39.806356] NETDEV WATCHDOG: eth1 (e1000e): transmit timed out
[   39.807003] Modules linked in: ipv6 loop serio_raw virtio_balloon pcspkr psmouse parport_pc button parport i2c_piix4 i2c_core processor evdev ext3 jbd mbcache ide_cd_mod cdrom ide_gd_mod ata_piix ata_generic libata scsi_mod virtio_pci virtio_ring virtio piix 8139cp ide_pci_generic 8139too e1000e ide_core mii floppy thermal fan thermal_sys
[   39.816257] Pid: 0, comm: swapper Not tainted 2.6.30-rc3 #7
[   39.816911] Call Trace:
[   39.817458]  <IRQ>  [<ffffffff80238caa>] ? warn_slowpath+0xd8/0x10a
[   39.818392]  [<ffffffff80343900>] ? cpumask_any_but+0x28/0x34
[   39.819036]  [<ffffffff80231aa1>] ? find_busiest_group+0x2dc/0x942
[   39.819697]  [<ffffffff8022d661>] ? enqueue_task_fair+0x24/0x6a
[   39.820436]  [<ffffffff8022aec9>] ? enqueue_task+0x5c/0x65
[   39.821118]  [<ffffffff8022aec9>] ? enqueue_task+0x5c/0x65
[   39.821762]  [<ffffffff8022afb9>] ? activate_task+0x20/0x26
[   39.822321]  [<ffffffff8023273b>] ? try_to_wake_up+0x212/0x224
[   39.822792]  [<ffffffff8024af2f>] ? autoremove_wake_function+0x9/0x2e
[   39.823273]  [<ffffffff80411a55>] ? dev_watchdog+0x11b/0x1bd
[   39.823723]  [<ffffffff8022bffa>] ? __wake_up+0x30/0x44
[   40.033557]  [<ffffffff8041193a>] ? dev_watchdog+0x0/0x1bd
[   40.034090]  [<ffffffff80241214>] ? run_timer_softirq+0x18c/0x202
[   40.034794]  [<ffffffff80251b82>] ? getnstimeofday+0x59/0xb3
[   40.035449]  [<ffffffff8023d772>] ? __do_softirq+0xa6/0x168
[   40.036174]  [<ffffffff8020ca7c>] ? call_softirq+0x1c/0x28
[   40.036861]  [<ffffffff8020e254>] ? do_softirq+0x2c/0x6c
[   40.037495]  [<ffffffff8023d474>] ? irq_exit+0x3f/0x7c
[   40.038125]  [<ffffffff8021be42>] ? smp_apic_timer_interrupt+0x87/0x94
[   40.038795]  [<ffffffff8020c493>] ? apic_timer_interrupt+0x13/0x20
[   40.039462]  <EOI>  [<ffffffff8021235c>] ? default_idle+0x5b/0x99
[   40.040370]  [<ffffffff8024e461>] ? notifier_call_chain+0x29/0x4c
[   40.041105]  [<ffffffff8020ad55>] ? cpu_idle+0x4a/0x8b
[   40.041723] ---[ end trace dc792b53566c049e ]---
[   40.484820] eth0: no IPv6 routers present
[   40.712073] eth1: no IPv6 routers present
[   43.489776] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
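
For comparing guest configs quickly, the "Bus options" fragment above can be read mechanically. The following is a rough sketch, assuming the usual `CONFIG_X=value` and `# CONFIG_X is not set` line forms; `parse_kconfig` and `describe` are names invented for this sketch, and which options actually matter for device assignment is not something the sketch decides. The sample text is a subset of the fragment posted above.

```python
import re

# Line forms found in a kernel .config file.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.+)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value string, or None for 'is not set'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def describe(options, name):
    """Human-readable state of one option in the parsed fragment."""
    if name not in options:
        return "absent from fragment"
    value = options[name]
    return "not set" if value is None else value


# Subset of the guest fragment posted above.
GUEST_FRAGMENT = """\
#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
# CONFIG_DMAR is not set
# CONFIG_INTR_REMAP is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_STUB is not set
# CONFIG_PCI_IOV is not set
"""

if __name__ == "__main__":
    opts = parse_kconfig(GUEST_FRAGMENT)
    for name in ("CONFIG_PCI_MSI", "CONFIG_DMAR", "CONFIG_INTR_REMAP",
                 "CONFIG_PCI_STUB", "CONFIG_PCI_IOV"):
        print(f"{name}: {describe(opts, name)}")
```

Running the same parser over the host and guest .config files and comparing the two dictionaries would show where they differ.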