On May 21, 2008, at 6:07 PM, Xu, Anthony wrote:
Avi Kivity wrote:
Xu, Anthony wrote:
Xiantao and I have found the root cause. Qemu emulates the PIIX chipset, where all PCI devices can only use IRQs 10 and 11, as configured in the chipset's interrupt routing table, even though the IOAPIC has 24 interrupt pins. KVM/IA64 uses the same guest firmware as XEN/IA64, which uses a different "interrupt routing algorithm". This means the PCI device IRQs don't match between qemu and the guest firmware in KVM/IA64, so the guest doesn't get PCI device interrupts.

Obviously there are two ways to fix it.

1. Modify the qemu side so that all PCI devices use IRQs greater than or equal to 16. We need to come up with an algorithm to calculate the IRQ from the PCI device (bus number, device number, function number), and we also need to modify the IA32 guest BIOS to present the same PCI device IRQs (using the same algorithm) to the guest OS. Avi seems not to want to modify qemu a lot.

2. Modify the IA64 guest firmware. This has two drawbacks: (1) all PCI devices use only two IRQs, 10 and 11, so with many PCI devices there is a lot of interrupt sharing, which hurts performance; (2) we need to maintain two versions of the IA64 guest firmware, one for KVM/IA64 and one for XEN/IA64, which is not what I want.

What's your suggestion?

Allowing qemu to use all ioapic interrupt pins will reduce interrupt sharing on x86, which is a good thing, so I prefer the first option too.

Thanks for your support. I prefer option #1. Any suggestions for the mapping from BDF to IRQ? In XEN, on both IA64 and IA32, the BIOS provides a 48-pin IOAPIC (usually it is 24) to reduce IRQ sharing.
Most mainboards these days provide two IOAPICs, which would sum up to 48 again. I think that should be the preferred way of implementing it virtually too.
0~15 are reserved for legacy devices.
This is because the old PIC controllers handled 16 IRQs (0 to 15).
PCI devices use 16~47,
IIRC on most real machines pins 16-19 are used for LNKA to LNKD.
The mapping is like ((bdf >> 3) * 4) % (48 - 16) + 16. This means every PCI interrupt pin (irqA, irqB, irqC, irqD) of every PCI device uses a different IOAPIC pin if the number of PCI devices is less than 8. I think it can avoid interrupt sharing in most cases. If we use this method, we can share the same IA64 guest BIOS between XEN/IA64 and KVM/IA64.
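For reference, the proposed mapping could be sketched in C roughly as follows. This is only an illustration of the formula quoted above, not code from qemu or the guest firmware. The helper names (pci_bdf, pci_irq_base, pci_irq) are made up, and adding the interrupt pin index (0 for irqA up to 3 for irqD) to the formula's result is an assumption about how "every pin gets a different IOAPIC pin" is meant.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_PINS  48  /* pins presented by the guest BIOS, as in the Xen setup */
#define LEGACY_PINS  16  /* pins 0..15 stay reserved for legacy devices */
#define PCI_INTX      4  /* irqA..irqD */

/* Hypothetical helper: pack bus/device/function into a 16-bit BDF value. */
static inline uint16_t pci_bdf(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* The formula from the message: ((bdf >> 3) * 4) % (48 - 16) + 16.
 * The function number is shifted out, so all functions of a device
 * share one block of four pins. */
static inline unsigned pci_irq_base(uint16_t bdf)
{
    return ((unsigned)(bdf >> 3) * PCI_INTX) % (IOAPIC_PINS - LEGACY_PINS)
           + LEGACY_PINS;
}

/* Assumption: the pin index (0 = irqA .. 3 = irqD) is added to the base. */
static inline unsigned pci_irq(uint16_t bdf, unsigned intx)
{
    assert(intx < PCI_INTX);
    return pci_irq_base(bdf) + intx;
}
```

With this sketch, devices 0 through 7 on bus 0 get the pin blocks 16-19, 20-23 and so on up to 44-47. Device 8 wraps around to pin 16 again and shares with device 0, as does device 0 on bus 1.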
I'm not sure if I'm too fond of that method. It does not look too compliant with how PCs work these days. You might want to use that formula on the second IOAPIC only, so all PCI devices get routed to pins 24-47. Remember that you still have to provide "legacy boot interrupts" that map these to the first IOAPIC for Operating Systems that don't know how to handle high pin interrupts.
I am not sure which OSs that would be, but I'm pretty sure all the PCIe PCI-bridge vendors didn't implement that feature for nothing.
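To make the suggestion a bit more concrete, here is a rough C sketch of what I mean. It assumes zero-based pin numbering (second IOAPIC = global pins 24 to 47) and that the interrupt pin index is added before the modulo. All function names are hypothetical, and the way the legacy fallback picks between IRQ 10 and 11 is purely an assumption for illustration, not what qemu's PIIX emulation actually does.

```c
#include <assert.h>
#include <stdbool.h>

#define IOAPIC2_BASE 24  /* first global pin of the second IOAPIC (assumed) */
#define IOAPIC2_PINS 24  /* pins on the second IOAPIC */

/* Same idea as the proposed formula, restricted to the second IOAPIC.
 * devfn is (device << 3) | function; intx is 0 (irqA) .. 3 (irqD). */
static unsigned pci_irq_second_ioapic(unsigned devfn, unsigned intx)
{
    assert(intx < 4);
    return ((devfn >> 3) * 4 + intx) % IOAPIC2_PINS + IOAPIC2_BASE;
}

/* "Legacy boot interrupt": fall back to the two PIIX-style IRQs.
 * Alternating between 10 and 11 is an assumption made for this sketch. */
static unsigned pci_irq_legacy(unsigned devfn, unsigned intx)
{
    assert(intx < 4);
    return 10 + (((devfn >> 3) + intx) & 1);
}

/* Route to the high pins only if the OS can handle them. */
static unsigned pci_irq_route(unsigned devfn, unsigned intx, bool os_uses_high_pins)
{
    return os_uses_high_pins ? pci_irq_second_ioapic(devfn, intx)
                             : pci_irq_legacy(devfn, intx);
}
```

With only 24 pins available, six devices fit without sharing in this sketch (device 6 wraps back to pin 24), so it trades some sharing for a layout that is closer to real hardware.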
What do you think?
The idea is great! I tried extending the IRQ logic to a "full IOAPIC" myself recently, but failed miserably. The biggest hurdle is that currently the code is reversed in qemu. If an interrupt occurs, the PIC is asked if it's destined to go there and if not it gets rerouted to the IOAPIC. Unfortunately this breaks with IRQs >= 16.
I'll attach a small C program we developed internally to read out the IOAPIC from within Linux. You could try to run that on your machine to see how your IOAPIC is configured. One more good idea would be to get yourself a machine with PCI Express cards. Those handle IRQs pretty much the way you want them to be.
Regards, Alex
Attachment:
apicdump.c
Description: Binary data
Thanks, Anthony