Avi Kivity wrote: > On 10/27/2009 06:41 PM, Chris Lalancette wrote: >> This patch series aims to get kdump working inside a KVM guest. >> The current problem with using kdump is that KVM always delivers >> PIT interrupts to the BSP, and the BSP only. While this is >> technically allowed by the MPS spec, most motherboards actually >> deliver timer interrupts to *any* LAPIC in virtual wire mode. >> Since a crash can occur on any CPU, timer interrupts must >> be able to reach any CPU in order for kdump to work properly. >> >> Therefore, this patch series kicks all of the relevant vCPUs >> when delivering a timer interrupt. With these patches in >> place, kdump in a RHEL-5 guest works properly. >> > > This is pretty expensive on large guests. However I suppose under > normal conditions we won't be in virtual wire mode? Right, exactly. When running in normal SMP mode, your LAPIC's are in "Symmetric I/O Mode", meaning they won't be kicked at all. It's possible in theory for a guest OS to program all of it's LAPIC's in virtual wire mode, but that is a decidedly non-standard setup and not one recommended in any of the MPS documentation that I've read. > > The kick from i8254 code is pretty bad, as you mention. I forget why it > is needed at all - shouldn't kvm_set_irq() end up kicking the correct As I understand it, that's not quite how it works. From what I can see, what happens is that the i8254 is programmed as an hrtimer. When the hrtimer expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code). That code is running in interrupt context, however, so you can't directly call "set_irq" at that point. Instead, we update the "pending" variable and defer work until later on. That "later on" is when we are doing a vcpu_run, at which point we check the "pending" variable, and if set, inject the interrupt. The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we don't kick them, no vcpu will wake up, and hence none of them will ever run "set_irq" to get it injected into the guest. If you have other ideas on how we might accomplish this, I'd definitely be interested in hearing them. > vcpu (and note the pic still insists on the bsp:) > > /* > * callback when PIC0 irq status changed > */ > static void pic_irq_request(void *opaque, int level) > { > struct kvm *kvm = opaque; > struct kvm_vcpu *vcpu = kvm->bsp_vcpu; > struct kvm_pic *s = pic_irqchip(kvm); > int irq = pic_get_irq(&s->pics[0]); > > s->output = level; > if (vcpu && level && (s->pics[0].isr_ack & (1 << irq))) { > s->pics[0].isr_ack &= ~(1 << irq); > kvm_vcpu_kick(vcpu); > } > } > Yes, I looked at this too. It's another one we could fix by changing "kvm_vcpu_kick()" to "kvm_irq_kick_vcpus()". However, it's not exactly required by kdump since the linux kernel prefers to use the IOAPIC where possible. -- Chris Lalancette -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html