On Fri, Oct 30, 2009 at 04:28:01PM +0100, Chris Lalancette wrote: > Marcelo Tosatti wrote: > > On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote: > >> Marcelo Tosatti wrote: > >>> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote: > >>>> Marcelo Tosatti wrote: > >>>>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote: > >>>>>> This patch is the main point of the series. In order for > >>>>>> kdump to properly work inside a KVM guest, we need to make > >>>>>> sure that all VCPUs in virtual wire APIC mode get kicked > >>>>>> to try and pick up the timer interrupts. To do this, > >>>>>> we iterate over the CPUs and deliver interrupts to the > >>>>>> proper VCPUs. > >>>>>> > >>>>>> I don't love the concept of doing kvm_irq_kick_vcpus() from > >>>>>> within pit_timer_fn(). A PIT is not connected to a CPU at all, > >>>>>> only to a PIC or APIC. However, if a CPU enters idle, this is > >>>>>> the only way to wake it up to check for the interrupt. > >>>>> The reason the PIT interrupt was fixed to BSP is: > >>>>> > >>>>> http://www.mail-archive.com/kvm-devel@xxxxxxxxxxxxxxxxxxxxx/msg13250.html > >>>>> > >>>>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the > >>>>> destination overwrite in case its programmed by the guest to > >>>>> a single CPU would fix it? > >>>> Ug, nasty. I definitely don't want to re-introduce that bug. What exactly do > >>>> you mean by "fix it"? Do you mean fix the original RHEL-5.1 PAE issue, or fix > >>>> the kdump issue? > >>> The kdump issue. Something like: > >>> > >>> #ifdef CONFIG_X86 > >>> /* Always delivery PIT interrupt to vcpu 0 */ > >>> if (irq == 0 && dest_multiple_vcpus(entry)) { > >>> irqe.dest_mode = 0; /* Physical mode. */ > >>> /* need to read apic_id from apic regiest since > >>> * it can be rewritten */ > >>> irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id; > >>> } > >>> #endif > >>> > >>> The rest of the code should be ready to deal with it. > >> Do you have any more information about how to reproduce that original issue? I > >> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to > >> install and boot it just fine with or without the "always deliver irq 0 to vcpu > >> 0" check above. Either there's something in particular I need to do to trigger > >> the issue (hardware or software), or it's being solved another way. > > > > For VMX, the guests TSC's are now usually synchronized (which was not > > the case before, when this patch was written). For AMD, they start out > > of sync. > > > > Migration also causes them to be out of sync. So i'd prefer to postpone > > the removal of this limitation until there is a guarantee the TSCs > > are synchronized across vcpus. > > OK. I'll try to get on an unsynchronized TSC machine and see if I can reproduce > there. > > > > >> In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested > >> working, and the fixing up of this IOAPIC check is the last bit to actually get > >> kdump working. > > > > The kdump fix does not require that, I believe (although it is > > useful/required for other things, please send the patch separately). > > > > The hrtimer will fire on vcpu0, then: > > > > kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver -> > > kvm_vcpu_kick(vcpuN) > > > > As long as the dest-id = vcpu_id overwrite change is in place. > > Just to follow up here, I am taking your suggestion about testing for dest_id to > multiple vcpus in the IOAPIC, and it seems to be working so far. However, that > is not enough to fix kdump. We also need to remove the BSP assumptions in the > i8254 code, otherwise we will never even attempt to deliver the interrupt to > !vcpu0. But the limitation is only in ioapic_deliver. vcpu0 will process the hrtimer interrupt on its entry path (through __inject_pit_timer_intr), call into the ioapic code via kvm_set_irq, and follow the normal apic interrupt delivery path (which includes waking up the target vcpu(s)). > That means that we need code for Avi's suggestion about kvm_set_irq for > this to all work. What happens now is the kvm_set_irq work is "scheduled" to execute in vcpu0 entry context. Regarding KVM_REQ_PENDING_TIMER, it is implicitly used in vcpu_enter_guest here: if (vcpu->requests || need_resched() || signal_pending(current)) > Attached is the patch that I'm currently testing if anyone else wants to look at > it. I obviously need to break it up into a nice series, but that will have to > wait until next week. My "ioapic_dest_multiple_cpus()" function could do with a > "popcnt", but I couldn't find a macro after a quick look around the tree. > What's the recommended way to count bits in Linux? There's a hweight_long helper. Hum, suppose its safe to limit the irq0->bsp override to ioapic delivery mode lowest priority (so you don't need to compare IOAPIC destination id with the local apic ID's). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html