Marcelo Tosatti wrote: > On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote: >> Marcelo Tosatti wrote: >>> On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote: >>>> Marcelo Tosatti wrote: >>>>> On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote: >>>>>> This patch is the main point of the series. In order for >>>>>> kdump to properly work inside a KVM guest, we need to make >>>>>> sure that all VCPUs in virtual wire APIC mode get kicked >>>>>> to try and pick up the timer interrupts. To do this, >>>>>> we iterate over the CPUs and deliver interrupts to the >>>>>> proper VCPUs. >>>>>> >>>>>> I don't love the concept of doing kvm_irq_kick_vcpus() from >>>>>> within pit_timer_fn(). A PIT is not connected to a CPU at all, >>>>>> only to a PIC or APIC. However, if a CPU enters idle, this is >>>>>> the only way to wake it up to check for the interrupt. >>>>> The reason the PIT interrupt was fixed to BSP is: >>>>> >>>>> http://www.mail-archive.com/kvm-devel@xxxxxxxxxxxxxxxxxxxxx/msg13250.html >>>>> >>>>> Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the >>>>> destination overwrite in case its programmed by the guest to >>>>> a single CPU would fix it? >>>> Ug, nasty. I definitely don't want to re-introduce that bug. What exactly do >>>> you mean by "fix it"? Do you mean fix the original RHEL-5.1 PAE issue, or fix >>>> the kdump issue? >>> The kdump issue. Something like: >>> >>> #ifdef CONFIG_X86 >>> /* Always delivery PIT interrupt to vcpu 0 */ >>> if (irq == 0 && dest_multiple_vcpus(entry)) { >>> irqe.dest_mode = 0; /* Physical mode. */ >>> /* need to read apic_id from apic regiest since >>> * it can be rewritten */ >>> irqe.dest_id = ioapic->kvm->bsp_vcpu->vcpu_id; >>> } >>> #endif >>> >>> The rest of the code should be ready to deal with it. >> Do you have any more information about how to reproduce that original issue? I >> tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to >> install and boot it just fine with or without the "always deliver irq 0 to vcpu >> 0" check above. Either there's something in particular I need to do to trigger >> the issue (hardware or software), or it's being solved another way. > > For VMX, the guests TSC's are now usually synchronized (which was not > the case before, when this patch was written). For AMD, they start out > of sync. > > Migration also causes them to be out of sync. So i'd prefer to postpone > the removal of this limitation until there is a guarantee the TSCs > are synchronized across vcpus. OK. I'll try to get on an unsynchronized TSC machine and see if I can reproduce there. > >> In the meantime, I've gotten the "set_irq from IRQ context" that Avi suggested >> working, and the fixing up of this IOAPIC check is the last bit to actually get >> kdump working. > > The kdump fix does not require that, I believe (although it is > useful/required for other things, please send the patch separately). > > The hrtimer will fire on vcpu0, then: > > kvm_inject_pit_interrupts -> ioapic_set_irq -> apic_deliver -> > kvm_vcpu_kick(vcpuN) > > As long as the dest-id = vcpu_id overwrite change is in place. Just to follow up here, I am taking your suggestion about testing for dest_id to multiple vcpus in the IOAPIC, and it seems to be working so far. However, that is not enough to fix kdump. We also need to remove the BSP assumptions in the i8254 code, otherwise we will never even attempt to deliver the interrupt to !vcpu0. That means that we need code for Avi's suggestion about kvm_set_irq for this to all work. Attached is the patch that I'm currently testing if anyone else wants to look at it. I obviously need to break it up into a nice series, but that will have to wait until next week. My "ioapic_dest_multiple_cpus()" function could do with a "popcnt", but I couldn't find a macro after a quick look around the tree. What's the recommended way to count bits in Linux? -- Chris Lalancette
Attachment:
current.patch
Description: application/mbox