On 2011-10-17 18:00, Lai Jiangshan wrote: > On 10/17/2011 05:49 PM, Avi Kivity wrote: >> On 10/17/2011 11:40 AM, Lai Jiangshan wrote: >>>>> >>>> >>>> LINT1 may have been programmed as a level -triggered interrupt instead >>>> of edge triggered (NMI or interrupt). We can use the ioctl argument for >>>> the level (and pressing the NMI button needs to pulse the level to 1 and >>>> back to 0). >>>> >>> >>> Hi, Avi, Jan, >>> >>> Which approach you prefer to? >>> I need to know the result before wasting too much time to respin >>> the approach. >> >> Yes, sorry about the slow and sometimes conflicting feedback. >> >>> 1) Fix KVM_NMI emulation approach (which is v3 patchset) >>> - It directly fixes the problem and matches the >>> real hard ware more, but it changes KVM_NMI bahavior. >>> - Require both kernel-site and userspace-site fix. >>> >>> 2) Get the LAPIC state from kernel irqchip, and inject NMI if it is allowed >>> (which is v4 patchset) >>> - Simple, don't changes any kernel behavior. >>> - Only need the userspace-site fix >>> >>> 3) Add KVM_SET_LINT1 approach (which is v5 patchset) >>> - don't changes the kernel's KVM_NMI behavior. >>> - much complex >>> - Require both kernel-site and userspace-site fix. >>> - userspace-site should also handle the !KVM_SET_LINT1 >>> condition, it uses all the 2) approach' code. it means >>> this approach equals the 2) approach + KVM_SET_LINT1 ioctl. >>> >>> This is an urgent bug of us, we need to settle it down soo >> >> While (1) is simple, it overloads a single ioctl with two meanings, >> that's not so good. >> >> Whether we do (1) or (3), we need (2) as well, for older kernels. >> >> So I recommend first focusing on (2) and merging it, then doing (3). >> >> (note an additional issue with 3 is whether to make it a vm or vcpu >> ioctl - we've been assuming vcpu ioctl but it's not necessarily the best >> choice). >> > > It is the 2) approach. > It only changes the user space site, the kernel site is not touched. > It is changed from previous v4 patch, fixed problems found by Jan. > ---------------------------- > > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxx> > > Currently, NMI interrupt is blindly sent to all the vCPUs when NMI > button event happens. This doesn't properly emulate real hardware on > which NMI button event triggers LINT1. Because of this, NMI is sent to > the processor even when LINT1 is maskied in LVT. For example, this > causes the problem that kdump initiated by NMI sometimes doesn't work > on KVM, because kdump assumes NMI is masked on CPUs other than CPU0. > > With this patch, inject-nmi request is handled as follows. > > - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI > interrupt. > - When in-kernel irqchip is enabled, get the in-kernel LAPIC states > and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then > delivering the NMI directly. (Suggested by Jan Kiszka) > > Changed from old version: > re-implement it by the Jan's suggestion. > fix the race found by Jan. > > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx> > Reported-by: Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx> > --- > hw/apic.c | 33 +++++++++++++++++++++++++++++++++ > hw/apic.h | 1 + > monitor.c | 6 +++++- > 3 files changed, 39 insertions(+), 1 deletions(-) > diff --git a/hw/apic.c b/hw/apic.c > index 69d6ac5..922796a 100644 > --- a/hw/apic.c > +++ b/hw/apic.c > @@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level) > } > } > > +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id); > + > +static void kvm_irqchip_deliver_nmi(void *p) > +{ > + APICState *s = p; > + struct kvm_lapic_state klapic; > + uint32_t lvt; > + > + kvm_get_lapic(s->cpu_env, &klapic); > + lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1); > + > + if (lvt & APIC_LVT_MASKED) { > + return; > + } > + > + if (((lvt >> 8) & 7) != APIC_DM_NMI) { > + return; > + } > + > + kvm_vcpu_ioctl(s->cpu_env, KVM_NMI); > +} > + > +void apic_deliver_nmi(DeviceState *d) > +{ > + APICState *s = DO_UPCAST(APICState, busdev.qdev, d); > + > + if (kvm_irqchip_in_kernel()) { > + run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s); > + } else { > + apic_local_deliver(s, APIC_LVT_LINT1); > + } > +} > + > #define foreach_apic(apic, deliver_bitmask, code) \ > {\ > int __i, __j, __mask;\ > diff --git a/hw/apic.h b/hw/apic.h > index c857d52..3a4be0a 100644 > --- a/hw/apic.h > +++ b/hw/apic.h > @@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, > uint8_t trigger_mode); > int apic_accept_pic_intr(DeviceState *s); > void apic_deliver_pic_intr(DeviceState *s, int level); > +void apic_deliver_nmi(DeviceState *d); > int apic_get_interrupt(DeviceState *s); > void apic_reset_irq_delivered(void); > int apic_get_irq_delivered(void); > diff --git a/monitor.c b/monitor.c > index cb485bf..0b81f17 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) > CPUState *env; > > for (env = first_cpu; env != NULL; env = env->next_cpu) { > - cpu_interrupt(env, CPU_INTERRUPT_NMI); > + if (!env->apic_state) { > + cpu_interrupt(env, CPU_INTERRUPT_NMI); > + } else { > + apic_deliver_nmi(env->apic_state); > + } > } > > return 0; Looks OK to me. Please don't forget to bake a qemu-only patch for those bits that apply to upstream as well (ie. the user space APIC path). Jan
Attachment:
signature.asc
Description: OpenPGP digital signature