Re: [PATCH 1/1 V6] qemu-kvm: fix improper nmi emulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2011-10-19 08:33, Lai Jiangshan wrote:
> On 10/19/2011 03:41 AM, Jan Kiszka wrote:
>> On 2011-10-17 18:00, Lai Jiangshan wrote:
>>> On 10/17/2011 05:49 PM, Avi Kivity wrote:
>>>> On 10/17/2011 11:40 AM, Lai Jiangshan wrote:
>>>>>>>
>>>>>>
>>>>>> LINT1 may have been programmed as a level -triggered interrupt instead
>>>>>> of edge triggered (NMI or interrupt).  We can use the ioctl argument for
>>>>>> the level (and pressing the NMI button needs to pulse the level to 1 and
>>>>>> back to 0).
>>>>>>
>>>>>
>>>>> Hi, Avi, Jan,
>>>>>
>>>>> Which approach you prefer to?
>>>>> I need to know the result before wasting too much time to respin
>>>>> the approach.
>>>>
>>>> Yes, sorry about the slow and sometimes conflicting feedback.
>>>>
>>>>> 1) Fix KVM_NMI emulation approach  (which is v3 patchset)
>>>>> 	- It directly fixes the problem and matches the
>>>>> 	  real hard ware more, but it changes KVM_NMI bahavior.
>>>>> 	- Require both kernel-site and userspace-site fix.
>>>>>
>>>>> 2) Get the LAPIC state from kernel irqchip, and inject NMI if it is allowed
>>>>>    (which is v4 patchset)
>>>>> 	- Simple, don't changes any kernel behavior.
>>>>> 	- Only need the userspace-site fix
>>>>>
>>>>> 3) Add KVM_SET_LINT1 approach (which is v5 patchset)
>>>>> 	- don't changes the kernel's KVM_NMI behavior.
>>>>> 	- much complex
>>>>> 	- Require both kernel-site and userspace-site fix.
>>>>> 	- userspace-site should also handle the !KVM_SET_LINT1
>>>>> 	  condition, it uses all the 2) approach' code. it means
>>>>> 	  this approach equals the 2) approach + KVM_SET_LINT1 ioctl.
>>>>>
>>>>> This is an urgent bug of us, we need to settle it down soo
>>>>
>>>> While (1) is simple, it overloads a single ioctl with two meanings,
>>>> that's not so good.
>>>>
>>>> Whether we do (1) or (3), we need (2) as well, for older kernels.
>>>>
>>>> So I recommend first focusing on (2) and merging it, then doing (3).
>>>>
>>>> (note an additional issue with 3 is whether to make it a vm or vcpu
>>>> ioctl - we've been assuming vcpu ioctl but it's not necessarily the best
>>>> choice).
>>>>
>>>
>>> It is the 2) approach.
>>> It only changes the user space site, the kernel site is not touched.
>>> It is changed from previous v4 patch, fixed problems found by Jan.
>>> ----------------------------
>>>
>>> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
>>>
>>> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
>>> button event happens. This doesn't properly emulate real hardware on
>>> which NMI button event triggers LINT1. Because of this, NMI is sent to
>>> the processor even when LINT1 is maskied in LVT. For example, this
>>> causes the problem that kdump initiated by NMI sometimes doesn't work
>>> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>>>
>>> With this patch, inject-nmi request is handled as follows.
>>>
>>> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>>>   interrupt.
>>> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>>>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>>>   delivering the NMI directly. (Suggested by Jan Kiszka)
>>>
>>> Changed from old version:
>>>   re-implement it by the Jan's suggestion.
>>>   fix the race found by Jan.
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
>>> Reported-by: Kenji Kaneshige <kaneshige.kenji@xxxxxxxxxxxxxx>
>>> ---
>>>  hw/apic.c |   33 +++++++++++++++++++++++++++++++++
>>>  hw/apic.h |    1 +
>>>  monitor.c |    6 +++++-
>>>  3 files changed, 39 insertions(+), 1 deletions(-)
>>> diff --git a/hw/apic.c b/hw/apic.c
>>> index 69d6ac5..922796a 100644
>>> --- a/hw/apic.c
>>> +++ b/hw/apic.c
>>> @@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>>>      }
>>>  }
>>>  
>>> +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
>>> +
>>> +static void kvm_irqchip_deliver_nmi(void *p)
>>> +{
>>> +    APICState *s = p;
>>> +    struct kvm_lapic_state klapic;
>>> +    uint32_t lvt;
>>> +
>>> +    kvm_get_lapic(s->cpu_env, &klapic);
>>> +    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
>>> +
>>> +    if (lvt & APIC_LVT_MASKED) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
>>> +        return;
>>> +    }
>>> +
>>> +    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
>>> +}
>>> +
>>> +void apic_deliver_nmi(DeviceState *d)
>>> +{
>>> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
>>> +
>>> +    if (kvm_irqchip_in_kernel()) {
>>> +        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
>>> +    } else {
>>> +        apic_local_deliver(s, APIC_LVT_LINT1);
>>> +    }
>>> +}
>>> +
>>>  #define foreach_apic(apic, deliver_bitmask, code) \
>>>  {\
>>>      int __i, __j, __mask;\
>>> diff --git a/hw/apic.h b/hw/apic.h
>>> index c857d52..3a4be0a 100644
>>> --- a/hw/apic.h
>>> +++ b/hw/apic.h
>>> @@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
>>>                               uint8_t trigger_mode);
>>>  int apic_accept_pic_intr(DeviceState *s);
>>>  void apic_deliver_pic_intr(DeviceState *s, int level);
>>> +void apic_deliver_nmi(DeviceState *d);
>>>  int apic_get_interrupt(DeviceState *s);
>>>  void apic_reset_irq_delivered(void);
>>>  int apic_get_irq_delivered(void);
>>> diff --git a/monitor.c b/monitor.c
>>> index cb485bf..0b81f17 100644
>>> --- a/monitor.c
>>> +++ b/monitor.c
>>> @@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
>>>      CPUState *env;
>>>  
>>>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>> -        cpu_interrupt(env, CPU_INTERRUPT_NMI);
>>> +        if (!env->apic_state) {
>>> +            cpu_interrupt(env, CPU_INTERRUPT_NMI);
>>> +        } else {
>>> +            apic_deliver_nmi(env->apic_state);
>>> +        }
>>>      }
>>>  
>>>      return 0;
>>
>> Looks OK to me.
>>
>> Please don't forget to bake a qemu-only patch for those bits that apply
>> to upstream as well (ie. the user space APIC path).
>>
>> Jan
>>
> 
> I did forget it.
> Did you mean we need to add "#ifdef KVM_CAP_IRQCHIP" back?

No. I meant basically your patch minus the kvm_in_kernel_irqchip code
paths, applicable against current qemu.git. Those paths will be re-added
(slightly differently) when upstream gains that support. I'm working on
a basic version an will incorporate the logic if your qemu patch is
already available.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux