RE: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Liran Alon [mailto:LIRAN.ALON@xxxxxxxxxx]
> Sent: Thursday, December 14, 2017 9:02 PM
> To: Gonglei (Arei); pbonzini@xxxxxxxxxx; rkrcmar@xxxxxxxxxx
> Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Huangweidong (C)
> Subject: Re: [PATCH] KVM: x86: ioapic: Clear IRR for rtc bit when rtc EOI gotten
> 
> 
> 
> On 14/12/17 14:23, Gonglei wrote:
> > We hit a bug in our test while run PCMark 10 in a windows 7 VM,
> > The VM got stuck and the wallclock was hang after several minutes running
> > PCMark 10 in it.
> > It is quite easily to reproduce the bug with the upstream KVM and Qemu.
> >
> > We found that KVM can not inject any RTC irq to VM after it was hang, it fails
> to
> > Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr.
> >
> > static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
> >                    int irq_level, bool line_status)
> > {
> > ...
> >           if (!irq_level) {
> >                    ioapic->irr &= ~mask;
> >                    ret = 1;
> >                    goto out;
> >           }
> > ...
> >           if ((edge && old_irr == ioapic->irr) ||
> >               (!edge && entry.fields.remote_irr)) {
> >                    ret = 0;
> >                    goto out;
> >           }
> >
> > According to RTC spec, after RTC injects a High level irq, OS will read CMOS's
> > register C to to clear the irq flag, and pull down the irq electric pin.
> >
> > For Qemu, we will emulate the reading operation in cmos_ioport_read(),
> > but Guest OS will fire a write operation before to tell which register will be
> read
> > after this write, where we use s->cmos_index to record the following register
> to read.
> >
> > But in our test, we found that there is a possible situation that Vcpu fails to
> read
> > RTC_REG_C to clear irq, This could happens while two VCpus are
> writing/reading
> > registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C,
> > so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C,
> > but before it tries to read register C, another vcpu1 is going to read
> RTC_YEAR,
> > it changes s->cmos_index to RTC_YEAR by a writing action.
> > The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we
> will miss
> > calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never inject
> RTC irq,
> > and Windows VM will hang.
> 
> If I understood correctly, this looks to me like a race-condition bug in
> the Windows guest kernel. In real-hardware this race-condition will also
> cause the RTC_YEAR to be read instead of RTC_REG_C.
> Guest kernel should make sure that 2 CPUs does not attempt to read a
> CMOS register in parallel as they can override each other's cmos_index.
> 
> See for example how Linux kernel makes sure to avoid such kind of issues
> in rtc_cmos_read() (arch/x86/kernel/rtc.c) by grabbing a cmos_lock.
> 
Yes, I knew that. 

> >
> > Let's clear IRR of rtc when corresponding EOI is gotten to avoid the issue.
> 
> Can you elaborate a bit more why it makes sense to put such workaround
> in KVM code instead of declaring this as guest kernel bug?
> 
I agree it's a Windows bug. The big problem is there is not problem on Xen platform.

Thanks,
-Gonglei

> Regards,
> -Liran
> 
> >
> > Suggested-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > Signed-off-by: Gonglei <arei.gonglei@xxxxxxxxxx>
> > ---
> >    Thanks to Paolo provides a good solution. :)
> >
> >   arch/x86/kvm/ioapic.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> > index 4e822ad..5022d63 100644
> > --- a/arch/x86/kvm/ioapic.c
> > +++ b/arch/x86/kvm/ioapic.c
> > @@ -160,6 +160,7 @@ static void rtc_irq_eoi(struct kvm_ioapic *ioapic,
> struct kvm_vcpu *vcpu)
> >   {
> >   	if (test_and_clear_bit(vcpu->vcpu_id,
> >   			       ioapic->rtc_status.dest_map.map)) {
> > +		ioapic->irr &= ~(1 << RTC_GSI);
> >   		--ioapic->rtc_status.pending_eoi;
> >   		rtc_status_pending_eoi_check_valid(ioapic);
> >   	}
> >



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux