Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

Matthew Ogilvie <mmogilvi_qemu@xxxxxxxxxxxx> · Wed, 12 Sep 2012 23:49:50 -0600

On Wed, Sep 12, 2012 at 10:57:57AM +0200, Jan Kiszka wrote:
> On 2012-09-12 10:51, Avi Kivity wrote:
> > On 09/12/2012 11:48 AM, Jan Kiszka wrote:
> >> On 2012-09-12 10:01, Avi Kivity wrote:
> >>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
> >>>> Intel's definition of "edge triggered" means: "asserted with a
> >>>> low-to-high transition at the time an interrupt is registered
> >>>> and then kept high until the interrupt is served via one of the
> >>>> EOI mechanisms or goes away unhandled."
> >>>>
> >>>> So the only difference between edge triggered and level triggered
> >>>> is in the leading edge, with no difference in the trailing edge.
> >>>>
> >>>> This bug manifested itself when the guest was Microport UNIX
> >>>> System V/386 v2.1 (ca. 1987), because it would sometimes mask
> >>>> off IRQ14 in the slave IMR after it had already been asserted.
> >>>> The master would still try to deliver an interrupt even though
> >>>> IRQ2 had dropped again, resulting in a spurious interupt
> >>>> (IRQ15) and a panicked UNIX kernel.
> >>>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> >>>> index adba28f..5cbba99 100644
> >>>> --- a/arch/x86/kvm/i8254.c
> >>>> +++ b/arch/x86/kvm/i8254.c
> >>>> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
> >>>>  	}
> >>>>  	spin_unlock(&ps->inject_lock);
> >>>>  	if (inject) {
> >>>> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> >>>> +		/* Clear previous interrupt, then create a rising
> >>>> +		 * edge to request another interupt, and leave it at
> >>>> +		 * level=1 until time to inject another one.
> >>>> +		 */
> >>>>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> >>>> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> >>>>  
> >>>>  		/*
> >>>
> >>> I thought I understood this, now I'm not sure.  How can this be correct?
> >>>  Real hardware doesn't act like this.
> >>>
> >>> What if the PIT is disabled after this?  You're injecting a spurious
> >>> interrupt then.
> >>
> >> Yes, the PIT has to raise the output as long as specified, i.e.
> >> according to the datasheet. That's important now due to the corrections
> >> to the PIC. We can then carefully check if there is room for
> >> simplifications / optimizations. I also cannot imagine that the above
> >> already fulfills these requirements.
> >>
> >> And if the PIT is disabled by the HPET, we need to clear the output
> >> explicitly as we inject the IRQ#0 under a different source ID than
> >> userspace HPET does (which will logically take over IRQ#0 control). The
> >> kernel would otherwise OR both sources to an incorrect result.
> >>
> > 
> > I guess we need to double the hrtimer rate then in order to generate a
> > square wave.  It's getting ridiculous how accurate our model needs to be.
> 
> I would suggest to solve this for the userspace model first, ensure that
> it works properly in all modes, maybe optimize it, and then decide how
> to map all this on kernel space. As long as we have two models, we can
> also make use of them.

Thoughts about the 8254 PIT:

First, this summary of (real) 8254 PIT behavior seems fairly
good, as far it goes:

On Tue, Sep 04, 2012 at 07:27:38PM +0100, Maciej W. Rozycki wrote:
>  * The 8254 PIT is normally configured in mode 2 or 3 in the PC/AT
>    architecture.  In the former its output is high (active) all the time
>    except from one (last) clock cycle.  In the latter a wave that has a
>    duty cycle close or equal to 0.5 (depending on whether the divider is
>    odd or even) is produced, so no short pulses either.  I don't remember
>    the other four modes -- have a look at the datasheet if interested, but
>    I reckon they're not really compatible with the wiring anyway, e.g. the
>    gate is hardwired enabled.

I've also just skimmed parts of the 8254 section of "The Indispensable PC
Hardware Book", by Hans-Peter Messmer, Copyright 1994 Addison-Wesley,
although I probably ought to read it more carefully.

Under "normal" conditions, the 8254 part of the patch above should be
indistinguishable from previous behavior.  The 8259's IRR will
still show up as 1 until the interrupt is actually serviced,
and no new interrupt will be serviced after one is serviced until
another edge is injected via the high-low-high transition of the new
code.  (Unless the guest resets the 8259 or maybe messes with IMR,
but real hardware would generate extra interrupts in such cases as
well.)

The new code sounds much closer to mode 2 described by
Maciej, compared to the old code - except the duty cycle is
effectively 100 percent instead of 99.[some number of 9's] percent.

-----------------
But there might be some concerns in abnormal conditions:

   * If some guest is actually depending on a 50 percent duty cycle
     (maybe some kind of polling rather than interrupts), I would
     expect it to be just as broken before this patch as after,
     unless it is really weird (handles continuous 1's more
     gracefully than continuous 0's).

        According to the my book, mode 3 isn't normally
     used to create interrupts: "The generated square-wave signal
     can be used, for example, to trasmit data via serial interfaces.
     The PIT then operates as a baud rate generator.  In the PC,
     counters 1 and 2 are operated in mode 3 to drive memory refresh
     and the speaker, respectively." (page 369)

        I wouldn't be inclined to worry about the 50 percent duty
     cycle too much unless someone comes up with a real guest OS
     that depends on it.

   * To be correct there are probably some cases when the 8254 should
     force the IRQ0 line when the guest is setting up the 8254.  Based on
     a very quick reading of some of the 8254 section of my book:

      * Note that modes 0, 2, 3 and 4 look usable with a hard-wired GATE=1,
        but not modes 1 or 5.
      * Maybe force IRQ0=0 when the 8254 is disabled (however that is done;
        I haven't found it yet)?
      * Force IRQ0=0 when starting the timer in "mode 0" (one-shot).
      * Modes 2, 3, and 4 all apparently start with IRQ0=1.  I guess
        they generate an interrupt when first enabled?
         * Mode 2 (periodic) has a 99 percent duty cycle (high).
         * Mode 3 (periodic) a 50 percent duty cycle (see above).
         * Mode 4 (one-shot) is distinguished from mode 0 in that
           it generates both a high-low and low-high transition when
           it expires, instead of just a low-high.

   * I don't know anything about the HPET.  I didn't even realize
     it re-uses IRQ0.

   * If you back-migrate from after this change to before this change,
     maybe there is a risk that it will lose one timer interrupt?
     Forward migration shouldn't have an issue (the first trailing
     edge in the new code becomes a nop).  Perhaps hack it to encourage
     extra interrupts in some migration cases rather than lost interrupts
     in others?  Some kind of subsection thing like was being
     discussed for the 8259 earlier?  Or:

Also, how big of a concern is a very rare gained or lost IRQ0
actually?  Under normal conditions, I would expect this to at most
cause a one time clock drift in the guest OS of a fraction of
a second.  If that only happens when rebooting or migrating the
guest...

On the other hand, lost or gained interrupts might be a more
serious concern (especially if lost) if the 8254 is operating
in a one-shot mode (0 or 4): Something in the guest doesn't
stop (hangs) if not canceled by the interrupt.

Can anyone confirm or contradict any of this?  Other thoughts?

                - Matthew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html