Re: [Qemu-devel] [PATCH 1/2] KVM: fix i8259 interrupt high to low transition logic

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Thu, 13 Sep 2012 15:55:51 +0200

On 2012-09-13 07:49, Matthew Ogilvie wrote:
> On Wed, Sep 12, 2012 at 10:57:57AM +0200, Jan Kiszka wrote:
>> On 2012-09-12 10:51, Avi Kivity wrote:
>>> On 09/12/2012 11:48 AM, Jan Kiszka wrote:
>>>> On 2012-09-12 10:01, Avi Kivity wrote:
>>>>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote:
>>>>>> Intel's definition of "edge triggered" means: "asserted with a
>>>>>> low-to-high transition at the time an interrupt is registered
>>>>>> and then kept high until the interrupt is served via one of the
>>>>>> EOI mechanisms or goes away unhandled."
>>>>>>
>>>>>> So the only difference between edge triggered and level triggered
>>>>>> is in the leading edge, with no difference in the trailing edge.
>>>>>>
>>>>>> This bug manifested itself when the guest was Microport UNIX
>>>>>> System V/386 v2.1 (ca. 1987), because it would sometimes mask
>>>>>> off IRQ14 in the slave IMR after it had already been asserted.
>>>>>> The master would still try to deliver an interrupt even though
>>>>>> IRQ2 had dropped again, resulting in a spurious interupt
>>>>>> (IRQ15) and a panicked UNIX kernel.
>>>>>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
>>>>>> index adba28f..5cbba99 100644
>>>>>> --- a/arch/x86/kvm/i8254.c
>>>>>> +++ b/arch/x86/kvm/i8254.c
>>>>>> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work)
>>>>>>  	}
>>>>>>  	spin_unlock(&ps->inject_lock);
>>>>>>  	if (inject) {
>>>>>> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>>>>>> +		/* Clear previous interrupt, then create a rising
>>>>>> +		 * edge to request another interupt, and leave it at
>>>>>> +		 * level=1 until time to inject another one.
>>>>>> +		 */
>>>>>>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
>>>>>> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>>>>>>  
>>>>>>  		/*
>>>>>
>>>>> I thought I understood this, now I'm not sure.  How can this be correct?
>>>>>  Real hardware doesn't act like this.
>>>>>
>>>>> What if the PIT is disabled after this?  You're injecting a spurious
>>>>> interrupt then.
>>>>
>>>> Yes, the PIT has to raise the output as long as specified, i.e.
>>>> according to the datasheet. That's important now due to the corrections
>>>> to the PIC. We can then carefully check if there is room for
>>>> simplifications / optimizations. I also cannot imagine that the above
>>>> already fulfills these requirements.
>>>>
>>>> And if the PIT is disabled by the HPET, we need to clear the output
>>>> explicitly as we inject the IRQ#0 under a different source ID than
>>>> userspace HPET does (which will logically take over IRQ#0 control). The
>>>> kernel would otherwise OR both sources to an incorrect result.
>>>>
>>>
>>> I guess we need to double the hrtimer rate then in order to generate a
>>> square wave.  It's getting ridiculous how accurate our model needs to be.
>>
>> I would suggest to solve this for the userspace model first, ensure that
>> it works properly in all modes, maybe optimize it, and then decide how
>> to map all this on kernel space. As long as we have two models, we can
>> also make use of them.
> 
> Thoughts about the 8254 PIT:
> 
> First, this summary of (real) 8254 PIT behavior seems fairly
> good, as far it goes:
> 
> On Tue, Sep 04, 2012 at 07:27:38PM +0100, Maciej W. Rozycki wrote:
>>  * The 8254 PIT is normally configured in mode 2 or 3 in the PC/AT
>>    architecture.  In the former its output is high (active) all the time
>>    except from one (last) clock cycle.  In the latter a wave that has a
>>    duty cycle close or equal to 0.5 (depending on whether the divider is
>>    odd or even) is produced, so no short pulses either.  I don't remember
>>    the other four modes -- have a look at the datasheet if interested, but
>>    I reckon they're not really compatible with the wiring anyway, e.g. the
>>    gate is hardwired enabled.
> 
> I've also just skimmed parts of the 8254 section of "The Indispensable PC
> Hardware Book", by Hans-Peter Messmer, Copyright 1994 Addison-Wesley,
> although I probably ought to read it more carefully.

http://download.intel.com/design/archives/periphrl/docs/23124406.pdf
should be the primary reference - as long as it leaves no open questions.

> 
> Under "normal" conditions, the 8254 part of the patch above should be
> indistinguishable from previous behavior.  The 8259's IRR will
> still show up as 1 until the interrupt is actually serviced,
> and no new interrupt will be serviced after one is serviced until
> another edge is injected via the high-low-high transition of the new
> code.  (Unless the guest resets the 8259 or maybe messes with IMR,
> but real hardware would generate extra interrupts in such cases as
> well.)
> 
> The new code sounds much closer to mode 2 described by
> Maciej, compared to the old code - except the duty cycle is
> effectively 100 percent instead of 99.[some number of 9's] percent.
> 
> -----------------
> But there might be some concerns in abnormal conditions:
> 
>    * If some guest is actually depending on a 50 percent duty cycle
>      (maybe some kind of polling rather than interrupts), I would
>      expect it to be just as broken before this patch as after,
>      unless it is really weird (handles continuous 1's more
>      gracefully than continuous 0's).
> 
>         According to the my book, mode 3 isn't normally
>      used to create interrupts: "The generated square-wave signal
>      can be used, for example, to trasmit data via serial interfaces.
>      The PIT then operates as a baud rate generator.  In the PC,
>      counters 1 and 2 are operated in mode 3 to drive memory refresh
>      and the speaker, respectively." (page 369)
> 
>         I wouldn't be inclined to worry about the 50 percent duty
>      cycle too much unless someone comes up with a real guest OS
>      that depends on it.
> 
>    * To be correct there are probably some cases when the 8254 should
>      force the IRQ0 line when the guest is setting up the 8254.  Based on
>      a very quick reading of some of the 8254 section of my book:
> 
>       * Note that modes 0, 2, 3 and 4 look usable with a hard-wired GATE=1,
>         but not modes 1 or 5.
>       * Maybe force IRQ0=0 when the 8254 is disabled (however that is done;
>         I haven't found it yet)?
>       * Force IRQ0=0 when starting the timer in "mode 0" (one-shot).
>       * Modes 2, 3, and 4 all apparently start with IRQ0=1.  I guess
>         they generate an interrupt when first enabled?
>          * Mode 2 (periodic) has a 99 percent duty cycle (high).
>          * Mode 3 (periodic) a 50 percent duty cycle (see above).
>          * Mode 4 (one-shot) is distinguished from mode 0 in that
>            it generates both a high-low and low-high transition when
>            it expires, instead of just a low-high.
> 
>    * I don't know anything about the HPET.  I didn't even realize
>      it re-uses IRQ0.

In modern chipsets, the HPET can drive IRQ0 instead of the PIT. A muxer
selects the input for that purpose. And we disable the timer of the PIT
in that case to avoid needless background activity. See e.g.
pit_irq_control().

> 
>    * If you back-migrate from after this change to before this change,
>      maybe there is a risk that it will lose one timer interrupt?
>      Forward migration shouldn't have an issue (the first trailing
>      edge in the new code becomes a nop).  Perhaps hack it to encourage
>      extra interrupts in some migration cases rather than lost interrupts
>      in others?  Some kind of subsection thing like was being
>      discussed for the 8259 earlier?  Or:
> 
> Also, how big of a concern is a very rare gained or lost IRQ0
> actually?  Under normal conditions, I would expect this to at most
> cause a one time clock drift in the guest OS of a fraction of
> a second.  If that only happens when rebooting or migrating the
> guest...
> 
> On the other hand, lost or gained interrupts might be a more
> serious concern (especially if lost) if the 8254 is operating
> in a one-shot mode (0 or 4): Something in the guest doesn't
> stop (hangs) if not canceled by the interrupt.
> 
> 
> Can anyone confirm or contradict any of this?  Other thoughts?

See my other reply: Until we really understand the impact of some
optimization / simplification, we are forced to do it accurately.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html