On Wed, Sep 12, 2012 at 10:57:57AM +0200, Jan Kiszka wrote: > On 2012-09-12 10:51, Avi Kivity wrote: > > On 09/12/2012 11:48 AM, Jan Kiszka wrote: > >> On 2012-09-12 10:01, Avi Kivity wrote: > >>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote: > >>>> Intel's definition of "edge triggered" means: "asserted with a > >>>> low-to-high transition at the time an interrupt is registered > >>>> and then kept high until the interrupt is served via one of the > >>>> EOI mechanisms or goes away unhandled." > >>>> > >>>> So the only difference between edge triggered and level triggered > >>>> is in the leading edge, with no difference in the trailing edge. > >>>> > >>>> This bug manifested itself when the guest was Microport UNIX > >>>> System V/386 v2.1 (ca. 1987), because it would sometimes mask > >>>> off IRQ14 in the slave IMR after it had already been asserted. > >>>> The master would still try to deliver an interrupt even though > >>>> IRQ2 had dropped again, resulting in a spurious interupt > >>>> (IRQ15) and a panicked UNIX kernel. > >>>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c > >>>> index adba28f..5cbba99 100644 > >>>> --- a/arch/x86/kvm/i8254.c > >>>> +++ b/arch/x86/kvm/i8254.c > >>>> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work) > >>>> } > >>>> spin_unlock(&ps->inject_lock); > >>>> if (inject) { > >>>> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); > >>>> + /* Clear previous interrupt, then create a rising > >>>> + * edge to request another interupt, and leave it at > >>>> + * level=1 until time to inject another one. > >>>> + */ > >>>> kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0); > >>>> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); > >>>> > >>>> /* > >>> > >>> I thought I understood this, now I'm not sure. How can this be correct? > >>> Real hardware doesn't act like this. > >>> > >>> What if the PIT is disabled after this? You're injecting a spurious > >>> interrupt then. > >> > >> Yes, the PIT has to raise the output as long as specified, i.e. > >> according to the datasheet. That's important now due to the corrections > >> to the PIC. We can then carefully check if there is room for > >> simplifications / optimizations. I also cannot imagine that the above > >> already fulfills these requirements. > >> > >> And if the PIT is disabled by the HPET, we need to clear the output > >> explicitly as we inject the IRQ#0 under a different source ID than > >> userspace HPET does (which will logically take over IRQ#0 control). The > >> kernel would otherwise OR both sources to an incorrect result. > >> > > > > I guess we need to double the hrtimer rate then in order to generate a > > square wave. It's getting ridiculous how accurate our model needs to be. > > I would suggest to solve this for the userspace model first, ensure that > it works properly in all modes, maybe optimize it, and then decide how > to map all this on kernel space. As long as we have two models, we can > also make use of them. Thoughts about the 8254 PIT: First, this summary of (real) 8254 PIT behavior seems fairly good, as far it goes: On Tue, Sep 04, 2012 at 07:27:38PM +0100, Maciej W. Rozycki wrote: > * The 8254 PIT is normally configured in mode 2 or 3 in the PC/AT > architecture. In the former its output is high (active) all the time > except from one (last) clock cycle. In the latter a wave that has a > duty cycle close or equal to 0.5 (depending on whether the divider is > odd or even) is produced, so no short pulses either. I don't remember > the other four modes -- have a look at the datasheet if interested, but > I reckon they're not really compatible with the wiring anyway, e.g. the > gate is hardwired enabled. I've also just skimmed parts of the 8254 section of "The Indispensable PC Hardware Book", by Hans-Peter Messmer, Copyright 1994 Addison-Wesley, although I probably ought to read it more carefully. Under "normal" conditions, the 8254 part of the patch above should be indistinguishable from previous behavior. The 8259's IRR will still show up as 1 until the interrupt is actually serviced, and no new interrupt will be serviced after one is serviced until another edge is injected via the high-low-high transition of the new code. (Unless the guest resets the 8259 or maybe messes with IMR, but real hardware would generate extra interrupts in such cases as well.) The new code sounds much closer to mode 2 described by Maciej, compared to the old code - except the duty cycle is effectively 100 percent instead of 99.[some number of 9's] percent. ----------------- But there might be some concerns in abnormal conditions: * If some guest is actually depending on a 50 percent duty cycle (maybe some kind of polling rather than interrupts), I would expect it to be just as broken before this patch as after, unless it is really weird (handles continuous 1's more gracefully than continuous 0's). According to the my book, mode 3 isn't normally used to create interrupts: "The generated square-wave signal can be used, for example, to trasmit data via serial interfaces. The PIT then operates as a baud rate generator. In the PC, counters 1 and 2 are operated in mode 3 to drive memory refresh and the speaker, respectively." (page 369) I wouldn't be inclined to worry about the 50 percent duty cycle too much unless someone comes up with a real guest OS that depends on it. * To be correct there are probably some cases when the 8254 should force the IRQ0 line when the guest is setting up the 8254. Based on a very quick reading of some of the 8254 section of my book: * Note that modes 0, 2, 3 and 4 look usable with a hard-wired GATE=1, but not modes 1 or 5. * Maybe force IRQ0=0 when the 8254 is disabled (however that is done; I haven't found it yet)? * Force IRQ0=0 when starting the timer in "mode 0" (one-shot). * Modes 2, 3, and 4 all apparently start with IRQ0=1. I guess they generate an interrupt when first enabled? * Mode 2 (periodic) has a 99 percent duty cycle (high). * Mode 3 (periodic) a 50 percent duty cycle (see above). * Mode 4 (one-shot) is distinguished from mode 0 in that it generates both a high-low and low-high transition when it expires, instead of just a low-high. * I don't know anything about the HPET. I didn't even realize it re-uses IRQ0. * If you back-migrate from after this change to before this change, maybe there is a risk that it will lose one timer interrupt? Forward migration shouldn't have an issue (the first trailing edge in the new code becomes a nop). Perhaps hack it to encourage extra interrupts in some migration cases rather than lost interrupts in others? Some kind of subsection thing like was being discussed for the 8259 earlier? Or: Also, how big of a concern is a very rare gained or lost IRQ0 actually? Under normal conditions, I would expect this to at most cause a one time clock drift in the guest OS of a fraction of a second. If that only happens when rebooting or migrating the guest... On the other hand, lost or gained interrupts might be a more serious concern (especially if lost) if the 8254 is operating in a one-shot mode (0 or 4): Something in the guest doesn't stop (hangs) if not canceled by the interrupt. Can anyone confirm or contradict any of this? Other thoughts? - Matthew -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html