On 2012-09-13 07:49, Matthew Ogilvie wrote: > On Wed, Sep 12, 2012 at 10:57:57AM +0200, Jan Kiszka wrote: >> On 2012-09-12 10:51, Avi Kivity wrote: >>> On 09/12/2012 11:48 AM, Jan Kiszka wrote: >>>> On 2012-09-12 10:01, Avi Kivity wrote: >>>>> On 09/10/2012 04:29 AM, Matthew Ogilvie wrote: >>>>>> Intel's definition of "edge triggered" means: "asserted with a >>>>>> low-to-high transition at the time an interrupt is registered >>>>>> and then kept high until the interrupt is served via one of the >>>>>> EOI mechanisms or goes away unhandled." >>>>>> >>>>>> So the only difference between edge triggered and level triggered >>>>>> is in the leading edge, with no difference in the trailing edge. >>>>>> >>>>>> This bug manifested itself when the guest was Microport UNIX >>>>>> System V/386 v2.1 (ca. 1987), because it would sometimes mask >>>>>> off IRQ14 in the slave IMR after it had already been asserted. >>>>>> The master would still try to deliver an interrupt even though >>>>>> IRQ2 had dropped again, resulting in a spurious interupt >>>>>> (IRQ15) and a panicked UNIX kernel. >>>>>> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c >>>>>> index adba28f..5cbba99 100644 >>>>>> --- a/arch/x86/kvm/i8254.c >>>>>> +++ b/arch/x86/kvm/i8254.c >>>>>> @@ -302,8 +302,12 @@ static void pit_do_work(struct kthread_work *work) >>>>>> } >>>>>> spin_unlock(&ps->inject_lock); >>>>>> if (inject) { >>>>>> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); >>>>>> + /* Clear previous interrupt, then create a rising >>>>>> + * edge to request another interupt, and leave it at >>>>>> + * level=1 until time to inject another one. >>>>>> + */ >>>>>> kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0); >>>>>> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); >>>>>> >>>>>> /* >>>>> >>>>> I thought I understood this, now I'm not sure. How can this be correct? >>>>> Real hardware doesn't act like this. >>>>> >>>>> What if the PIT is disabled after this? You're injecting a spurious >>>>> interrupt then. >>>> >>>> Yes, the PIT has to raise the output as long as specified, i.e. >>>> according to the datasheet. That's important now due to the corrections >>>> to the PIC. We can then carefully check if there is room for >>>> simplifications / optimizations. I also cannot imagine that the above >>>> already fulfills these requirements. >>>> >>>> And if the PIT is disabled by the HPET, we need to clear the output >>>> explicitly as we inject the IRQ#0 under a different source ID than >>>> userspace HPET does (which will logically take over IRQ#0 control). The >>>> kernel would otherwise OR both sources to an incorrect result. >>>> >>> >>> I guess we need to double the hrtimer rate then in order to generate a >>> square wave. It's getting ridiculous how accurate our model needs to be. >> >> I would suggest to solve this for the userspace model first, ensure that >> it works properly in all modes, maybe optimize it, and then decide how >> to map all this on kernel space. As long as we have two models, we can >> also make use of them. > > Thoughts about the 8254 PIT: > > First, this summary of (real) 8254 PIT behavior seems fairly > good, as far it goes: > > On Tue, Sep 04, 2012 at 07:27:38PM +0100, Maciej W. Rozycki wrote: >> * The 8254 PIT is normally configured in mode 2 or 3 in the PC/AT >> architecture. In the former its output is high (active) all the time >> except from one (last) clock cycle. In the latter a wave that has a >> duty cycle close or equal to 0.5 (depending on whether the divider is >> odd or even) is produced, so no short pulses either. I don't remember >> the other four modes -- have a look at the datasheet if interested, but >> I reckon they're not really compatible with the wiring anyway, e.g. the >> gate is hardwired enabled. > > I've also just skimmed parts of the 8254 section of "The Indispensable PC > Hardware Book", by Hans-Peter Messmer, Copyright 1994 Addison-Wesley, > although I probably ought to read it more carefully. http://download.intel.com/design/archives/periphrl/docs/23124406.pdf should be the primary reference - as long as it leaves no open questions. > > Under "normal" conditions, the 8254 part of the patch above should be > indistinguishable from previous behavior. The 8259's IRR will > still show up as 1 until the interrupt is actually serviced, > and no new interrupt will be serviced after one is serviced until > another edge is injected via the high-low-high transition of the new > code. (Unless the guest resets the 8259 or maybe messes with IMR, > but real hardware would generate extra interrupts in such cases as > well.) > > The new code sounds much closer to mode 2 described by > Maciej, compared to the old code - except the duty cycle is > effectively 100 percent instead of 99.[some number of 9's] percent. > > ----------------- > But there might be some concerns in abnormal conditions: > > * If some guest is actually depending on a 50 percent duty cycle > (maybe some kind of polling rather than interrupts), I would > expect it to be just as broken before this patch as after, > unless it is really weird (handles continuous 1's more > gracefully than continuous 0's). > > According to the my book, mode 3 isn't normally > used to create interrupts: "The generated square-wave signal > can be used, for example, to trasmit data via serial interfaces. > The PIT then operates as a baud rate generator. In the PC, > counters 1 and 2 are operated in mode 3 to drive memory refresh > and the speaker, respectively." (page 369) > > I wouldn't be inclined to worry about the 50 percent duty > cycle too much unless someone comes up with a real guest OS > that depends on it. > > * To be correct there are probably some cases when the 8254 should > force the IRQ0 line when the guest is setting up the 8254. Based on > a very quick reading of some of the 8254 section of my book: > > * Note that modes 0, 2, 3 and 4 look usable with a hard-wired GATE=1, > but not modes 1 or 5. > * Maybe force IRQ0=0 when the 8254 is disabled (however that is done; > I haven't found it yet)? > * Force IRQ0=0 when starting the timer in "mode 0" (one-shot). > * Modes 2, 3, and 4 all apparently start with IRQ0=1. I guess > they generate an interrupt when first enabled? > * Mode 2 (periodic) has a 99 percent duty cycle (high). > * Mode 3 (periodic) a 50 percent duty cycle (see above). > * Mode 4 (one-shot) is distinguished from mode 0 in that > it generates both a high-low and low-high transition when > it expires, instead of just a low-high. > > * I don't know anything about the HPET. I didn't even realize > it re-uses IRQ0. In modern chipsets, the HPET can drive IRQ0 instead of the PIT. A muxer selects the input for that purpose. And we disable the timer of the PIT in that case to avoid needless background activity. See e.g. pit_irq_control(). > > * If you back-migrate from after this change to before this change, > maybe there is a risk that it will lose one timer interrupt? > Forward migration shouldn't have an issue (the first trailing > edge in the new code becomes a nop). Perhaps hack it to encourage > extra interrupts in some migration cases rather than lost interrupts > in others? Some kind of subsection thing like was being > discussed for the 8259 earlier? Or: > > Also, how big of a concern is a very rare gained or lost IRQ0 > actually? Under normal conditions, I would expect this to at most > cause a one time clock drift in the guest OS of a fraction of > a second. If that only happens when rebooting or migrating the > guest... > > On the other hand, lost or gained interrupts might be a more > serious concern (especially if lost) if the 8254 is operating > in a one-shot mode (0 or 4): Something in the guest doesn't > stop (hangs) if not canceled by the interrupt. > > > Can anyone confirm or contradict any of this? Other thoughts? See my other reply: Until we really understand the impact of some optimization / simplification, we are forced to do it accurately. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html