On Thu, Jan 10, 2013 at 11:40:07PM -0700, Matthew Ogilvie wrote: > On Tue, Jan 08, 2013 at 09:31:19AM +0200, Gleb Natapov wrote: > > On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogilvi@xxxxxxxxxxxx wrote: > > > On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@xxxxxxxxxx> wrote: > > > > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote: > > > >> Reading the spec, it is clear that most modes normally leave the IRQ > > > >> output line high, and only pulse it low to generate a leading edge. > > > >> Especially the most commonly used mode 2. > > > >> > > > >> The KVM i8254 model does not try to emulate the duration of the pulse at > > > >> all, so just swap the high/low settings it to leave it high most of > > > >> the time. > > > >> > > > >> This fix is a prerequisite to improving the i8259 model to handle > > > >> the trailing edge of an interupt request as indicated in its spec: > > > >> If it gets a trailing edge of an IRQ line before it starts to service > > > >> the interrupt, the request should be canceled. > > > >> > > > >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz > > > >> or search the net for 23124406.pdf. > > > >> > > > >> Risks: > > > >> > > > >> There is a risk that migrating a running guest between versions > > > >> with and without this patch will lose or gain a single timer > > > >> interrupt during the migration process. The only case where > > > > Can you elaborate on how exactly this can happen? Do not see it. > > > > > > > > > > KVM 8254: In the corrected model, when the count expires, the model > > > briefly pulses output low and then high again, with the low to high > > > transition being what triggers the interrupt. In the old model, > > > when the count expires, the model expects the output line > > > to already be low, and briefly pulses it high (triggering the > > > interrupt) and then low again. But if the line was already > > > high (because it migrated from the corrected model), > > > this won't generate a new leading edge (low to high) and won't > > > trigger a new interrupt (the first post-back-migration pulse turns > > > into a simple trailing edge instead of a pulse). > > > > > > Unless there is something I'm missing? > > > > > No, I missed that pic->last_irr/ioapic->irr will be migrated as 1. But > > this means that next interrupt after migration from new to old will > > always be lost. What about clearing pit bit from last_irr/irr before > > migration? Should not affect new->new migration and should fix new->old > > one. The only problem is that we may need to consult irq routing table > > to know how pit is connected to ioapic. > > We should not clear both last_irr and irr. That > cancels the interrupt early if the CPU hasn't started servicing > it yet: If the guest CPU has interrupts disabled > and hasn't begun to service the interrupt, a new->new migration could > lose the interrupt much earlier in the countdown cycle than it otherwise > might be lost. > I am talking about last_irr in pic and irr in ioapic. Of course we shouldn't clear pic->irr on migration. ioapic uses irr to detect edge. > Potentially we could clear the last_irr bit only. This effectively > makes migration behave like the old unfixed code. But if this is > considered acceptable, I would be more inclined to not fix IRQ0 at all, If we do not fix IRQ0 next timer interrupt after migration will always be lost. > rather than make IRQ0 work subtly differently normally vs during > migration. One of my earlier patch series (qemu v7 dated Nov 25) > attempts to use basically this strategy for the qemu model, at > least in the short term (and then potentially fix it properly > in the longer term), although the details of series might > be tweaked. > > Or the minimal-risk strategy is to ONLY fix the cascade IRQ2's > trailing edge [the original i825* problem that spawned this whole > thing], leaving all other IRQs as-is. > > > > > Still do not see how can we gain one interrupt. > > In most cases I don't think we get an extra interrupt from > a direct fix. But some kinds of attempts to work around missing > interrupts during migration can cause cause extra interrupts in other > cases. In the old qemu model (but perhaps not kvm), the mode 2 leading > edge occurs one clock tick earlier than it ought to. So if you > try to be tricky with adjusting levels during migration, you > may introduce possible cases where it gets an interrupt in > the old model, then migrates and gets another interrupt one tick > later in the new model... > > Also, it occurs to me that maybe there might be an off-by-one issue in > the kvm model of mode 2 that ought to be fixed as well? Although the > zero length pulse in kvm suggests that one-off issues in counters do > not matter. In the qemu model, the leading edge of OUT in mode 2 > moves by one tick... > > > > > > The qemu 8254 model actually models each edge at independent > > > clock ticks instead of combining both into a very brief pulse at one time. > > > I've found it handy to draw out old and new timing diagrams on paper > > > (for each mode), and then carefully think about what happens with respect > > > to levels and edges when you transition back and forth between old and > > > new models at various points in the timing cycle. (Note I've spent more > > > time examining the qemu models rather than the kvm models.) > > > > > Yes, drawing it definitely helps :) > > > > > >> this is likely to be serious is probably losing a single-shot (mode 4) > > > >> interrupt, but if my understanding of how things work is good, then > > > >> that should only be possible if a whole slew of conditions are > > > >> all met: > > > >> > > > >> 1. The guest is configured to run in a "tickless" mode (like > > > >> modern Linux). > > > >> 2. The guest is for some reason still using the i8254 rather > > > >> than something more modern like an HPET. (The combination > > > >> of 1 and 2 should be rare.) > > > > This is not so rare. For performance reason it is better to not have > > > > HPET at all. In fact -no-hpet is how I would advice anyone to run qemu. > > > > > > In a later email you mention that Linux prefers a timer in the APIC. > > > I don't know much about the APIC (advanced interrupt controller), and > > > wasn't > > > even aware had it's own timer. > > > > > > The big question is if we can safely just fix the i825* models, or if > > > we need something more subtle to avoid breaking commonly used guests > > > like modern Linux (support both corrected and old models, > > > or only fix IRQ2 instead of all IRQs, or similar subtlety). > > Migration may happen while guest is running firmaware. Who knows what > > those are doing. If the fix is as easy as I described above we should go > > for it. > > > > > > > > > > > > >> 3. The migration is going from a fixed version back to the > > > >> old version. (Not sure how common this is, but it should > > > >> be rarer than migrating from old to new.) > > > >> 4. There are not going to be any "timely" events/interrupts > > > >> (keyboard, network, process sleeps, etc) that cause the guest > > > >> to reset the PIT mode 4 one-shot counter "soon enough". > > > >> > > > >> This combination should be rare enough that more complicated > > > >> solutions are not worth the effort. > > > >> > > > >> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@xxxxxxxxxxxx> > > > >> --- > > > >> arch/x86/kvm/i8254.c | 6 +++++- > > > >> 1 file changed, 5 insertions(+), 1 deletion(-) > > > >> > > > >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c > > > >> index c1d30b2..cd4ec60 100644 > > > >> --- a/arch/x86/kvm/i8254.c > > > >> +++ b/arch/x86/kvm/i8254.c > > > >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work) > > > >> } > > > >> spin_unlock(&ps->inject_lock); > > > >> if (inject) { > > > >> - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); > > > >> + /* Clear previous interrupt, then create a rising > > > >> + * edge to request another interupt, and leave it at > > > >> + * level=1 until time to inject another one. > > > >> + */ > > > >> kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0); > > > >> + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); > > > >> > > > >> /* > > > >> * Provides NMI watchdog support via Virtual Wire mode. > > > >> -- > > > >> 1.7.10.2.484.gcd07cc5 > > > > > > > > -- > > > > Gleb. > > > > -- > > Gleb. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html