Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high

Gleb Natapov <gleb@xxxxxxxxxx> · Fri, 11 Jan 2013 17:45:28 +0200



On Thu, Jan 10, 2013 at 11:40:07PM -0700, Matthew Ogilvie wrote:
> On Tue, Jan 08, 2013 at 09:31:19AM +0200, Gleb Natapov wrote:
> > On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogilvi@xxxxxxxxxxxx wrote:
> > > On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> > > > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > > >> Reading the spec, it is clear that most modes normally leave the IRQ
> > > >> output line high, and only pulse it low to generate a leading edge.
> > > >> Especially the most commonly used mode 2.
> > > >> 
> > > >> The KVM i8254 model does not try to emulate the duration of the pulse at
> > > >> all, so just swap the high/low settings it to leave it high most of
> > > >> the time.
> > > >> 
> > > >> This fix is a prerequisite to improving the i8259 model to handle
> > > >> the trailing edge of an interupt request as indicated in its spec:
> > > >> If it gets a trailing edge of an IRQ line before it starts to service
> > > >> the interrupt, the request should be canceled.
> > > >> 
> > > >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > > >> or search the net for 23124406.pdf.
> > > >> 
> > > >> Risks:
> > > >> 
> > > >> There is a risk that migrating a running guest between versions
> > > >> with and without this patch will lose or gain a single timer
> > > >> interrupt during the migration process.  The only case where
> > > > Can you elaborate on how exactly this can happen? Do not see it.
> > > > 
> > > 
> > > KVM 8254: In the corrected model, when the count expires, the model
> > > briefly pulses output low and then high again, with the low to high
> > > transition being what triggers the interrupt.  In the old model,
> > > when the count expires, the model expects the output line
> > > to already be low, and briefly pulses it high (triggering the
> > > interrupt) and then low again.  But if the line was already
> > > high (because it migrated from the corrected model),
> > > this won't generate a new leading edge (low to high) and won't
> > > trigger a new interrupt (the first post-back-migration pulse turns
> > > into a simple trailing edge instead of a pulse).
> > > 
> > > Unless there is something I'm missing?
> > > 
> > No, I missed that pic->last_irr/ioapic->irr  will be migrated as 1. But
> > this means that next interrupt after migration from new to old will
> > always be lost.  What about clearing pit bit from last_irr/irr before
> > migration? Should not affect new->new migration and should fix new->old
> > one. The only problem is that we may need to consult irq routing table
> > to know how pit is connected to ioapic.
> 
> We should not clear both last_irr and irr.  That
> cancels the interrupt early if the CPU hasn't started servicing
> it yet:  If the guest CPU has interrupts disabled
> and hasn't begun to service the interrupt, a new->new migration could
> lose the interrupt much earlier in the countdown cycle than it otherwise
> might be lost.
> 
I am talking about last_irr in pic and irr in ioapic. Of course we
shouldn't clear pic->irr on migration. ioapic uses irr to detect edge.

> Potentially we could clear the last_irr bit only.  This effectively
> makes migration behave like the old unfixed code.  But if this is
> considered acceptable, I would be more inclined to not fix IRQ0 at all,
If we do not fix IRQ0 next timer interrupt after migration will always be
lost.

> rather than make IRQ0 work subtly differently normally vs during
> migration.  One of my earlier patch series (qemu v7 dated Nov 25)
> attempts to use basically this strategy for the qemu model, at
> least in the short term (and then potentially fix it properly
> in the longer term), although the details of series might
> be tweaked.
> 
> Or the minimal-risk strategy is to ONLY fix the cascade IRQ2's
> trailing edge [the original i825* problem that spawned this whole
> thing], leaving all other IRQs as-is.
> 
> > 
> > Still do not see how can we gain one interrupt.
> 
> In most cases I don't think we get an extra interrupt from
> a direct fix.  But some kinds of attempts to work around missing
> interrupts during migration can cause cause extra interrupts in other
> cases.  In the old qemu model (but perhaps not kvm), the mode 2 leading
> edge occurs one clock tick earlier than it ought to.  So if you
> try to be tricky with adjusting levels during migration, you
> may introduce possible cases where it gets an interrupt in
> the old model, then migrates and gets another interrupt one tick
> later in the new model...
> 
> Also, it occurs to me that maybe there might be an off-by-one issue in
> the kvm model of mode 2 that ought to be fixed as well?  Although the  
> zero length pulse in kvm suggests that one-off issues in counters do  
> not matter.  In the qemu model, the leading edge of OUT in mode 2
> moves by one tick...
> 
> > 
> > > The qemu 8254 model actually models each edge at independent
> > > clock ticks instead of combining both into a very brief pulse at one time.
> > > I've found it handy to draw out old and new timing diagrams on paper
> > > (for each mode), and then carefully think about what happens with respect
> > > to levels and edges when you transition back and forth between old and
> > > new models at various points in the timing cycle.  (Note I've spent more
> > > time examining the qemu models rather than the kvm models.)
> > > 
> > Yes, drawing it definitely helps :)
> > 
> > > >> this is likely to be serious is probably losing a single-shot (mode 4)
> > > >> interrupt, but if my understanding of how things work is good, then
> > > >> that should only be possible if a whole slew of conditions are
> > > >> all met:
> > > >> 
> > > >>  1. The guest is configured to run in a "tickless" mode (like
> > > >>     modern Linux).
> > > >>  2. The guest is for some reason still using the i8254 rather
> > > >>     than something more modern like an HPET.  (The combination
> > > >>     of 1 and 2 should be rare.)
> > > > This is not so rare. For performance reason it is better to not have
> > > > HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> > > 
> > > In a later email you mention that Linux prefers a timer in the APIC.
> > > I don't know much about the APIC (advanced interrupt controller), and
> > > wasn't
> > > even aware had it's own timer.
> > > 
> > > The big question is if we can safely just fix the i825* models, or if
> > > we need something more subtle to avoid breaking commonly used guests
> > > like modern Linux (support both corrected and old models,
> > > or only fix IRQ2 instead of all IRQs, or similar subtlety).
> > Migration may happen while guest is running firmaware. Who knows what
> > those are doing. If the fix is as easy as I described above we should go
> > for it.
> > 
> > > 
> > > > 
> > > >>  3. The migration is going from a fixed version back to the
> > > >>     old version.  (Not sure how common this is, but it should
> > > >>     be rarer than migrating from old to new.)
> > > >>  4. There are not going to be any "timely" events/interrupts
> > > >>     (keyboard, network, process sleeps, etc) that cause the guest
> > > >>     to reset the PIT mode 4 one-shot counter "soon enough".
> > > >> 
> > > >> This combination should be rare enough that more complicated
> > > >> solutions are not worth the effort.
> > > >> 
> > > >> Signed-off-by: Matthew Ogilvie <mmogilvi_qemu@xxxxxxxxxxxx>
> > > >> ---
> > > >>  arch/x86/kvm/i8254.c | 6 +++++-
> > > >>  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >> 
> > > >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > > >> index c1d30b2..cd4ec60 100644
> > > >> --- a/arch/x86/kvm/i8254.c
> > > >> +++ b/arch/x86/kvm/i8254.c
> > > >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> > > >>  	}
> > > >>  	spin_unlock(&ps->inject_lock);
> > > >>  	if (inject) {
> > > >> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > > >> +		/* Clear previous interrupt, then create a rising
> > > >> +		 * edge to request another interupt, and leave it at
> > > >> +		 * level=1 until time to inject another one.
> > > >> +		 */
> > > >>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> > > >> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > > >>  
> > > >>  		/*
> > > >>  		 * Provides NMI watchdog support via Virtual Wire mode.
> > > >> -- 
> > > >> 1.7.10.2.484.gcd07cc5
> > > > 
> > > > --
> > > > 			Gleb.
> > 
> > --
> > 			Gleb.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html