Re: [PATCH 2/2] x86, apicv: Add Posted Interrupt supporting

Gleb Natapov <gleb@xxxxxxxxxx> · Thu, 7 Feb 2013 15:52:24 +0200



On Wed, Feb 06, 2013 at 10:24:06PM -0200, Marcelo Tosatti wrote:
> On Wed, Feb 06, 2013 at 08:49:23PM -0200, Marcelo Tosatti wrote:
> > On Tue, Feb 05, 2013 at 09:32:50AM +0200, Gleb Natapov wrote:
> > > On Mon, Feb 04, 2013 at 06:47:30PM -0200, Marcelo Tosatti wrote:
> > > > On Mon, Feb 04, 2013 at 05:59:52PM -0200, Marcelo Tosatti wrote:
> > > > > On Mon, Feb 04, 2013 at 07:13:01PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Feb 04, 2013 at 12:43:45PM -0200, Marcelo Tosatti wrote:
> > > > > > > > > Any example how software relies on such two-interrupts-queued-in-IRR/ISR behaviour?
> > > > > > > > Don't know about guests, but KVM relies on it to detect interrupt
> > > > > > > > coalescing. So if interrupt is set in IRR but not in PIR interrupt will
> > > > > > > > not be reported as coalesced, but it will be coalesced during PIR->IRR
> > > > > > > > merge.
> > > > > > > 
> > > > > > > Yes, so:
> > > > > > > 
> > > > > > > 1. IRR=1, ISR=0, PIR=0. Event: set_irq, coalesced=no.
> > > > > > > 2. IRR=0, ISR=1, PIR=0. Event: IRR->ISR transfer.
> > > > > > > 3. vcpu outside of guest mode.
> > > > > > > 4. IRR=1, ISR=1, PIR=0. Event: set_irq, coalesced=no.
> > > > > > > 5. vcpu enters guest mode.
> > > > > > > 6. IRR=1, ISR=1, PIR=1. Event: set_irq, coalesced=no.
> > > > > > > 7. HW transfers PIR into IRR.
> > > > > > > 
> > > > > > > set_irq return value at 7 is incorrect, interrupt event was _not_
> > > > > > > queued.
> > > > > > Not sure I understand the flow of events in your description correctly. As I
> > > > > > understand it at 4 set_irq() will return incorrect result. Basically
> > > > > > when PIR is set to 1 while IRR has 1 for the vector the value of
> > > > > > set_irq() will be incorrect.
> > > > > 
> > > > > At 4 it has not been coalesced: it has been queued to IRR.
> > > > > At 6 it has been coalesced: PIR bit merged into IRR bit.
> > > > > 
> > > Yes, that's the case.
> > > 
> > > > > > Frankly I do not see how it can be fixed
> > > > > > without any race with present HW PIR design.
> > > > > 
> > > > > At kvm_accept_apic_interrupt, check IRR before setting PIR bit, if IRR
> > > > > already set, don't set PIR.
> > > Need to check both IRR and PIR. Something like that:
> > > 
> > > apic_accept_interrupt() {
> > >  if (PIR || IRR)
> > >    return coalesced;
> > >  else
> > >    set PIR;
> > > }
> > > 
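A minimal sketch of the check-both scheme quoted above, written as a
user-space C11 model rather than kernel code. The type struct vapic_model,
its per-vector boolean arrays, and the name accept_check_pir_and_irr are
hypothetical and exist only for illustration; they are not the KVM data
layout. The test and the set are intentionally separate steps, so the
sketch keeps the same race windows the thread goes on to discuss.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_VECTORS 256

/* Hypothetical model of one vCPU's interrupt state (not the KVM layout). */
struct vapic_model {
    atomic_bool pir[NR_VECTORS]; /* posted-interrupt request bits */
    atomic_bool irr[NR_VECTORS]; /* interrupt request register bits */
};

/*
 * Returns true when the interrupt is reported as coalesced.
 * The check and the set are not one atomic operation, so IRR can
 * change between them; that is the window discussed in the thread.
 */
static bool accept_check_pir_and_irr(struct vapic_model *v, unsigned vector)
{
    assert(vector < NR_VECTORS);

    if (atomic_load(&v->pir[vector]) || atomic_load(&v->irr[vector]))
        return true;                     /* already pending: coalesced */

    atomic_store(&v->pir[vector], true); /* queue through PIR only */
    return false;                        /* reported as delivered */
}
```

With an all-zero model, the first call for a vector returns false and sets
its PIR bit; a second call for the same vector returns true.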
> > > This has two problems. First is that an interrupt that could be delivered
> > > will not be (IRR is cleared just after it was tested), but it will be
> > > reported as coalesced, so this is a benign race.
> > 
> > Yes, and the same condition exists today with IRR, it's fine.
> > 
> > > Second is that an interrupt may be
> > > reported as delivered, but it will be coalesced (possible only with a self
> > > IPI with the same vector):
> > > 
> > > Starting condition: PIR=0, IRR=0 vcpu is in a guest mode
> > > 
> > > io thread                 |           vcpu
> > > accept_apic_interrupt()   |
> > >  PIR and IRR are zero     |
> > >  set PIR                  |
> > >  return delivered         |
> > >                           |      self IPI
> > >                           |      set IRR
> > >                           |     merge PIR to IRR (*)
> > > 
> > > At (*) interrupt that was reported as delivered is coalesced.
> > 
> > Only vcpu itself should send self-IPI, so it's fine.
> > 
> > > > Or:
> > > > 
> > > > apic_accept_interrupt() {
> > > > 
> > > > 1. Read ORIG_PIR=PIR, ORIG_IRR=IRR.
> > > > Never set IRR when HWAPIC enabled, even if outside of guest mode.
> > > > 2. Set PIR and let HW or SW VM-entry transfer it to IRR.
> > > > 3. set_irq return value: (ORIG_PIR or ORIG_IRR set).
> > > > }
> > > > 
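The three numbered steps quoted above can be sketched as follows, again as
a user-space C11 model. The names struct vapic_model,
accept_always_set_pir and merge_pir_to_irr are hypothetical; the merge
helper only stands in for the PIR->IRR transfer that HW or SW VM-entry
would perform. The snapshot of PIR and IRR is taken before PIR is set, so
the return value can go stale if IRR changes in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_VECTORS 256

/* Hypothetical model of one vCPU's interrupt state (not the KVM layout). */
struct vapic_model {
    atomic_bool pir[NR_VECTORS]; /* posted-interrupt request bits */
    atomic_bool irr[NR_VECTORS]; /* interrupt request register bits */
};

/* Returns true when the interrupt is reported as coalesced. */
static bool accept_always_set_pir(struct vapic_model *v, unsigned vector)
{
    assert(vector < NR_VECTORS);

    /* Step 1: snapshot both registers. */
    bool orig_pir = atomic_load(&v->pir[vector]);
    bool orig_irr = atomic_load(&v->irr[vector]);

    /* Step 2: always queue through PIR; IRR is never written here. */
    atomic_store(&v->pir[vector], true);

    /* Step 3: report coalesced if either bit was set in the snapshot. */
    return orig_pir || orig_irr;
}

/* Stand-in for the PIR->IRR transfer done by HW or by SW at VM-entry. */
static void merge_pir_to_irr(struct vapic_model *v, unsigned vector)
{
    assert(vector < NR_VECTORS);

    if (atomic_exchange(&v->pir[vector], false))
        atomic_store(&v->irr[vector], true);
}
```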
> > > This can report an interrupt as coalesced, but it will eventually be
> > > delivered as a separate interrupt:
> > > 
> > > Starting condition: PIR=0, IRR=1 vcpu is in a guest mode
> > > 
> > >  io thread              |         vcpu
> > >                         |         
> > > accept_apic_interrupt() |
> > > ORIG_PIR=0, ORIG_IRR=1  |
> > >                         |    EOI
> > >                         |    clear IRR, set ISR
> > > set PIR                 |
> > > return coalesced        |
> > >                         |    clear PIR, set IRR
> > >                         |    EOI
> > >                         |    clear IRR, set ISR (*)
> > > 
> > > At (*) interrupt that was reported as coalesced is delivered.
> > > 
> > > 
> > > So still no perfect solution. But the first one has much less serious
> > > problems for our practical needs.
> > > 
> > > > Two or more concurrent set_irq can race with each other, though. Can
> > > > either document the race or add a lock.
> > > > 
> > > 
> > > --
> > > 			Gleb.
> > 
> > Ok, then:
> > 
> > accept_apic_irq:
> > 1. coalesced = test_and_set_bit(PIR)
> > 2. set KVM_REQ_EVENT bit 	(*)
> > 3. if (vcpu->in_guest_mode)
> > 4.	if (test_and_set_bit(pir notification bit))
> > 5.		send PIR IPI
> > 6. return coalesced
> > 
> > Other sites:
> > A: On VM-entry, after disabling interrupts, but before
> > the last check for ->requests, clear pir notification bit 
> > (unconditionally).
> > 
> > (*) This is _necessary_ also because during VM-exit a PIR IPI interrupt can 
> > be missed, so the KVM_REQ_EVENT indicates that SW is responsible for
> > PIR->IRR transfer.
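A rough user-space C11 sketch of the accept_apic_irq steps and site A
quoted above. Everything here is a model: struct vcpu_model, the
req_event flag standing in for KVM_REQ_EVENT, the ipis_sent counter
standing in for sending the PIR IPI, and clear_pir_notification are all
hypothetical names. One assumption deserves a loud label: the sketch sends
the IPI only when the notification bit was previously clear, on the guess
that a set bit means a notification is already outstanding. Step 4 as
literally written tests the opposite polarity, so treat that choice as an
interpretation, not as the author's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_VECTORS 256

/* Hypothetical model of the per-vCPU state the steps refer to. */
struct vcpu_model {
    atomic_bool pir[NR_VECTORS]; /* posted-interrupt request bits */
    atomic_bool pir_notify;      /* the "pir notification bit" */
    atomic_bool req_event;       /* stands in for KVM_REQ_EVENT */
    atomic_bool in_guest_mode;
    atomic_uint ipis_sent;       /* stands in for "send PIR IPI" */
};

/* Returns true when the interrupt is reported as coalesced. */
static bool accept_apic_irq(struct vcpu_model *v, unsigned vector)
{
    assert(vector < NR_VECTORS);

    /* 1. Atomic test-and-set of the PIR bit gives the coalesced flag. */
    bool coalesced = atomic_exchange(&v->pir[vector], true);

    /* 2. Tell SW it may have to do the PIR->IRR transfer itself. */
    atomic_store(&v->req_event, true);

    /* 3-5. Notify a running guest. ASSUMPTION: send the IPI only if
     * no notification was already outstanding (bit previously clear). */
    if (atomic_load(&v->in_guest_mode)) {
        if (!atomic_exchange(&v->pir_notify, true))
            atomic_fetch_add(&v->ipis_sent, 1);
    }

    /* 6. */
    return coalesced;
}

/* Site A: on VM-entry, clear the notification bit unconditionally. */
static void clear_pir_notification(struct vcpu_model *v)
{
    atomic_store(&v->pir_notify, false);
}
```

Under that assumption, two back-to-back requests to a vcpu in guest mode
produce a single notification until the next VM-entry clears the bit.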
> 
> It's not a bad idea to have a new KVM_REQ_ bit for PIR processing (just
> as the current patches do).
Without the numbers I do not see why.

--
			Gleb.