On Thu, Jul 16, 2009 at 03:09:30PM -0300, Marcelo Tosatti wrote: > On Thu, Jul 16, 2009 at 10:51:16AM +0300, Gleb Natapov wrote: > > On Wed, Jul 15, 2009 at 06:38:48PM -0300, Marcelo Tosatti wrote: > > > On Wed, Jul 15, 2009 at 11:48:17PM +0300, Gleb Natapov wrote: > > > > > > + spin_unlock(&ioapic->lock); > > > > > > + kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); > > > > > > + spin_lock(&ioapic->lock); > > > > > > > > > > While traversing the IOAPIC pins you drop the lock and acquire it again, > > > > > which is error prone: what if the redirection table is changed in > > > > > between, for example? > > > > > > > > > Yes, the ack notifiers is a problematic part. The only thing that > > > > current ack notifiers do is set_irq_level(0) so this is not the problem > > > > in practice. I'll see if I can eliminate this dropping of the lock here. > > > > > > Currently, yes. But take into account that an ack notifier might do > > > other things than set_irq_level(0). > > > > > For instance? Ack notifier can do whatever it wants if it does not call > > back into ioapic. It may call to ioapic only by set_irq_level(0) (which > > it does now from device assignment code) or by set_irq_level(1) and this > > does not make sense for level triggered interrupts because not calling > > set_irq_level(1) will have the same effect. > > Checking whether a given gsi is masked on the > PIC/IOAPIC from the PIC/IOAPIC mask notifiers > (http://www.spinics.net/lists/kvm/msg19093.html), for example. > > Do you think that particular case should be handled in a different way? I dislike ack notifiers and think that the should not be used if possible. kvm_set_irq() return value provide you enough info to know it interrupt was delivered or masked so the pit case can (and should) be handled without using irq notifiers at all. See how RTC interrupt coalescing is handled in qemu (it does not handle masking correctly, but only because qemu community enjoys having their code broken). > (it could check whether a GSI is masked or not on both PIC/IOAPIC/LAPIC > from a different context other than mask notifier). > > > But I think I found the way to not drop the lock here. > > See, this is adding more complexity than simplifying things (the > recursive check). This is not more complex than current code. (IMHO) > > > > Say for example an ack notifier that takes the PIC or IOAPIC lock > > > (for whatever reason). > > What reason? No one should take PIC or IOAPIC lock outside of PIC or > > IOAPIC code. This is layering violation. IOAPIC should be accessed only > > through its public interface (set_irq/mmio_read|write/update_eoi) > > See link above. I can't see how the code from link above can work with current code since mask notifiers are not called from PIC. > > > > > > > > > Also, irq_lock was held during ack and mask notifier callbacks, so they > > > > > (the callbacks) did not have to worry about concurrency. > > > > > > > > > Is it possible to have more then one ack for the same interrupt line at > > > > the same time? > > > > > > Why not? But maybe this is a somewhat stupid point, because you can > > > require the notifiers to handle that. > > If this is possible (and it looks like it may happen if IOAPIC broadcasts > > an interrupt vector to more then one vcpu) then it may be better to push > > complexity into an ack notifier to not penalize a common scenario. > > But again, if we will not drop the lock around ack notifier call they > > will be serialized. > > > > > > > > > > You questioned the purpose of irq_lock earlier, and thats what it is: > > > > > synchronization between pic and ioapic blur at the irq notifiers. > > > > > > > > > Pic has its own locking and it fails miserably in regards ack notifiers. > > > > I bet nobody tried device assignment with pic. It will not work. > > > > > > It fails to take irq_lock on automatic EOI because on guest entry > > > interrupts are disabled (see d5ecfdd25c412df9c0ecf3ab8e066461fd4f69df). > > This explains why the code is buggy, but does not fix the bug. There are > > two bugs at least with the pic + ack notifiers. The first one is that > > irq notifiers are called without irq_lock held and that will trigger > > WARN_ON(!mutex_is_locked(&kvm->irq_lock)) in kvm_set_irq() in device > > assignment case (besides what is the point of a lock if it is not taken > > on every code path?). > > This could be fixed by switching irq_lock to a spinlock. Yes, but then you want to minimize the amount of code that runs under a spinlock. > > > Another one is that you can't postpone call to ack notifiers in > > kvm_pic_read_irq() till after pic_update_irq(s). The reason is that > > pic_update_irq() will trigger acked interrupt immediately since ack > > notifier is the one who lowers irq line in device assignment case > > (that is the reason I haven't done the same in ioapic case BTW). > > I'm not sure i understand this one, can you be more verbose please? > 1. device injects level irq by calling kvm_pic_set_irq(1) 2. level is recorded in irr (pic_set_irq1()) 3. pic_update_irq() called and interrupt is sent to vcpu 4. vcpu calls kvm_pic_read_irq() which calls pic_update_irq(s) before calling kvm_notify_acked_irq(). Ack callback is the one who calls kvm_pic_set_irq(0) to lower irq line, so when pic_update_irq(s) is called irr still indicates that irq level is 1 and it is delivered yet again. But there is something strange with PIC + ack notifiers. PIC call ACK notifiers not when guest acks the interrupt, but when KVM gets the interrupt for delivery. Calling ack notifiers at this point is wrong. Device assignment ack notifier enables host interrupts, but guest not yet had a chance to clear interrupt condition in a device. Or do I miss something here? > > > > irq_lock is indeed used to synchronization ioapic access, but the way > > > > it is done requires calling kvm_set_irq() with lock held and this adds > > > > unnecessary lock on a msi irq injection path. > > > > > > > > > Using RCU to synchronize ack/mask notifier list walk versus list > > > > > addition is good, but i'm not entirely sure that smaller locks like you > > > > > are proposing is a benefit long term (things become much more tricky). > > > > Without removing irq_lock RCU is useless since the list walking is always > > > > done with irq_lock held. I see smaller locks as simplification BTW. The > > > > thing is tricky now, or so I felt while trying to understand what > > > > irq_lock actually protects. With smaller locks with names that clearly > > > > indicates which data structure it protects I feel the code is much > > > > easy to understand. > > > > > > What is worrying is if you need to take both PIC and IOAPIC locks for > > > some operation (then have to worry about lock ordering etc). > > > > > Lets not worry about far fetched theoretical cases especially since you > > have the same problem now. > > Well the issue is you are defining what can be done in notifier context, > so it does matter. > Of course, just like now we have to define things that can be done in notifier context. It can't sleep for instance since it may be called with interrupts disabled. > > What if you'll need to take both pic lock and > > irq_lock? Except that this is not so theoretical because we need to > > take them both with current code we just don't do it. > > While most accesses to the i8259 are with the kvm mutex taken, the call > to kvm_pic_read_irq() is not. We can't easily take the kvm mutex there > since the function is called with interrupts disabled. > > Convert irq_lock to a spinlock. Or apply my patchset :) Which basically converts irq_lock to spinlock and pushes it into ioapic. > > Fix by adding a spinlock to the virtual interrupt controller. Since > we can't send an IPI under the spinlock (we also take the same spinlock > in an irq disabled context), we defer the IPI until the spinlock is > released. Similarly, we defer irq ack notifications until after spinlock > release to avoid lock recursion. > > Its possible to send IPI's under spinlocks now. > That what I was thinking too. There is not need to all this tricks in PIC now. > Again, i'm not objecting to these changes, but playing devil's advocate. > Just make comparison with current state of affairs fair :) -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html