On Tue, Nov 02, 2010 at 08:11:31PM +0100, Jan Kiszka wrote:
> Am 02.11.2010 19:52, Michael S. Tsirkin wrote:
> > On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
> >> Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
> >>>>>>  			dev->host_irq_disabled = false;
> >>>>>>  		}
> >>>>>> -	spin_unlock(&dev->intx_lock);
> >>>>>> +out:
> >>>>>> +	spin_unlock_irq(&dev->intx_lock);
> >>>>>> +
> >>>>>> +	if (reassert)
> >>>>>> +		kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 1);
> >>>>>
> >>>>> Hmm, I think this still has more overhead than it needs to have.
> >>>>> Instead of setting the level to 0 and then back to 1, can't we just
> >>>>> avoid the set to 1 in the first place? This would need a different
> >>>>> interface than pci_2_3_irq_check_and_unmask to avoid a race
> >>>>> where an interrupt is received while we are acking another one:
> >>>>>
> >>>>> 	block userspace access
> >>>>> 	check pending bit
> >>>>> 	if (!pending)
> >>>>> 		set irq (0)
> >>>>> 	clear pending
> >>>>> 	unblock userspace access
> >>>>>
> >>>>> Would be worth it for high volume devices.
> >>>>
> >>>> The problem is that we can't reorder guest IRQ line clearing and host
> >>>> IRQ line enabling without taking a lock across host IRQ disable + guest
> >>>> IRQ raise - and that is now distributed across hard and threaded IRQ
> >>>> handlers, and we don't want to hold an IRQ-safe lock during kvm_set_irq.
> >>>
> >>> Oh, I think I confused you.
> >>> What I mean is:
> >>>
> >>> 	block userspace access
> >>> 	check interrupt status bit
> >>> 	if (!interrupt status bit set)
> >>> 		set irq (0)
> >>> 	clear interrupt disable bit
> >>> 	unblock userspace access
> >>>
> >>> This way we enable the interrupt after the set irq, so no need for
> >>> extra locks I think.
> >>
> >> OK. Would require some serious refactoring again.
> >
> > That would also mean we can't just solve the nested block/unblock
> > problem with a simple lock. Not sure this is worth the effort.
>
> Can't follow: what can be nested and how?

I just mean the hard interrupt handler trying to block userspace access
when the threaded handler has already done so.

> >
> >> But what about edge IRQs? Don't we need to toggle the bit for them? And
> >> as we do not differentiate between level and edge, we currently have to
> >> do this unconditionally.
> >
> > AFAIK PCI IRQs are level, so I don't think we need to bother.
>
> Ah, indeed.
>
> >>>
> >>> Hmm, one thing I noticed is that pci_block_user_cfg_access
> >>> will BUG_ON if it was already blocked. So I think we have
> >>> a bug here when the interrupt handler kicks in right after
> >>> we unmask interrupts.
> >>>
> >>> Probably need some kind of lock to protect against this.
> >>>
> >>
> >> Or an atomic counter.
> >
> > BTW, block userspace access uses a global spinlock, which will likely hurt
> > us on multi-CPU. Switching that to something more SMP friendly, e.g. a
> > per-device spinlock, might be a good idea: I don't see why that lock and
> > queue are global.
>
> Been through that code recently, hairy stuff. pci_lock also protects the
> bus operations, which can be overloaded (e.g. for error injection - if
> that is not the only use case). So we need a per-bus lock, but that can
> still cause contention if devices on the same bus are handled on
> different cpus.

Right. So this lock got reused for blocking userspace. I don't suggest we
rip it all out, just that userspace blocking use a finer-grained lock.
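Coming back to the ack ordering I proposed above, here is roughly what I
have in mind - a completely untested sketch, just reusing the field names
from the patch context, so don't take the details too literally:

#include <linux/pci.h>
#include <linux/kvm_host.h>

/*
 * Untested sketch of the ack path; dev->dev is assumed to be the
 * underlying struct pci_dev of the assigned device.
 */
static void intx_ack_check_and_unmask(struct kvm_assigned_dev_kernel *dev)
{
	struct pci_dev *pdev = dev->dev;
	u16 status, cmd;

	/* keep userspace out of config space while we poke the device */
	pci_block_user_cfg_access(pdev);

	pci_read_config_word(pdev, PCI_STATUS, &status);
	if (!(status & PCI_STATUS_INTERRUPT))
		/* device has deasserted INTx: take the guest line down */
		kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 0);
	/* else leave the guest line high - no 1->0->1 toggle */

	/* re-enable INTx at the function level only after the line update */
	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
	pci_write_config_word(pdev, PCI_COMMAND,
			      cmd & ~PCI_COMMAND_INTX_DISABLE);

	pci_unblock_user_cfg_access(pdev);

	/*
	 * Still unsolved: if the hard handler fires right after the
	 * unmask but before the unblock, its own pci_block_user_cfg_access
	 * hits the BUG_ON, so the counter/lock discussed above is
	 * still needed.
	 */
}

The point is that the guest line only ever goes down when the device has
really deasserted, so the common case is one config read plus the unmask
write instead of the extra toggle.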
> I think the whole PCI config interface is simply not designed for
> performance.

It's considered a slow path, which it normally is. So I guess we'll need
to fix that now, at least if we want to make the 2.3 way the default.

> >
> >> Will have a look.
> >
> > Need to also consider an interrupt running in parallel
> > with unmasking in the thread.
> >
> >> Alex, does VFIO take care of this already?
> >
> > VFIO does this all under spin_lock_irq.
>
> Everything has its pros and cons...
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
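For comparison, the do-it-all-under-the-lock variant on the hard IRQ side
would look roughly like this - again only an untested sketch, not the
actual VFIO code, with the names borrowed from the patch context:

#include <linux/interrupt.h>
#include <linux/pci.h>
#include <linux/kvm_host.h>

/*
 * Untested sketch of the hard IRQ handler doing check-and-mask with
 * everything under the per-device intx_lock (the spin_lock_irq style).
 */
static irqreturn_t intx_hard_handler(int irq, void *opaque)
{
	struct kvm_assigned_dev_kernel *dev = opaque;
	struct pci_dev *pdev = dev->dev;
	u16 status, cmd;
	irqreturn_t ret = IRQ_NONE;

	spin_lock(&dev->intx_lock);	/* hard IRQ context, interrupts are off */
	pci_block_user_cfg_access(pdev);

	pci_read_config_word(pdev, PCI_STATUS, &status);
	if (status & PCI_STATUS_INTERRUPT) {
		/* it's ours: mask INTx at the function level and let the
		 * threaded handler inject the interrupt into the guest */
		pci_read_config_word(pdev, PCI_COMMAND, &cmd);
		pci_write_config_word(pdev, PCI_COMMAND,
				      cmd | PCI_COMMAND_INTX_DISABLE);
		ret = IRQ_WAKE_THREAD;
	}

	pci_unblock_user_cfg_access(pdev);
	spin_unlock(&dev->intx_lock);

	return ret;
}

If the ack path from the earlier sketch takes the same lock with
spin_lock_irq around its block/check/unmask sequence, the nested
pci_block_user_cfg_access BUG_ON can no longer trigger - but then we do
hold an IRQ-safe lock across kvm_set_irq, which is exactly the trade-off
discussed above.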