Am 17.12.2010 16:25, Thomas Gleixner wrote: > On Fri, 17 Dec 2010, Jan Kiszka wrote: > >> Am 17.12.2010 11:41, Thomas Gleixner wrote: >>> On Fri, 17 Dec 2010, Jan Kiszka wrote: >>>> Am 17.12.2010 11:23, Thomas Gleixner wrote: >>>>> OTOH, if we have to disable anyway, then we could simply keep it >>>>> disabled across the installation of a new handler. That would make the >>>>> notification business go away, wouldn't it ? >>>> >>>> No, the notification is still necessary in case the registered handler >>>> keeps the line off after returning from both hard and threaded handler. >>> >>> And how should that happen? If it is in oneshot mode then the line is >>> reenabled when the thread handler returns. >> >> disable_irq_nosync is called by the handler before returning. And it's >> the handler's job to revert this, properly synchronizing it internally. > > disable_irq_nosync() is really the worst thing to do. That's simply > not going to work without a lot of fuglyness. > > What about the following: > > primary_handler(....) > { > if (!shared) > return IRQ_WAKE_THREAD; > > spin_lock(dev->irq_lock); > > if (from_my_device() || dev->irq_thread_waiting) { > mask_dev(); > dev->masked = true; > ret = IRQ_WAKE_THREAD; > } else > ret = IRQ_NONE; > > spin_unlock(dev->irq_lock); > return ret; > } > > check_timeout() > { > if (dev->irq_active && wait_longer()) > return IRQ_WAKE_THREAD; > return IRQ_HANDLED; > } > > unmask_dev_if_necessary() > { > if (dev->masked && dev->irq_active) > umask_dev(); > } > > threaded_handler(....) > { > if (!dev->irq_thread_waiting) { > spin_lock_irq(dev->irq_lock); > wake_user = do_magic_stuff_with_the_dev(); > dev->irq_thread_waiting = wake_user; > spin_unlock(dev->irq_lock); > if (wake_user) > wake_up(user); > } > > if (!dev->irq_thread_waiting) { > spin_lock_irq(dev->irq_lock); > unmask_dev_if_necessary(); > spin_unlock(dev->irq_lock); > return IRQ_HANDLED; > } > > /* > * Wait for user space to complete. Timeout is to > * avoid starvation of the irq line when > * something goes wrong > */ > wait_for_completion_timeout(dev->compl, SENSIBLE_TIMEOUT); > > spin_lock_irq(dev->irq_lock); > if (timedout) { > mask_dev(); > dev->masked = true; > /* > * Leave dev->irq_thread_waiting untouched and let > * the core code reschedule us when check_timeout > * decides it's worth to wait. In any case we leave > * the device masked at the device level, so we don't > * cause an interrupt storm. > */ > ret = check_timeout(); > } else { > unmask_dev_if_necessary(); > dev->irq_thread_waiting = false; > ret = IRQ_HANDLED; > } > spin_unlock(dev->irq_lock); > return ret; > } > > userspace_complete() > { > complete(dev->irq_compl); > } > > Your aproach with disable_irq_nosync() is completely flawed, simply > because you try to pretend that your interrupt handler is done, while > it is not done at all. The threaded interrupt handler is done when > user space completes. Everything else is just hacking around the > problem and creates all that nasty transitional problems. disable_irq_nosync is the pattern currently used in KVM, it's nothing new in fact. The approach looks interesting but requires separate code for non-PCI-2.3 devices, i.e. when we have no means to mask at device level. Further drawbacks - unless I missed something on first glance: - prevents any future optimizations that would work without IRQ thread ping-pong (ie. once we allow guest IRQ injection from hardirq context for selected but typical setups) - two additional, though light-weight, context switches on each interrupt completion - continuous polling if user space decides to leave the interrupt unhandled (e.g. because the virtual IRQ line is masked) Maybe the latter can be solved in a nicer way, but I don't think we can avoid the first two. I'm not saying yet that they are killing this approach, we just need to asses their relevance. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html