Avi Kivity wrote: > On 10/28/2009 12:13 PM, Chris Lalancette wrote: >>> The kick from i8254 code is pretty bad, as you mention. I forget why it >>> is needed at all - shouldn't kvm_set_irq() end up kicking the correct >>> >> As I understand it, that's not quite how it works. From what I can see, what >> happens is that the i8254 is programmed as an hrtimer. When the hrtimer >> expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code). >> That code is running in interrupt context, however, so you can't directly call >> "set_irq" at that point. Instead, we update the "pending" variable and defer >> work until later on. That "later on" is when we are doing a vcpu_run, at which >> point we check the "pending" variable, and if set, inject the interrupt. >> >> The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we >> don't kick them, no vcpu will wake up, and hence none of them will ever run >> "set_irq" to get it injected into the guest. >> > > Oh yes, I remember now. > >> If you have other ideas on how we might accomplish this, I'd definitely be >> interested in hearing them. >> > > We could schedule work to run in thread context. Previous to the user > return notifier work, this would cause a reloading of a bunch of msrs > and would thus be quite expensive, but now switching to kernel threads > and back is pretty cheap. > > So we could try schedule_work(). My only worry is that it uses > wake_up() instead of wake_up_sync(), so it could introduce delay. I'd > much prefer a variant the schedules immediately. > > An alternative is to make irq injection work from irq context. It's not > difficult to do technically - convert mutexes to spin_lock_irqsave() - > I'm just worried about long irqoff times with large guests (ioapic needs > to iterate over all vcpus sometimes). This is useful for other cases - > device assignment interrupt injection, irqfd for vhost/vbus. So maybe > we should go this path, and worry about reducing irqoff times later when > we bump KVM_MAX_VCPUS. > I'm starting to take a look at this. While this may be a generically useful cleanup, it occurs to me that even if we call "set_irq" from the hrtimer callback, we still have to do the "kick_vcpus" type thing. Otherwise, we still have the problem where if all vcpus are idle, none of them will be doing vcpu_run anytime soon to actually inject the interrupt into the guest. Or did I mis-understand what you are proposing? -- Chris Lalancette -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html