Re: [PATCH 0/5]: Fix kdump under KVM

Chris Lalancette <clalance@xxxxxxxxxx> · Thu, 29 Oct 2009 09:34:46 +0100

Avi Kivity wrote:
> On 10/28/2009 12:13 PM, Chris Lalancette wrote:
>>> The kick from i8254 code is pretty bad, as you mention.  I forget why it
>>> is needed at all - shouldn't kvm_set_irq() end up kicking the correct
>>>      
>> As I understand it, that's not quite how it works.  From what I can see, what
>> happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
>> expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
>> That code is running in interrupt context, however, so you can't directly call
>> "set_irq" at that point.  Instead, we update the "pending" variable and defer
>> work until later on.  That "later on" is when we are doing a vcpu_run, at which
>> point we check the "pending" variable, and if set, inject the interrupt.
>>
>> The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we
>> don't kick them, no vcpu will wake up, and hence none of them will ever run
>> "set_irq" to get it injected into the guest.
>>    
> 
> Oh yes, I remember now.
> 
>> If you have other ideas on how we might accomplish this, I'd definitely be
>> interested in hearing them.
>>    
> 
> We could schedule work to run in thread context.  Previous to the user 
> return notifier work, this would cause a reloading of a bunch of msrs 
> and would thus be quite expensive, but now switching to kernel threads 
> and back is pretty cheap.
> 
> So we could try schedule_work().  My only worry is that it uses 
> wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd 
> much prefer a variant the schedules immediately.
> 
> An alternative is to make irq injection work from irq context.  It's not 
> difficult to do technically - convert mutexes to spin_lock_irqsave() - 
> I'm just worried about long irqoff times with large guests (ioapic needs 
> to iterate over all vcpus sometimes).  This is useful for other cases - 
> device assignment interrupt injection, irqfd for vhost/vbus.  So maybe 
> we should go this path, and worry about reducing irqoff times later when 
> we bump KVM_MAX_VCPUS.
> 

I'm starting to take a look at this.  While this may be a generically useful
cleanup, it occurs to me that even if we call "set_irq" from the hrtimer
callback, we still have to do the "kick_vcpus" type thing.  Otherwise, we still
have the problem where if all vcpus are idle, none of them will be doing
vcpu_run anytime soon to actually inject the interrupt into the guest.

Or did I mis-understand what you are proposing?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html