[PATCH V9 1/3] irq: Allow to pass the IRQF_TIMER flag with percpu irq request

marc.zyngier@xxxxxxx (Marc Zyngier) · Tue, 25 Apr 2017 14:22:04 +0100

On 25/04/17 13:51, Daniel Lezcano wrote:
> On Tue, Apr 25, 2017 at 11:21:21AM +0100, Marc Zyngier wrote:
>> On 25/04/17 10:49, Daniel Lezcano wrote:
>>> On Tue, Apr 25, 2017 at 10:10:12AM +0100, Marc Zyngier wrote:
>>
>> [...]
>>
>>>>> +static inline void setup_timings(struct irq_desc *desc, struct irqaction *act)
>>>>> +{
>>>>> +	/*
>>>>> +	 * We don't need the measurement because the idle code already
>>>>> +	 * knows the next expiry event.
>>>>> +	 */
>>>>> +	if (act->flags & __IRQF_TIMER)
>>>>> +		return;
>>>>
>>>> And that's where this is really wrong for the KVM guest timer. As I
>>>> said, this timer is under complete control of the guest, and the rest of
>>>> the system doesn't know about it. KVM itself will only find out when the
>>>> vcpu does a VM exit for a reason or another, and will just save/restore
>>>> the state in order to be able to give the timer to another guest.
>>>>
>>>> The idle code is very much *not* aware of anything concerning that guest
>>>> timer.
>>>
>>> Just for my own curiosity, if there are two VM (VM1 and VM2). VM1 sets a timer1
>>> at <time> and exits, VM2 runs and sets a timer2 at <time+delta>.
>>>
>>> The timer1 for VM1 is supposed to expire while VM2 is running. IIUC the virtual
>>> timer is under control of VM2 and will expire at <time+delta>.
>>>
>>> Is the host wake up with the SW timer and switch in VM1 which in turn restores
>>> the timer and jump in the virtual timer irq handler?
>>
>> Indeed. The SW timer causes VM1 to wake-up, either on the same CPU
>> (preempting VM2) or on another. The timer is then restored with the
>> pending virtual interrupt injected, and the guest does what it has to
>> with it.
> 
> Thanks for clarification.
> 
> So there is a virtual timer with real registers / interruption (waking up the
> host) for the running VMs and SW timers for non-running VMs.
> 
> What is the benefit of having such mechanism instead of real timers injecting
> interrupts in the VM without the virtual timer + save/restore? Efficiency in
> the running VMs when setting up timers (saving privileges change overhead)?

You can't dedicate HW resources to virtual CPUs. It just doesn't scale.
Also, injecting HW interrupts in a guest is pretty hard work, and for
multiple reasons:
  - the host needs to be in control of interrupt delivery (don't hog the
CPU with guest interrupts)
  - you want to be able to remap interrupts (id X on the host becomes id
Y on the guest),
  - you want to deal with migrating vcpus,
  - you want deliver an interrupt to a vcpu that is *not* running.

It *is* doable, but it is not cheap at all from a HW point of view.

	M.
-- 
Jazz is not dead. It just smells funny...