Re: [PATCH] kvm-all.c: Move init of irqchip_inject_ioctl out of kvm_irqchip_create()

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Tue, 14 Aug 2012 15:27:45 +0200

On 2012-08-14 15:10, Peter Maydell wrote:
> On 14 August 2012 09:09, Jan Kiszka <jan.kiszka@xxxxxx> wrote:
>> On 2012-08-14 09:52, Peter Maydell wrote:
>>> Well, you appear to know what this variant ioctl does and why it's
>>> better than KVM_IRQ_LINE, whereas I don't. I just want to deliver
>>> an interrupt, KVM_IRQ_LINE lets me deliver an interrupt, why
>>> do I need anything more? (What would I do with the status return, for
>>> instance? I have to assert the incoming irq line, there's nothing for
>>> me to do if the kernel says "sorry, can't do that" except abort qemu.)
>>
>> Not sure how timekeeping of all your guests will work, but a classic
>> scenario on x86 is that some timer is programmed to deliver periodic
>> ticks (or one-shot ticks that also generates a virtual periodic timer)
>> and that those ticks will then be used to derive the system time of the
>> guest. Now, if the guest was unable to process the past tick completely
>> (due to host load) and we inject already another tick event, that one
>> will get lost. Some guests (older Linuxes but also many proprietary
>> OSes) are not prepared for such tick loss and will suffer from drifting
>> wall clocks.
> 
> So, for ARM the overwhelmingly common case will be that we use the in
> kernel architectural timer for doing periodic / one-shot ticks. That's
> all in the kernel anyway (both timer and VGIC interrupt controller).
> 
> The less plausible case uses the in-kernel interrupt controller but an
> out of kernel timer device. The only link between the two is a qemu_irq line,
> so we have (a) no way to tell if this interrupt line into the GIC is a timer
> or not and (b) no back-channel to the timer device to report things anyway.

(a) is userspace knowledge (if it implements some timer that guests may
use to drive a clock), so nothing the kernel needs to know. (b) will
likely come for x86 to clean up the current mess we have for
decoalescing RTC periodic timers and possibly adding HPET support.

> 
> The really unlikely case (in that there is theoretically code to handle it
> but in practice QEMU will refuse to configure itself this way) has an out
> of kernel GIC. In this case the interrupts we deliver to the kernel are
> pre-coalesced into a single IRQ line, and there's definitely not a lot
> we could do with the status report.

Agreed. In this case, the userspace GIC would have to provide the
coalescing information.

> 
> Marc Z tells me we could make the kernel return the 'did this coalesce?'
> status without too much difficulty, but unless it's actually possible
> to do something useful with it in userspace I'm not sure I see the point.

>From my point of view (but Avi seems to disagree), it would even make
sense to always return 1 if you cannot actually tell, just to
consolidate over a single IOCTL on the long run. But given that you can,
I would avoid over-simplifying, specifically as the embedded space is
known to add all sorts of special devices for special use cases, and you
cannot know in advance if the architectural (in-kernel) timer will
always be sufficient.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html