On Tue, Jun 26, 2012 at 9:21 AM, Marc Zyngier <marc.zyngier at arm.com> wrote: > On 25/06/12 22:15, Christoffer Dall wrote: >> On Mon, May 14, 2012 at 9:07 AM, Marc Zyngier <marc.zyngier at arm.com> wrote: >>> We can inject a timer interrupt into the guest as a result of >>> three possible events: >>> - The virtual timer interrupt has fired while we were still >>> ?executing the guest >> >> couldn't/shouldn't we wire this to the vGIC support through the HW bit >> in the LRs so it gets delivered directly to the guest without our >> intervention? > > You really don't want to go down that road. > > For that to work properly, you'd need to start adding priorities to the > host GIC code, so we can still preempt a host running in its interrupt > handler (and prevent it from DoS-ing the host). > > And introducing priorities in the interrupt handling is a sure way to > start a nice flame war on LKML... ;-) > ok, I obviously didn't know what I was asking here ;) >> that would take care of all the arch_timer_virt_external_handler stuff >> as well... > > No. Because you have no HW distributor at the guest level, you'd still > have to take the interrupt at the host level, notice this is a guest > related interrupt, escape the normal IRQ flow, queue it into the guest > LRs. The only thing you gain is the automatic EOI from the guest to the > HW. All the rest is just pain. > >>> - The timer interrupt hasn't fired, but it expired while we >>> ?were doing the world switch >>> - A hrtimer we programmed earlier has fired >>> >>> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com> >>> --- >>> ?arch/arm/include/asm/kvm_arch_timer.h | ? 34 +++++++- >>> ?arch/arm/kvm/timer.c ? ? ? ? ? ? ? ? ?| ?158 +++++++++++++++++++++++++++++++++ >>> ?2 files changed, 191 insertions(+), 1 deletions(-) >>> ?create mode 100644 arch/arm/kvm/timer.c >>> >>> diff --git a/arch/arm/include/asm/kvm_arch_timer.h b/arch/arm/include/asm/kvm_arch_timer.h >>> index 491e336..191c541 100644 >>> --- a/arch/arm/include/asm/kvm_arch_timer.h >>> +++ b/arch/arm/include/asm/kvm_arch_timer.h >>> @@ -1,13 +1,45 @@ >>> ?#ifndef __ASM_ARM_KVM_ARCH_TIMER_H >>> ?#define __ASM_ARM_KVM_ARCH_TIMER_H >>> >>> +#include <linux/clocksource.h> >>> +#include <linux/hrtimer.h> >>> +#include <linux/workqueue.h> >>> + >>> ?struct arch_timer_kvm { >>> +#ifdef CONFIG_KVM_ARM_TIMER >>> + ? ? ? bool ? ? ? ? ? ? ? ? ? ?enabled; >>> + ? ? ? union { >>> + ? ? ? ? ? ? ? cycle_t ? ? ? ? cntvoff; >>> + ? ? ? ? ? ? ? struct { >>> + ? ? ? ? ? ? ? ? ? ? ? u32 ? ? cntvoff_high; /* Restored only */ >>> + ? ? ? ? ? ? ? ? ? ? ? u32 ? ? cntvoff_low; ?/* Restored only */ >>> + ? ? ? ? ? ? ? } cntvoff32; >>> + ? ? ? }; >>> +#endif >>> ?}; >>> >>> ?struct arch_timer_cpu { >>> +#ifdef CONFIG_KVM_ARM_TIMER >>> + ? ? ? struct hrtimer ? ? ? ? ?timer; >>> + ? ? ? struct work_struct ? ? ?expired; >>> + ? ? ? cycle_t ? ? ? ? ? ? ? ? cval; >>> + ? ? ? bool ? ? ? ? ? ? ? ? ? ?armed; >>> + >>> + ? ? ? /* Registers */ >>> + ? ? ? u32 ? ? ? ? ? ? ? ? ? ? cntv_ctl; ? ? ? /* Saved/restored */ >>> + ? ? ? u32 ? ? ? ? ? ? ? ? ? ? cntv_cval_high; /* Saved/restored */ >>> + ? ? ? u32 ? ? ? ? ? ? ? ? ? ? cntv_cval_low; ?/* Saved/restored */ >>> +#endif >>> ?}; >> >> Could you document these structures a little? >> >> It's hard to tell why something goes on the per-vm struct and why >> something goes on the per-vcpu struct if you're not super familier >> with the generic timer architecture. For example, it's not clear if >> the enabled field in the arch_timer_kvm struct corresponds to some bit >> in hardware set by the guest or if it says if we use arch timers in >> the guest. I'm pretty sure it means that we initialized everything >> correctly, we use the vgic, and we can therefore use the features for >> guests... > > Fair enough. I'll add some documentation to that effect. > >> also took me annoyingly long time to figure out that 'armed' means >> hrtimer armed outside of guest, not related to guest setting a >> timer... > > Well, it is related to the guest setting a timer. It acts as a proxy > timer for the guest. > >>> diff --git a/arch/arm/kvm/timer.c b/arch/arm/kvm/timer.c >>> new file mode 100644 >>> index 0000000..a342f37 >>> --- /dev/null >>> +++ b/arch/arm/kvm/timer.c >>> @@ -0,0 +1,158 @@ >>> +/* >>> + * Copyright (C) 2012 ARM Ltd. >>> + * Author: Marc Zyngier <marc.zyngier at arm.com> >>> + * >>> + * This program is free software; you can redistribute it and/or modify >>> + * it under the terms of the GNU General Public License version 2 as >>> + * published by the Free Software Foundation. >>> + * >>> + * This program is distributed in the hope that it will be useful, >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ?See the >>> + * GNU General Public License for more details. >>> + * >>> + * You should have received a copy of the GNU General Public License >>> + * along with this program; if not, write to the Free Software >>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA >>> + */ >>> + >>> +#include <linux/kvm.h> >>> +#include <linux/kvm_host.h> >>> +#include <linux/interrupt.h> >>> + >>> +#include <asm/arch_timer.h> >>> + >>> +#include <asm/kvm_vgic.h> >>> +#include <asm/kvm_arch_timer.h> >>> + >>> +static const struct kvm_irq_level virt_timer_ppi = { >>> + ? ? ? .irq ? ?= 27, ? /* A7/A15 specific */ >> >> so should this come from a DT somehow or should we check which CPU >> we'er emulating (same as host) at runtime? > > This is completely CPU specific, and should be provided by looking at > what we're actually emulating/running on. > ok, I'm working on patches for this, which determines this at runtime, so we'll have to figure out a nice clean way to decide these things at runtime. >>> + ? ? ? .level ?= 1, >>> +}; >>> + >>> +static struct timecounter *timecounter; >>> +static struct workqueue_struct *wqueue; >> >> why do we need our own workqueue here? (as opposed to just scheduling >> work on 'events') > > It makes things slightly more predictable, and easier to instrument. x86 > does the same. > ok >>> + >>> +static cycle_t kvm_phys_timer_read(void) >>> +{ >>> + ? ? ? return timecounter->cc->read(timecounter->cc); >>> +} >>> + >>> +static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu) >>> +{ >>> + ? ? ? struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; >>> + >>> + ? ? ? timer->cntv_ctl |= 1 << 1; /* Mask the interrupt in the guest */ >>> + ? ? ? kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, &virt_timer_ppi); >>> +} >>> + >>> +static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id) >>> +{ >>> + ? ? ? struct kvm_vcpu *vcpu = kvm_vgic_get_current_vcpu(); >>> + >>> + ? ? ? if (WARN_ON(!vcpu)) >>> + ? ? ? ? ? ? ? return IRQ_NONE; >>> + >>> + ? ? ? kvm_timer_inject_irq(vcpu); >>> + ? ? ? return IRQ_HANDLED; >>> +} >>> + >>> +static void kvm_timer_inject_irq_work(struct work_struct *work) >>> +{ >>> + ? ? ? struct kvm_vcpu *vcpu; >>> + >>> + ? ? ? vcpu = container_of(work, struct kvm_vcpu, arch.timer_cpu.expired); >>> + ? ? ? vcpu->arch.timer_cpu.armed = false; >>> + ? ? ? kvm_timer_inject_irq(vcpu); >>> +} >>> + >>> +static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt) >>> +{ >>> + ? ? ? struct arch_timer_cpu *timer; >>> + ? ? ? timer = container_of(hrt, struct arch_timer_cpu, timer); >>> + ? ? ? queue_work(wqueue, &timer->expired); >>> + ? ? ? return HRTIMER_NORESTART; >>> +} >>> + >>> +void kvm_timer_sync_to_cpu(struct kvm_vcpu *vcpu) >>> +{ >>> +} >>> + >>> +void kvm_timer_sync_from_cpu(struct kvm_vcpu *vcpu) >>> +{ >>> + ? ? ? struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; >>> + ? ? ? cycle_t cval, now; >>> + ? ? ? u64 ns; >>> + >>> + ? ? ? /* Check if the timer is enabled and unmasked first */ >>> + ? ? ? if ((timer->cntv_ctl & 3) != 1) >>> + ? ? ? ? ? ? ? return; >>> + >>> + ? ? ? cval = ((cycle_t)timer->cntv_cval_high << 32) | timer->cntv_cval_low; >>> + ? ? ? now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff; >> >> when is this adjusted and according to what? > > cntvoff is never adjusted. It is sampled exactly once when the VM is > created, and that's about it. And actually, the feature is disabled for > the time being. > ok, so the point is simply to say reasonably to the guest it's been running for X time, not counting time when the guest was not scheduled? >>> + >>> + ? ? ? if (timer->armed) { >>> + ? ? ? ? ? ? ? if (cval == timer->cval) >>> + ? ? ? ? ? ? ? ? ? ? ? return; ? ? ? ? /* Already programmed */ >>> + ? ? ? ? ? ? ? hrtimer_cancel(&timer->timer); >>> + ? ? ? ? ? ? ? cancel_work_sync(&timer->expired); >>> + ? ? ? ? ? ? ? timer->armed = false; >>> + ? ? ? } >>> + >>> + ? ? ? if (cval <= now) { >>> + ? ? ? ? ? ? ? /* >>> + ? ? ? ? ? ? ? ?* Timer has already expired while we were not >>> + ? ? ? ? ? ? ? ?* looking. Inject the interrupt and carry on. >>> + ? ? ? ? ? ? ? ?*/ >>> + ? ? ? ? ? ? ? kvm_timer_inject_irq(vcpu); >>> + ? ? ? ? ? ? ? return; >>> + ? ? ? } >>> + >>> + ? ? ? timer->cval = cval; >>> + ? ? ? timer->armed = true; >>> + ? ? ? ns = cyclecounter_cyc2ns(timecounter->cc, cval - now); >>> + ? ? ? hrtimer_start(&timer->timer, ktime_add_ns(ktime_get(), ns), >>> + ? ? ? ? ? ? ? ? ? ? HRTIMER_MODE_ABS); >>> +} >>> + >>> +void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu) >>> +{ >>> + ? ? ? struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; >>> + >>> + ? ? ? INIT_WORK(&timer->expired, kvm_timer_inject_irq_work); >>> + ? ? ? hrtimer_init(&timer->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); >>> + ? ? ? timer->timer.function = kvm_timer_expire; >>> +} >>> + >>> +int kvm_timer_hyp_init(void) >>> +{ >>> + ? ? ? timecounter = arch_timer_get_timecounter(); >>> + ? ? ? if (!timecounter) >>> + ? ? ? ? ? ? ? return -ENODEV; >>> + >>> + ? ? ? wqueue = create_singlethread_workqueue("kvm_arch_timer"); >>> + ? ? ? if (!wqueue) >>> + ? ? ? ? ? ? ? return -ENOMEM; >>> + >>> + ? ? ? arch_timer_switch_to_phys(kvm_arch_timer_handler); >>> + ? ? ? return 0; >>> +} >>> + >>> +void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu) >>> +{ >>> + ? ? ? struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu; >>> + >>> + ? ? ? hrtimer_cancel(&timer->timer); >>> + ? ? ? cancel_work_sync(&timer->expired); >>> +} >>> + >>> +int kvm_timer_init(struct kvm *kvm) >>> +{ >>> +#if 0 >>> + ? ? ? kvm->arch.timer.cntvoff = kvm_phys_timer_read(); >>> +#endif >> >> what's this? > > Workaround for a bug in the model, and a new version should be released > shortly. HW doesn't have this issue, fortunately. Once the new model is > in the wild, we can enable this and get proper virtual offset. > ok -Christoffer