Re: [PATCH 5/5] KVM: PPC: booke: Improve timer register emulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07.09.2011, at 12:16, Liu Yu-B13201 wrote:

> 
> 
>> -----Original Message-----
>> From: kvm-ppc-owner@xxxxxxxxxxxxxxx 
>> [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexander Graf
>> Sent: Wednesday, September 07, 2011 3:06 AM
>> To: Liu Yu-B13201
>> Cc: Wood Scott-B07421; kvm-ppc@xxxxxxxxxxxxxxx
>> Subject: Re: [PATCH 5/5] KVM: PPC: booke: Improve timer 
>> register emulation
>> 
>> 
>> 
>> Am 06.09.2011 um 10:04 schrieb Alexander Graf <agraf@xxxxxxx>:
>> 
>>> 
>>> On 06.09.2011, at 05:17, Liu Yu-B13201 wrote:
>>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: kvm-ppc-owner@xxxxxxxxxxxxxxx 
>>>>> [mailto:kvm-ppc-owner@xxxxxxxxxxxxxxx] On Behalf Of Alexander Graf
>>>>> Sent: Tuesday, September 06, 2011 6:45 AM
>>>>> To: Wood Scott-B07421
>>>>> Cc: kvm-ppc@xxxxxxxxxxxxxxx
>>>>> Subject: Re: [PATCH 5/5] KVM: PPC: booke: Improve timer 
>>>>> register emulation
>>>>> 
>>>>> 
>>>>> On 27.08.2011, at 01:31, Scott Wood wrote:
>>>>> 
>>>>>> From: Liu Yu <yu.liu@xxxxxxxxxxxxx>
>>>>>> 
>>>>>> Decrementers are now properly driven by TCR/TSR, and the guest
>>>>>> has full read/write access to these registers.
>>>>>> 
>>>>>> The decrementer keeps ticking (and setting the TSR bit) 
>>>>> regardless of
>>>>>> whether the interrupts are enabled with TCR.
>>>>>> 
>>>>>> The decrementer stops at zero, rather than going negative.
>>>>>> 
>>>>>> Signed-off-by: Liu Yu <yu.liu@xxxxxxxxxxxxx>
>>>>>> [scott: added dequeue in kvmppc_booke_irqprio_deliver, and 
>>>>> dec stop-at-zero]
>>>>>> Signed-off-by: Scott Wood <scottwood@xxxxxxxxxxxxx>
>>>>>> ---
>>>>>> arch/powerpc/include/asm/kvm_host.h |    2 +-
>>>>>> arch/powerpc/include/asm/kvm_ppc.h  |   11 +++++
>>>>>> arch/powerpc/kvm/book3s.c           |    8 ++++
>>>>>> arch/powerpc/kvm/booke.c            |   80 
>>>>> +++++++++++++++++++++++++++--------
>>>>>> arch/powerpc/kvm/booke.h            |    4 ++
>>>>>> arch/powerpc/kvm/booke_emulate.c    |   11 ++++-
>>>>>> arch/powerpc/kvm/emulate.c          |   45 ++++++++-----------
>>>>>> arch/powerpc/kvm/powerpc.c          |   20 +--------
>>>>>> 8 files changed, 114 insertions(+), 67 deletions(-)
>>>>>> 
>>>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>>>>> b/arch/powerpc/include/asm/kvm_host.h
>>>>>> index 3305af4..ea08c79 100644
>>>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>>>> @@ -334,7 +334,7 @@ struct kvm_vcpu_arch {
>>>>>>  u32 tbl;
>>>>>>  u32 tbu;
>>>>>>  u32 tcr;
>>>>>> -    u32 tsr;
>>>>>> +    ulong tsr; /* we need to perform set/clr_bits() which 
>>>>> requires ulong */
>>>>>>  u32 ivor[64];
>>>>>>  ulong ivpr;
>>>>>>  u32 pvr;
>>>>>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>>>>> b/arch/powerpc/include/asm/kvm_ppc.h
>>>>>> index bdaa6c8..ddaa615 100644
>>>>>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>>>>>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>>>>>> @@ -66,6 +66,7 @@ extern int 
>>>>> kvmppc_emulate_instruction(struct kvm_run *run,
>>>>>> extern int kvmppc_emulate_mmio(struct kvm_run *run, struct 
>>>>> kvm_vcpu *vcpu);
>>>>>> extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu);
>>>>>> extern u32 kvmppc_get_dec(struct kvm_vcpu *vcpu, u64 tb);
>>>>>> +extern void kvmppc_decrementer_func(unsigned long data);
>>>>>> 
>>>>>> /* Core-specific hooks */
>>>>>> 
>>>>>> @@ -197,4 +198,14 @@ int kvm_vcpu_ioctl_config_tlb(struct 
>>>>> kvm_vcpu *vcpu,
>>>>>> int kvm_vcpu_ioctl_dirty_tlb(struct kvm_vcpu *vcpu,
>>>>>>               struct kvm_dirty_tlb *cfg);
>>>>>> 
>>>>>> +static inline void kvmppc_wakeup_vcpu(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    if (waitqueue_active(&vcpu->wq)) {
>>>>>> +        wake_up_interruptible(&vcpu->wq);
>>>>>> +        vcpu->stat.halt_wakeup++;
>>>>>> +    } else if (vcpu->cpu != -1) {
>>>>>> +        smp_send_reschedule(vcpu->cpu);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> #endif /* __POWERPC_KVM_PPC_H__ */
>>>>>> diff --git a/arch/powerpc/kvm/book3s.c 
>> b/arch/powerpc/kvm/book3s.c
>>>>>> index f68a34d..b057856 100644
>>>>>> --- a/arch/powerpc/kvm/book3s.c
>>>>>> +++ b/arch/powerpc/kvm/book3s.c
>>>>>> @@ -514,3 +514,11 @@ out:
>>>>>>  mutex_unlock(&kvm->slots_lock);
>>>>>>  return r;
>>>>>> }
>>>>>> +
>>>>>> +void kvmppc_decrementer_func(unsigned long data)
>>>>>> +{
>>>>>> +    struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>>>>>> +
>>>>>> +    kvmppc_core_queue_dec(vcpu);
>>>>>> +    kvmppc_wakeup_vcpu(vcpu);
>>>>>> +}
>>>>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>>>>> index 0ed62c1..502f9ff 100644
>>>>>> --- a/arch/powerpc/kvm/booke.c
>>>>>> +++ b/arch/powerpc/kvm/booke.c
>>>>>> @@ -205,7 +205,8 @@ void 
>>>>> kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
>>>>>> static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
>>>>>>                                     unsigned int priority)
>>>>>> {
>>>>>> -    int allowed = 0;
>>>>>> +    int allowed = 1;
>>>>>> +    int dequeue = 0;
>>>>>>  ulong uninitialized_var(msr_mask);
>>>>>>  bool update_esr = false, update_dear = false;
>>>>>>  ulong crit_raw = vcpu->arch.shared->critical;
>>>>>> @@ -258,10 +259,15 @@ static int 
>>>>> kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
>>>>>>      allowed = vcpu->arch.shared->msr & MSR_ME;
>>>>>>      msr_mask = 0;
>>>>>>      break;
>>>>>> -    case BOOKE_IRQPRIO_EXTERNAL:
>>>>>>  case BOOKE_IRQPRIO_DECREMENTER:
>>>>>> +        if (!(vcpu->arch.tcr & TCR_DIE)) {
>>>>>> +            allowed = 0;
>>>>>> +            dequeue = 1;
>>>>>> +        }
>>>>>> +        /* fall through */
>>>>>> +    case BOOKE_IRQPRIO_EXTERNAL:
>>>>>>  case BOOKE_IRQPRIO_FIT:
>>>>>> -        allowed = vcpu->arch.shared->msr & MSR_EE;
>>>>>> +        allowed = allowed && vcpu->arch.shared->msr & MSR_EE;
>>>>>>      allowed = allowed && !crit;
>>>>>>      msr_mask = MSR_CE|MSR_ME|MSR_DE;
>>>>>>      break;
>>>>>> @@ -269,6 +275,8 @@ static int 
>>>>> kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
>>>>>>      allowed = vcpu->arch.shared->msr & MSR_DE;
>>>>>>      msr_mask = MSR_ME;
>>>>>>      break;
>>>>>> +    default:
>>>>>> +        allowed = 0;
>>>>>>  }
>>>>>> 
>>>>>>  if (allowed) {
>>>>>> @@ -283,6 +291,9 @@ static int 
>>>>> kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
>>>>>> 
>>>>>>      if (!keep_irq)
>>>>>>          clear_bit(priority, 
>>>>> &vcpu->arch.pending_exceptions);
>>>>>> +    } else if (dequeue) {
>>>>>> +        /* Should only happen due to races with masking */
>>>>>> +        clear_bit(priority, &vcpu->arch.pending_exceptions);
>>>>>>  }
>>>>>> 
>>>>>>  return allowed;
>>>>>> @@ -721,25 +732,16 @@ static int set_sregs_base(struct 
>>>>> kvm_vcpu *vcpu,
>>>>>>  vcpu->arch.shared->esr = sregs->u.e.esr;
>>>>>>  vcpu->arch.shared->dar = sregs->u.e.dear;
>>>>>>  vcpu->arch.vrsave = sregs->u.e.vrsave;
>>>>>> -    vcpu->arch.tcr = sregs->u.e.tcr;
>>>>>> +    kvmppc_set_tcr(vcpu, sregs->u.e.tcr);
>>>>>> 
>>>>>> -    if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_DEC)
>>>>>> +    if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_DEC) {
>>>>>>      vcpu->arch.dec = sregs->u.e.dec;
>>>>>> -
>>>>>> -    kvmppc_emulate_dec(vcpu);
>>>>>> +        kvmppc_emulate_dec(vcpu);
>>>>>> +    }
>>>>>> 
>>>>>>  if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
>>>>>> -        /*
>>>>>> -         * FIXME: existing KVM timer handling is incomplete.
>>>>>> -         * TSR cannot be read by the guest, and its value in
>>>>>> -         * vcpu->arch is always zero.  For now, just handle
>>>>>> -         * the case where the caller is trying to inject a
>>>>>> -         * decrementer interrupt.
>>>>>> -         */
>>>>>> -
>>>>>> -        if ((sregs->u.e.tsr & TSR_DIS) &&
>>>>>> -            (vcpu->arch.tcr & TCR_DIE))
>>>>>> -            kvmppc_core_queue_dec(vcpu);
>>>>>> +        kvmppc_clr_tsr_bits(vcpu, ~0);
>>>>>> +        kvmppc_set_tsr_bits(vcpu, sregs->u.e.tsr);
>>>>>>  }
>>>>>> 
>>>>>>  return 0;
>>>>>> @@ -895,6 +897,48 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
>>>>>> {
>>>>>> }
>>>>>> 
>>>>>> +void kvmppc_set_tcr(struct kvm_vcpu *vcpu, u32 new_tcr)
>>>>>> +{
>>>>>> +    vcpu->arch.tcr = new_tcr;
>>>>>> +    smp_wmb();
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Since TCR changed, we need to check
>>>>>> +     * if blocked interrupts are deliverable.
>>>>>> +     */
>>>>>> +    if ((new_tcr & TCR_DIE) && (vcpu->arch.tsr & TSR_DIS)) {
>>>>>> +        kvmppc_core_queue_dec(vcpu);
>>>>>> +        kvmppc_wakeup_vcpu(vcpu);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +void kvmppc_set_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits)
>>>>>> +{
>>>>>> +    set_bits(tsr_bits, &vcpu->arch.tsr);
>>>>> 
>>>>> I'm still missing the point why you need set_bits/clear_bits. 
>>>>> Can't you just use |= and &= ~ like everyone else? :)
>>>>> 
>>>> 
>>>> Because set_bits/clear_bits is atomic.
>>>> set/clr_tsr_bits() may be called in different threads so 
>> that on different cores.
>>>> We need to make sure modification to be atomic to prevent 
>> missing changes.
>>> 
>>> Ah, so you can set the TSR bit in the decrementer timer 
>> callback? Hrm. Please add a comment there saying so. All the 
>> other cases should be under vcpu_load() scope. If they're 
>> not, we have bugs in our code :). So they shouldn't need 
>> atomic operations (except for now they do because there's one 
>> user that doesn't do vcpu_load).
>> 
>> Thinking about this for a bit more, shouldn't we be able to 
>> share the TSR with the guest so that it can clear the DEC 
>> interrupt without exiting the guest? Then we can't do ulong 
>> because that would make the pv abi 32/64 specific.
>> 
>> Overall, I'm also not very sure it's too useful to set TSR 
>> from non-vcpu context. After all, in most cases the guest 
>> should have interrupts enabled at that point and probably 
>> wants to receive the interrupt in that moment, not some 
>> seconds later when the next intercept occurs. So we still 
>> want to kick the vcpu thread on dec interrupts. And then we 
>> don't need anything to be atomic.
>> 
> 
> We don't only set TSR for dec interrupt case.
> In future, we would send out watchdog stuff which also set/clear TSR in non-vcpu context.
> The watchdog bit of TSR is not only related to watchdog interrupt but maintains a simple statemachine.
> Not set those states may confuse guest OS.

Yes, but why can't we do this in the vcpu thread's context so we only ever have a single instance accessing the vcpu struct? It makes a lot of things a lot easier. And for the watchdog, we don't have to set the bit unless it's clean, right? So we don't add any overhead to the vcpu unless it's actually cleaning the watchdog TSR bit.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM Development]     [KVM ARM]     [KVM ia64]     [Linux Virtualization]     [Linux USB Devel]     [Linux Video]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux