Re: 3 preempted variables in kvm

Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> · Mon, 25 Jan 2021 13:49:04 +0800

Hi Sean,

Thanks a lot for the detailed explanations.
Yes, they are all meaningful variables on x86. But for more archs, guess the lock
issue on different arch's guest are similar. Maybe a abstraction on the point would
be very helpful. Any comments?

Thanks
Alex

在 2021/1/23 上午3:30, Sean Christopherson 写道:
> On Fri, Jan 22, 2021, Alex Shi wrote:
>> Hi All,
>>
>> I am newbie on KVM side, so probably I am wrong on the following.
>> Please correct me if it is.
>>
>> There are 3 preempted variables in kvm:
>>      1, kvm_vcpu.preempted  in include/linux/kvm_host.h
>>      2, kvm_steal_time.preempted
>>      3, kvm_vcpu_arch.st.preempted in arch/x86
>> Seems all of them are set or cleared at the same time. Like,
> 
> Not quite.  kvm_vcpu.preempted is set only in kvm_sched_out(), i.e. when the
> vCPU was running and preempted by the host scheduler.  This is used by KVM when
> KVM detects that a guest task appears to be waiting on a lock, in which case KVM
> will bump the priority of preempted guest kernel threads in the hope that
> scheduling in the preempted vCPU will release the lock.
> 
> kvm_steal_time.preempted is a paravirt struct that is shared with the guest.  It
> is set on any call to kvm_arch_vcpu_put(), which covers kvm_sched_out() and adds
> the case where the vCPU exits to userspace, e.g. for IO.  KVM itself hasn't been
> preempted, but from the guest's perspective the CPU has been "preempted" in the
> sense that CPU (from the guest's perspective) is not executing guest code.
> Similar to KVM's vCPU scheduling heuristics, the guest kernel uses this info to
> inform its scheduling, e.g. to avoid waiting on a lock owner to drop the lock
> since the lock owner is not actively running.
> 
> kvm_vcpu_arch.st.preempted is effectively a host-side cache of
> kvm_steal_time.preempted that's used to optimize kvm_arch_vcpu_put() by avoiding
> the moderately costly mapping of guest.  It could be dropped, but it's a single
> byte per vCPU so worth keeping even if the performance benefits are modest.
> 
>> vcpu_put:
>>         kvm_sched_out()-> set 3 preempted
>>                 kvm_arch_vcpu_put():
>>                         kvm_steal_time_set_preempted
>>
>> vcpu_load:
>>         kvm_sched_in() : clear above 3 preempted
>>                 kvm_arch_vcpu_load() -> kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
>>                 request dealed in vcpu_enter_guest() -> record_steal_time
>>
>> Except the 2nd one reuse with KVM_FEATURE_PV_TLB_FLUSH bit which could be used
>> separately, Could we combine them into one, like just bool kvm_vcpu.preempted? and 
>> move out the KVM_FEATURE_PV_TLB_FLUSH. Believe all arch need this for a vcpu overcommit.
> 
> Moving KVM_VCPU_FLUSH_TLB out of kvm_steal_time.preempted isn't viable. The
> guest kernel is only allowed to rely on the host to flush the vCPU's TLB if it
> knows the vCPU is preempted (from its perspective), as that's the only way it
> can guarantee that KVM will observe the TLB flush request before enterring the
> vCPU.  KVM_VCPU_FLUSH_TLB and KVM_VCPU_PREEMPTED need to be in the same word so
> KVM can read and clear them atomically, otherwise there would be a window where
> KVM would miss the flush request.
>