Hi Sean, Thanks a lot for the detailed explanations. Yes, they are all meaningful variables on x86. But for more archs, guess the lock issue on different arch's guest are similar. Maybe a abstraction on the point would be very helpful. Any comments? Thanks Alex 在 2021/1/23 上午3:30, Sean Christopherson 写道: > On Fri, Jan 22, 2021, Alex Shi wrote: >> Hi All, >> >> I am newbie on KVM side, so probably I am wrong on the following. >> Please correct me if it is. >> >> There are 3 preempted variables in kvm: >> 1, kvm_vcpu.preempted in include/linux/kvm_host.h >> 2, kvm_steal_time.preempted >> 3, kvm_vcpu_arch.st.preempted in arch/x86 >> Seems all of them are set or cleared at the same time. Like, > > Not quite. kvm_vcpu.preempted is set only in kvm_sched_out(), i.e. when the > vCPU was running and preempted by the host scheduler. This is used by KVM when > KVM detects that a guest task appears to be waiting on a lock, in which case KVM > will bump the priority of preempted guest kernel threads in the hope that > scheduling in the preempted vCPU will release the lock. > > kvm_steal_time.preempted is a paravirt struct that is shared with the guest. It > is set on any call to kvm_arch_vcpu_put(), which covers kvm_sched_out() and adds > the case where the vCPU exits to userspace, e.g. for IO. KVM itself hasn't been > preempted, but from the guest's perspective the CPU has been "preempted" in the > sense that CPU (from the guest's perspective) is not executing guest code. > Similar to KVM's vCPU scheduling heuristics, the guest kernel uses this info to > inform its scheduling, e.g. to avoid waiting on a lock owner to drop the lock > since the lock owner is not actively running. > > kvm_vcpu_arch.st.preempted is effectively a host-side cache of > kvm_steal_time.preempted that's used to optimize kvm_arch_vcpu_put() by avoiding > the moderately costly mapping of guest. It could be dropped, but it's a single > byte per vCPU so worth keeping even if the performance benefits are modest. > >> vcpu_put: >> kvm_sched_out()-> set 3 preempted >> kvm_arch_vcpu_put(): >> kvm_steal_time_set_preempted >> >> vcpu_load: >> kvm_sched_in() : clear above 3 preempted >> kvm_arch_vcpu_load() -> kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu); >> request dealed in vcpu_enter_guest() -> record_steal_time >> >> Except the 2nd one reuse with KVM_FEATURE_PV_TLB_FLUSH bit which could be used >> separately, Could we combine them into one, like just bool kvm_vcpu.preempted? and >> move out the KVM_FEATURE_PV_TLB_FLUSH. Believe all arch need this for a vcpu overcommit. > > Moving KVM_VCPU_FLUSH_TLB out of kvm_steal_time.preempted isn't viable. The > guest kernel is only allowed to rely on the host to flush the vCPU's TLB if it > knows the vCPU is preempted (from its perspective), as that's the only way it > can guarantee that KVM will observe the TLB flush request before enterring the > vCPU. KVM_VCPU_FLUSH_TLB and KVM_VCPU_PREEMPTED need to be in the same word so > KVM can read and clear them atomically, otherwise there would be a window where > KVM would miss the flush request. >