On Fri, Jan 22, 2021, Alex Shi wrote: > Hi All, > > I am newbie on KVM side, so probably I am wrong on the following. > Please correct me if it is. > > There are 3 preempted variables in kvm: > 1, kvm_vcpu.preempted in include/linux/kvm_host.h > 2, kvm_steal_time.preempted > 3, kvm_vcpu_arch.st.preempted in arch/x86 > Seems all of them are set or cleared at the same time. Like, Not quite. kvm_vcpu.preempted is set only in kvm_sched_out(), i.e. when the vCPU was running and preempted by the host scheduler. This is used by KVM when KVM detects that a guest task appears to be waiting on a lock, in which case KVM will bump the priority of preempted guest kernel threads in the hope that scheduling in the preempted vCPU will release the lock. kvm_steal_time.preempted is a paravirt struct that is shared with the guest. It is set on any call to kvm_arch_vcpu_put(), which covers kvm_sched_out() and adds the case where the vCPU exits to userspace, e.g. for IO. KVM itself hasn't been preempted, but from the guest's perspective the CPU has been "preempted" in the sense that CPU (from the guest's perspective) is not executing guest code. Similar to KVM's vCPU scheduling heuristics, the guest kernel uses this info to inform its scheduling, e.g. to avoid waiting on a lock owner to drop the lock since the lock owner is not actively running. kvm_vcpu_arch.st.preempted is effectively a host-side cache of kvm_steal_time.preempted that's used to optimize kvm_arch_vcpu_put() by avoiding the moderately costly mapping of guest. It could be dropped, but it's a single byte per vCPU so worth keeping even if the performance benefits are modest. > vcpu_put: > kvm_sched_out()-> set 3 preempted > kvm_arch_vcpu_put(): > kvm_steal_time_set_preempted > > vcpu_load: > kvm_sched_in() : clear above 3 preempted > kvm_arch_vcpu_load() -> kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu); > request dealed in vcpu_enter_guest() -> record_steal_time > > Except the 2nd one reuse with KVM_FEATURE_PV_TLB_FLUSH bit which could be used > separately, Could we combine them into one, like just bool kvm_vcpu.preempted? and > move out the KVM_FEATURE_PV_TLB_FLUSH. Believe all arch need this for a vcpu overcommit. Moving KVM_VCPU_FLUSH_TLB out of kvm_steal_time.preempted isn't viable. The guest kernel is only allowed to rely on the host to flush the vCPU's TLB if it knows the vCPU is preempted (from its perspective), as that's the only way it can guarantee that KVM will observe the TLB flush request before enterring the vCPU. KVM_VCPU_FLUSH_TLB and KVM_VCPU_PREEMPTED need to be in the same word so KVM can read and clear them atomically, otherwise there would be a window where KVM would miss the flush request.