On 18/07/19 11:29, Wanpeng Li wrote:
> On Thu, 18 Jul 2019 at 17:07, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>>
>> On 18/07/19 10:43, Wanpeng Li wrote:
>>>>> Isn't that done by the sched_in handler?
>>>>
>>>> I am a bit confused because, if it is done by the sched_in later, I
>>>> don't understand why the sched_out handler hasn't set vcpu->preempted
>>>> already.
>>>>
>>>> The s390 commit message is not very clear, but it talks about "a former
>>>> sleeping cpu" that "gave up the cpu voluntarily". Does "voluntarily"
>>>> mean that it is in kvm_vcpu_block? But then at least for x86 it would
>>>
>>> See the prepare_to_swait_exclusive() in kvm_vcpu_block(): the task will
>>> be set to TASK_INTERRUPTIBLE state, and kvm_sched_out sets
>>> vcpu->preempted to true iff current->state == TASK_RUNNING.
>>
>> Ok, I was totally blind to that "if" around vcpu->preempted = true; it's
>> obvious now.
>>
>> I think we need two flags then, for example vcpu->preempted and vcpu->ready:
>>
>> - kvm_sched_out sets both of them to true iff current->state == TASK_RUNNING
>>
>> - kvm_vcpu_kick sets vcpu->ready to true
>>
>> - kvm_sched_in clears both of them

... and also kvm_vcpu_on_spin should check vcpu->ready. vcpu->preempted
remains only for use by vmx_vcpu_pi_put.

Later we could think of removing vcpu->preempted. For example,
kvm_arch_sched_out and kvm_x86_ops->sched_out could take over the code
currently in vmx_vcpu_pi_put (testing current->state == TASK_RUNNING
instead of vcpu->preempted). But for now there's no need, and I'm not
sure it's an improvement at all.

Paolo

>> This way, vmx_vcpu_pi_load can keep looking at preempted only (it
>> handles voluntary preemption in pi_pre_block/pi_post_block).