2017-02-21 09:59+0100, Christian Borntraeger:
> On 02/20/2017 10:45 PM, Radim Krčmář wrote:
>> 2017-02-20 12:35+0100, David Hildenbrand:
>>> On 20.02.2017 12:12, Christian Borntraeger wrote:
>>>> On 02/17/2017 06:10 PM, David Hildenbrand wrote:
>>>>>
>>>>>>> Yes, it would. There's some parallel with QEMU's qemu_cpu_kick, where
>>>>>>> the signal would be processed immediately after entering KVM_RUN.
>>>>>>
>>>>>> Something like
>>>>>>
>>>>>> ---snip-----
>>>>>> struct kvm_s390_sie_block *scb = READ_ONCE(vcpu->arch.vsie_block);
>>>>>>
>>>>>> atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
>>>>>> if (scb)
>>>>>> 	atomic_or(CPUSTAT_STOP_INT, &scb->cpuflags);
>>>>>> ---snip-----
>>>>>>
>>>>>> or
>>>>>> ---snip-----
>>>>>> atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
>>>>>> kvm_s390_vsie_kick(vcpu);
>>>>>> ---snip-----
>>>>>
>>>>> I'd go for the latter one. Keep the vsie stuff isolated. Please note
>>>>
>>>> Yes, makes sense.
>>>>
>>>> Radim, if you go with this patch, something like this can be used as the
>>>> s390 variant of kvm_arch_cpu_kick:
>>>>
>>>> ---snip---
>>>> /*
>>>>  * The stop indication is reset in the interrupt code. As the CPU
>>>>  * loop handles requests after interrupts, we will
>>>>  * a: miss the request handler and enter the guest, but then the
>>>>  *    stop request will exit the CPU and handle the request in the next
>>>>  *    round or
>>>>  * b: handle the request directly before entering the guest
>>>>  */
>>>> atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
>>>> kvm_s390_vsie_kick(vcpu);
>>>> ---snip---
>>>>
>>>> Feel free to add that to your patch. I can also send a fixup patch later
>>>> on if you prefer that.
>>>
>>> kvm_arch_vcpu_should_kick() then also has to be changed to return 1 for now.
>>>
>>> An interesting thing to note is how vcpu->cpu is used.
>>>
>>> Again, as s390x can preempt just before entering the guest, vcpu_kick()
>>> might see vcpu->cpu = -1. Therefore, kvm_arch_vcpu_should_kick() won't
>>> even be called. So our cpu might go into guest mode and stay there
>>> longer than expected (as we won't kick it).
>>>
>>> On x86, it works the following way:
>>>
>>> If vcpu->cpu is -1, there is no need to kick the VCPU. It will check for
>>> requests when preemption is disabled, therefore when rescheduled.
>>>
>>> If vcpu->cpu is set, kvm_arch_vcpu_should_kick() remembers whether the VCPU
>>> has already been kicked while in the critical section. It will get
>>> kicked by smp resched as soon as it enters guest mode.
>>>
>>> So here, disabled preemption + checks in the section with disabled
>>> preemption (for requests and EXITING_GUEST_MODE) make sure that the
>>> guest will leave guest mode and process requests in a timely fashion.
>>>
>>> On s390x, this is not 100% true. vcpu->cpu cannot be used as an
>>> indicator whether a kick is necessary. Either that is ok for now, or the
>>> vcpu->cpu != -1 check has to be disabled for s390x, e.g. by moving the
>>> check into kvm_arch_vcpu_should_kick().
>>
>> Good point.
>>
>> So s390 doesn't need vcpu->cpu and only sets it because other arches do?
>
> David added it as a sanity check for cpu time accounting while in the host,
> but we do not need it, yes.
>
>> And do I understand it correctly that the s390 SIE block operations have
>> no side effects, apart from changed memory, when outside of guest mode?
>
> You mean accesses to the sie control block vcpu->arch.sie_block? Yes, it's
> just changed memory as long as the VCPU that is backed by this sie control
> block is not running.

Great, thanks.
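To make the above concrete, here is a rough sketch of how the two pieces
could fit together on s390. It assumes the series adds a per-VCPU
kvm_arch_cpu_kick(vcpu) hook as referenced above and that
kvm_arch_vcpu_should_kick() keeps an int-style return; the bodies only
combine the snippets from this thread and are not final code:

---snip---
/* s390: vcpu->cpu may already be -1 before guest entry, so it cannot be
 * used to filter kicks; always request the kick for now. */
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
	return 1;
}

/* sketch of the s390 variant of kvm_arch_cpu_kick (hook name/signature
 * assumed from the discussion above) */
void kvm_arch_cpu_kick(struct kvm_vcpu *vcpu)
{
	/*
	 * The stop indication is reset in the interrupt code. As the CPU
	 * loop handles requests after interrupts, we either miss the
	 * request handler and enter the guest, where the stop request
	 * exits the CPU and the request is handled in the next round, or
	 * we handle the request directly before entering the guest.
	 */
	atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
	kvm_s390_vsie_kick(vcpu);
}
---snip---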
>> (We have cpu->mode mostly because interrupts are expensive. :])
>>
>> In the end, I'd like to use kvm_vcpu_kick() for kvm_s390_vcpu_wakeup().
>
> Something like this?
>
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -1067,15 +1067,7 @@ void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
>  	 * in kvm_vcpu_block without having the waitqueue set (polling)
>  	 */
>  	vcpu->valid_wakeup = true;
> -	if (swait_active(&vcpu->wq)) {
> -		/*
> -		 * The vcpu gave up the cpu voluntarily, mark it as a good
> -		 * yield-candidate.
> -		 */
> -		vcpu->preempted = true;
> -		swake_up(&vcpu->wq);
> -		vcpu->stat.halt_wakeup++;
> -	}
> +	kvm_vcpu_wake_up(vcpu);
>  	/*
>  	 * The VCPU might not be sleeping but is executing the VSIE. Let's
>  	 * kick it, so it leaves the SIE to process the request.

Yes, and ideally also covering the SIE kick, so the result would be

  {
  	vcpu->valid_wakeup = true;
  +	kvm_vcpu_kick(vcpu);
  }

> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2213,6 +2213,7 @@ void kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
>  	wqp = kvm_arch_vcpu_wq(vcpu);
>  	if (swait_active(wqp)) {
>  		swake_up(wqp);
> +		vcpu->preempted = true;
>  		++vcpu->stat.halt_wakeup;
>  	}
>
>> s390 sets vcpu->preempted to get a performance boost, which makes
>> touching it less than desirable ...
>> On s390, vcpu->preempted is only used in __diag_time_slice_end(), which
>> seems to be a type of spinning-on-a-taken-lock hypercall -- any reason
>> why that optimization shouldn't work on other architectures?
>
> We set preempted in kvm_s390_vcpu_wakeup because otherwise a cpu that
> sleeps (halted) would not be considered as a good candidate in
> kvm_vcpu_on_spin, even if we just decided to wake up that CPU for an
> interrupt.
>
> Yes, it certainly makes sense to do that in kvm_vcpu_wake_up as well.

I assume that s390 doesn't go to sleep while holding a spinlock, so it
would mean that we need to update kvm_vcpu_on_spin().
(Ignoring a formerly sleeping VCPU as a candidate made sense: the VCPU
shouldn't have been holding a lock that is blocking the spinning VCPU.)

Boosting a VCPU that is likely not going to use the contended spinlock
seems like a good thing to try.

I think we don't need vcpu->preempted in its current form then.
After the change, it would be false only in two cases:
 1) the VCPU task is currently scheduled on some CPU
 2) the VCPU is sleeping

There might be some other way to know (1) (or we can adapt vcpu->preempted
to vcpu->scheduled) and (2) is done with swait_active().

Sadly, any change of kvm_vcpu_on_spin() is going to need many days of
testing ...
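For illustration only, a minimal sketch of what the candidate filter in
kvm_vcpu_on_spin() could look like after such a change. vcpu->scheduled is
the hypothetical replacement for vcpu->preempted mentioned above, and the
surrounding loop is simplified, so this is a sketch of the idea rather
than a patch:

---snip---
kvm_for_each_vcpu(i, vcpu, kvm) {
	if (vcpu == me)
		continue;
	/* (1) the VCPU task is currently running on some CPU: no boost needed */
	if (READ_ONCE(vcpu->scheduled))		/* hypothetical flag, see above */
		continue;
	/* (2) the VCPU is sleeping in kvm_vcpu_block: it shouldn't hold the lock */
	if (swait_active(&vcpu->wq))
		continue;
	/* everything else was preempted or just woken up - try to boost it */
	if (kvm_vcpu_yield_to(vcpu) > 0)
		break;
}
---snip---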