Re: [RFC PATCH v3 2/2] KVM: s390: Extend the USER_SIGP capability

David Hildenbrand <david@xxxxxxxxxx> · Thu, 11 Nov 2021 20:15:24 +0100

On 11.11.21 20:05, Eric Farman wrote:
> On Thu, 2021-11-11 at 19:29 +0100, David Hildenbrand wrote:
>> On 11.11.21 18:48, Eric Farman wrote:
>>> On Thu, 2021-11-11 at 17:13 +0100, Janosch Frank wrote:
>>>> On 11/11/21 16:03, Eric Farman wrote:
>>>>> On Thu, 2021-11-11 at 10:15 +0100, David Hildenbrand wrote:
>>>>>> On 10.11.21 21:33, Eric Farman wrote:
>>>>>>> With commit 2444b352c3ac ("KVM: s390: forward most SIGP
>>>>>>> orders
>>>>>>> to
>>>>>>> user
>>>>>>> space") we have a capability that allows the "fast" SIGP
>>>>>>> orders
>>>>>>> (as
>>>>>>> defined by the Programming Notes for the SIGNAL PROCESSOR
>>>>>>> instruction in
>>>>>>> the Principles of Operation) to be handled in-kernel, while
>>>>>>> all
>>>>>>> others are
>>>>>>> sent to userspace for processing.
>>>>>>>
>>>>>>> This works fine but it creates a situation when, for
>>>>>>> example, a
>>>>>>> SIGP SENSE
>>>>>>> might return CC1 (STATUS STORED, and status bits indicating
>>>>>>> the
>>>>>>> vcpu is
>>>>>>> stopped), when in actuality userspace is still processing a
>>>>>>> SIGP
>>>>>>> STOP AND
>>>>>>> STORE STATUS order, and the vcpu is not yet actually
>>>>>>> stopped.
>>>>>>> Thus,
>>>>>>> the
>>>>>>> SIGP SENSE should actually be returning CC2 (busy) instead
>>>>>>> of
>>>>>>> CC1.
>>>>>>>
>>>>>>> To fix this, add another CPU capability, dependent on the
>>>>>>> USER_SIGP
>>>>>>> one,
>>>>>>> and two associated IOCTLs. One IOCTL will be used by
>>>>>>> userspace
>>>>>>> to
>>>>>>> mark a
>>>>>>> vcpu "busy" processing a SIGP order, and cause concurrent
>>>>>>> orders
>>>>>>> handled
>>>>>>> in-kernel to be returned with CC2 (busy). Another IOCTL
>>>>>>> will be
>>>>>>> used by
>>>>>>> userspace to mark the SIGP "finished", and the vcpu free to
>>>>>>> process
>>>>>>> additional orders.
>>>>>>>
>>>>>>
>>>>>> This looks much cleaner to me, thanks!
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-
>>>>>>> s390.h
>>>>>>> index c07a050d757d..54371cede485 100644
>>>>>>> --- a/arch/s390/kvm/kvm-s390.h
>>>>>>> +++ b/arch/s390/kvm/kvm-s390.h
>>>>>>> @@ -82,6 +82,22 @@ static inline int is_vcpu_idle(struct
>>>>>>> kvm_vcpu
>>>>>>> *vcpu)
>>>>>>>   	return test_bit(vcpu->vcpu_idx, vcpu->kvm-
>>>>>>>> arch.idle_mask);
>>>>>>>   }
>>>>>>>   
>>>>>>> +static inline bool kvm_s390_vcpu_is_sigp_busy(struct
>>>>>>> kvm_vcpu
>>>>>>> *vcpu)
>>>>>>> +{
>>>>>>> +	return (atomic_read(&vcpu->arch.sigp_busy) == 1);
>>>>>>
>>>>>> You can drop ()
>>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline bool kvm_s390_vcpu_set_sigp_busy(struct
>>>>>>> kvm_vcpu
>>>>>>> *vcpu)
>>>>>>> +{
>>>>>>> +	/* Return zero for success, or -EBUSY if another vcpu
>>>>>>> won */
>>>>>>> +	return (atomic_cmpxchg(&vcpu->arch.sigp_busy, 0, 1) ==
>>>>>>> 0) ? 0 :
>>>>>>> -EBUSY;
>>>>>>
>>>>>> You can drop () as well.
>>>>>>
>>>>>> We might not need the -EBUSY semantics after all. User space
>>>>>> can
>>>>>> just
>>>>>> track if it was set, because it's in charge of setting it.
>>>>>
>>>>> Hrm, I added this to distinguish a newer kernel with an older
>>>>> QEMU,
>>>>> but
>>>>> of course an older QEMU won't know the difference either. I'll
>>>>> doublecheck that this is works fine in the different
>>>>> permutations.
>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>> +static inline void kvm_s390_vcpu_clear_sigp_busy(struct
>>>>>>> kvm_vcpu
>>>>>>> *vcpu)
>>>>>>> +{
>>>>>>> +	atomic_set(&vcpu->arch.sigp_busy, 0);
>>>>>>> +}
>>>>>>> +
>>>>>>>   static inline int kvm_is_ucontrol(struct kvm *kvm)
>>>>>>>   {
>>>>>>>   #ifdef CONFIG_KVM_S390_UCONTROL
>>>>>>> diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
>>>>>>> index 5ad3fb4619f1..a37496ea6dfa 100644
>>>>>>> --- a/arch/s390/kvm/sigp.c
>>>>>>> +++ b/arch/s390/kvm/sigp.c
>>>>>>> @@ -276,6 +276,10 @@ static int handle_sigp_dst(struct
>>>>>>> kvm_vcpu
>>>>>>> *vcpu, u8 order_code,
>>>>>>>   	if (!dst_vcpu)
>>>>>>>   		return SIGP_CC_NOT_OPERATIONAL;
>>>>>>>   
>>>>>>> +	if (kvm_s390_vcpu_is_sigp_busy(dst_vcpu)) {
>>>>>>> +		return SIGP_CC_BUSY;
>>>>>>> +	}
>>>>>>
>>>>>> You can drop {}
>>>>>
>>>>> Arg, I had some debug in there which needed the braces, and of
>>>>> course
>>>>> it's unnecessary now. Thanks.
>>>>>
>>>>>>> +
>>>>>>>   	switch (order_code) {
>>>>>>>   	case SIGP_SENSE:
>>>>>>>   		vcpu->stat.instruction_sigp_sense++;
>>>>>>> @@ -411,6 +415,12 @@ int kvm_s390_handle_sigp(struct
>>>>>>> kvm_vcpu
>>>>>>> *vcpu)
>>>>>>>   	if (handle_sigp_order_in_user_space(vcpu, order_code,
>>>>>>> cpu_addr))
>>>>>>>   		return -EOPNOTSUPP;
>>>>>>>   
>>>>>>> +	/* Check the current vcpu, if it was a target from
>>>>>>> another vcpu
>>>>>>> */
>>>>>>> +	if (kvm_s390_vcpu_is_sigp_busy(vcpu)) {
>>>>>>> +		kvm_s390_set_psw_cc(vcpu, SIGP_CC_BUSY);
>>>>>>> +		return 0;
>>>>>>> +	}
>>>>>>
>>>>>> I don't think we need this. I think the above (checking the
>>>>>> target of
>>>>>> a
>>>>>> SIGP order) is sufficient. Or which situation do you have in
>>>>>> mind?
>>>>>>
>>>>>
>>>>> Hrm... I think you're right. I was thinking of this:
>>>>>
>>>>> VCPU 1 - SIGP STOP CPU 2
>>>>> VCPU 2 - SIGP SENSE CPU 1
>>>>>
>>>>> But of course either CPU2 is going to be marked "busy" first,
>>>>> and
>>>>> the
>>>>> sense doesn't get processed until it's reset, or the sense
>>>>> arrives
>>>>> first, and the busy/notbusy doesn't matter. Let me doublecheck
>>>>> my
>>>>> tests
>>>>> for the non-RFC version.
>>>>>
>>>>>> I do wonder if we want to make this a kvm_arch_vcpu_ioctl()
>>>>>> instead,
>>>>>
>>>>> In one of my original attempts between v1 and v2, I had put
>>>>> this
>>>>> there.
>>>>> This reliably deadlocks my guest, because the caller
>>>>> (kvm_vcpu_ioctl())
>>>>> tries to acquire vcpu->mutex, and racing SIGPs (via KVM_RUN)
>>>>> might
>>>>> already be holding it. Thus, it's an async ioctl. I could fold
>>>>> it
>>>>> into
>>>>> the existing interrupt ioctl, but as those are architected
>>>>> structs
>>>>> it
>>>>> seems more natural do it this way. Or I have mis-understood
>>>>> something
>>>>> along the way?
>>>>>
>>>>>> essentially just providing a KVM_S390_SET_SIGP_BUSY *and*
>>>>>> providing
>>>>>> the
>>>>>> order. "order == 0" sets it to !busy.
>>>>>
>>>>> I'd tried this too, since it provided some nice debug-ability.
>>>>> Unfortunately, I have a testcase (which I'll eventually get
>>>>> folded
>>>>> into
>>>>> kvm-unit-tests :)) that picks a random order between 0-255,
>>>>> knowing
>>>>> that there's only a couple handfuls of valid orders, to check
>>>>> the
>>>>> response. Zero is valid architecturally (POPS figure 4-29),
>>>>> even if
>>>>> it's unassigned. The likelihood of it becoming assigned is
>>>>> probably
>>>>> quite low, but I'm not sure that I like special-casing an order
>>>>> of
>>>>> zero
>>>>> in this way.
>>>>>
>>>>
>>>> Looking at the API I'd like to avoid having two IOCTLs 
>>>
>>> Since the order is a single byte, we could have the payload of an
>>> ioctl
>>> say "0-255 is an order that we're busy processing, anything higher
>>> than
>>> that resets the busy" or something. That would remove the need for
>>> a
>>> second IOCTL.
>>
>> Maybe just pass an int and treat a negative (or just -1) value as
>> clearing the order.
>>
> 
> Right, that's exactly what I had at one point. I thought it was too
> cumbersome, but maybe not. Will dust it off, pending my question to
> Janosch about 0-vs-1 IOCTLs.

As we really only care about the SIGP STOP case, you could theoretically
bury it into kvm_arch_vcpu_ioctl_set_mpstate(), using a new state
"KVM_MP_STATE_STOPPING".

-- 
Thanks,

David / dhildenb