Re: [RFC PATCH v3 2/2] KVM: s390: Extend the USER_SIGP capability

Eric Farman <farman@xxxxxxxxxxxxx> · Thu, 11 Nov 2021 12:48:20 -0500

On Thu, 2021-11-11 at 17:13 +0100, Janosch Frank wrote:
> On 11/11/21 16:03, Eric Farman wrote:
> > On Thu, 2021-11-11 at 10:15 +0100, David Hildenbrand wrote:
> > > On 10.11.21 21:33, Eric Farman wrote:
> > > > With commit 2444b352c3ac ("KVM: s390: forward most SIGP orders to
> > > > user space") we have a capability that allows the "fast" SIGP
> > > > orders (as defined by the Programming Notes for the SIGNAL
> > > > PROCESSOR instruction in the Principles of Operation) to be
> > > > handled in-kernel, while all others are sent to userspace for
> > > > processing.
> > > > 
> > > > This works fine but it creates a situation when, for example, a
> > > > SIGP SENSE might return CC1 (STATUS STORED, and status bits
> > > > indicating the vcpu is stopped), when in actuality userspace is
> > > > still processing a SIGP STOP AND STORE STATUS order, and the vcpu
> > > > is not yet actually stopped. Thus, the SIGP SENSE should actually
> > > > be returning CC2 (busy) instead of CC1.
> > > > 
> > > > To fix this, add another CPU capability, dependent on the
> > > > USER_SIGP one, and two associated IOCTLs. One IOCTL will be used
> > > > by userspace to mark a vcpu "busy" processing a SIGP order, and
> > > > cause concurrent orders handled in-kernel to be returned with CC2
> > > > (busy). Another IOCTL will be used by userspace to mark the SIGP
> > > > "finished", and the vcpu free to process additional orders.
> > > > 
> > > 
> > > This looks much cleaner to me, thanks!
> > > 
> > > [...]
> > > 
> > > > diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> > > > index c07a050d757d..54371cede485 100644
> > > > --- a/arch/s390/kvm/kvm-s390.h
> > > > +++ b/arch/s390/kvm/kvm-s390.h
> > > > @@ -82,6 +82,22 @@ static inline int is_vcpu_idle(struct kvm_vcpu *vcpu)
> > > >   	return test_bit(vcpu->vcpu_idx, vcpu->kvm->arch.idle_mask);
> > > >   }
> > > >   
> > > > +static inline bool kvm_s390_vcpu_is_sigp_busy(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	return (atomic_read(&vcpu->arch.sigp_busy) == 1);
> > > 
> > > You can drop ()
> > > 
> > > > +}
> > > > +
> > > > +static inline bool kvm_s390_vcpu_set_sigp_busy(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	/* Return zero for success, or -EBUSY if another vcpu won */
> > > > +	return (atomic_cmpxchg(&vcpu->arch.sigp_busy, 0, 1) == 0) ? 0 : -EBUSY;
> > > 
> > > You can drop () as well.
> > > 
> > > We might not need the -EBUSY semantics after all. User space can
> > > just track if it was set, because it's in charge of setting it.
> > 
> > Hrm, I added this to distinguish a newer kernel with an older QEMU,
> > but of course an older QEMU won't know the difference either. I'll
> > doublecheck that this works fine in the different permutations.
> > 
> > > > +}
> > > > +
> > > > +static inline void kvm_s390_vcpu_clear_sigp_busy(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	atomic_set(&vcpu->arch.sigp_busy, 0);
> > > > +}
> > > > +
> > > >   static inline int kvm_is_ucontrol(struct kvm *kvm)
> > > >   {
> > > >   #ifdef CONFIG_KVM_S390_UCONTROL
> > > > diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
> > > > index 5ad3fb4619f1..a37496ea6dfa 100644
> > > > --- a/arch/s390/kvm/sigp.c
> > > > +++ b/arch/s390/kvm/sigp.c
> > > > @@ -276,6 +276,10 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code,
> > > >   	if (!dst_vcpu)
> > > >   		return SIGP_CC_NOT_OPERATIONAL;
> > > >   
> > > > +	if (kvm_s390_vcpu_is_sigp_busy(dst_vcpu)) {
> > > > +		return SIGP_CC_BUSY;
> > > > +	}
> > > 
> > > You can drop {}
> > 
> > Arg, I had some debug in there which needed the braces, and of
> > course it's unnecessary now. Thanks.
> > 
> > > > +
> > > >   	switch (order_code) {
> > > >   	case SIGP_SENSE:
> > > >   		vcpu->stat.instruction_sigp_sense++;
> > > > @@ -411,6 +415,12 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu)
> > > >   	if (handle_sigp_order_in_user_space(vcpu, order_code, cpu_addr))
> > > >   		return -EOPNOTSUPP;
> > > >   
> > > > +	/* Check the current vcpu, if it was a target from another vcpu */
> > > > +	if (kvm_s390_vcpu_is_sigp_busy(vcpu)) {
> > > > +		kvm_s390_set_psw_cc(vcpu, SIGP_CC_BUSY);
> > > > +		return 0;
> > > > +	}
> > > 
> > > I don't think we need this. I think the above (checking the
> > > target of a SIGP order) is sufficient. Or which situation do you
> > > have in mind?
> > > 
> > 
> > Hrm... I think you're right. I was thinking of this:
> > 
> > VCPU 1 - SIGP STOP CPU 2
> > VCPU 2 - SIGP SENSE CPU 1
> > 
> > But of course either CPU2 is going to be marked "busy" first, and
> > the sense doesn't get processed until it's reset, or the sense
> > arrives first, and the busy/notbusy doesn't matter. Let me
> > doublecheck my tests for the non-RFC version.
> > 
> > > 
> > > I do wonder if we want to make this a kvm_arch_vcpu_ioctl()
> > > instead,
> > 
> > In one of my original attempts between v1 and v2, I had put this
> > there. This reliably deadlocks my guest, because the caller
> > (kvm_vcpu_ioctl()) tries to acquire vcpu->mutex, and racing SIGPs
> > (via KVM_RUN) might already be holding it. Thus, it's an async
> > ioctl. I could fold it into the existing interrupt ioctl, but as
> > those are architected structs it seems more natural to do it this
> > way. Or have I misunderstood something along the way?
> > 
> > > essentially just providing a KVM_S390_SET_SIGP_BUSY *and*
> > > providing the order. "order == 0" sets it to !busy.
> > 
> > I'd tried this too, since it provided some nice debug-ability.
> > Unfortunately, I have a testcase (which I'll eventually get folded
> > into kvm-unit-tests :)) that picks a random order between 0-255,
> > knowing that there's only a couple handfuls of valid orders, to
> > check the response. Zero is valid architecturally (POPS figure
> > 4-29), even if it's unassigned. The likelihood of it becoming
> > assigned is probably quite low, but I'm not sure that I like
> > special-casing an order of zero in this way.
> > 
> 
> Looking at the API I'd like to avoid having two IOCTLs 

Since the order is a single byte, we could have the payload of an ioctl
say "0-255 is an order that we're busy processing, anything higher than
that resets the busy" or something. That would remove the need for a
second IOCTL.
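
To make that concrete, here is a rough, untested sketch of how such a
payload could be decoded. None of this is in the posted patch: the
SIGP_BUSY_RESET value, the struct, and the helper name are hypothetical
and only illustrate the "one byte of order, anything larger resets"
idea. It is plain C so the encoding can be tried outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding, not part of any posted patch. */
#define SIGP_BUSY_RESET 0x100u	/* any payload above 0xff clears busy */

/* Hypothetical per-vcpu tracking state. */
struct sigp_busy_state {
	bool busy;
	uint8_t order;	/* only meaningful while busy is set */
};

/* Apply one ioctl payload to the tracked state. */
static void sigp_busy_apply(struct sigp_busy_state *s, uint32_t payload)
{
	if (payload <= 0xff) {
		/* 0-255: a SIGP order userspace is busy processing */
		s->busy = true;
		s->order = (uint8_t)payload;
	} else {
		/* anything higher: userspace is done, clear the marker */
		s->busy = false;
		s->order = 0;
	}
}
```

With this shape, order 0 stays an ordinary (if unassigned) order instead
of being overloaded to mean "not busy".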

> and I'd love to 
> see some way to extend this without the need for a whole new IOCTL.
> 

Do you mean zero IOCTLs? Because... I think the only way we can do that
is to get rid of USER_SIGP altogether, and handle everything in-kernel. 
Some weeks ago I played with QEMU not enabling USER_SIGP, but I can't
say I've tried it since we went down this "mark a vcpu busy" path. If I
do that busy/not-busy tagging in the kernel for !USER_SIGP, that might
not be a bad thing anyway. But I don't know how we get the behavior
straightened out for USER_SIGP without some type of handshake.

> 
> 
> > > Not that we would need the value right now, but who knows for
> > > what we might reuse that interface in the future.
> > > 
> > > Thanks!
> > >