Re: [RFC PATCH v3 2/2] KVM: s390: Extend the USER_SIGP capability

Janosch Frank <frankja@xxxxxxxxxxxxx> · Fri, 12 Nov 2021 09:49:17 +0100

On 11/11/21 18:48, Eric Farman wrote:
On Thu, 2021-11-11 at 17:13 +0100, Janosch Frank wrote:
On 11/11/21 16:03, Eric Farman wrote:
On Thu, 2021-11-11 at 10:15 +0100, David Hildenbrand wrote:
On 10.11.21 21:33, Eric Farman wrote:
With commit 2444b352c3ac ("KVM: s390: forward most SIGP orders to user
space") we have a capability that allows the "fast" SIGP orders (as
defined by the Programming Notes for the SIGNAL PROCESSOR instruction in
the Principles of Operation) to be handled in-kernel, while all others are
sent to userspace for processing.

This works fine, but it creates a situation where, for example, a SIGP
SENSE might return CC1 (STATUS STORED, with status bits indicating the
vcpu is stopped) while userspace is in fact still processing a SIGP STOP
AND STORE STATUS order, and the vcpu is not yet actually stopped. Thus,
the SIGP SENSE should be returning CC2 (busy) instead of CC1.

To fix this, add another CPU capability, dependent on the USER_SIGP one,
and two associated IOCTLs. One IOCTL will be used by userspace to mark a
vcpu "busy" processing a SIGP order, causing concurrent orders handled
in-kernel to be returned with CC2 (busy). The other IOCTL will be used by
userspace to mark the SIGP "finished", and the vcpu free to process
additional orders.
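To make the proposed handshake concrete, here is a minimal userspace model in C11, assuming one atomic flag per vCPU that user space sets with a compare-and-swap before handling a forwarded order and clears when done, while the in-kernel SENSE fast path reports CC2 whenever the target's flag is set. The `model_*` names and the simplified `stopped` field are hypothetical stand-ins, not the patch's code; only the condition-code values match SIGP.

```c
/*
 * Userspace model of the proposed per-vCPU "SIGP busy" flag, using C11
 * atomics in place of the kernel's atomic_t helpers. All model_* names
 * are hypothetical; only the condition-code values mirror SIGP.
 */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* SIGP condition codes (same values as the kernel's SIGP_CC_* defines). */
enum {
	MODEL_CC_ACCEPTED = 0,
	MODEL_CC_STATUS_STORED = 1,
	MODEL_CC_BUSY = 2,
};

struct model_vcpu {
	atomic_int sigp_busy; /* 1 while user space handles an order for this vCPU */
	bool stopped;         /* simplified stand-in for the real status bits */
};

/* User space marks the target busy before processing a forwarded order.
 * Returns 0 on success or -EBUSY if the flag was already set. */
static int model_set_sigp_busy(struct model_vcpu *v)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&v->sigp_busy, &expected, 1) ? 0 : -EBUSY;
}

/* User space marks the order finished. */
static void model_clear_sigp_busy(struct model_vcpu *v)
{
	atomic_store(&v->sigp_busy, 0);
}

/* Simplified in-kernel SIGP SENSE: report busy instead of stale status. */
static int model_sigp_sense(struct model_vcpu *dst)
{
	if (atomic_load(&dst->sigp_busy) == 1)
		return MODEL_CC_BUSY;
	return dst->stopped ? MODEL_CC_STATUS_STORED : MODEL_CC_ACCEPTED;
}
```

In this model, a SENSE issued while the STOP AND STORE STATUS order is still in user space returns CC2, and only after the flag is cleared does it see the stopped state and return CC1.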


This looks much cleaner to me, thanks!

[...]

diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index c07a050d757d..54371cede485 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -82,6 +82,22 @@ static inline int is_vcpu_idle(struct kvm_vcpu *vcpu)
   	return test_bit(vcpu->vcpu_idx, vcpu->kvm->arch.idle_mask);
   }
   
+static inline bool kvm_s390_vcpu_is_sigp_busy(struct kvm_vcpu *vcpu)
+{
+	return (atomic_read(&vcpu->arch.sigp_busy) == 1);

You can drop ()

+}
+
+static inline bool kvm_s390_vcpu_set_sigp_busy(struct kvm_vcpu *vcpu)
+{
+	/* Return zero for success, or -EBUSY if another vcpu won */
+	return (atomic_cmpxchg(&vcpu->arch.sigp_busy, 0, 1) == 0) ? 0 : -EBUSY;

You can drop () as well.

We might not need the -EBUSY semantics after all. User space can just
track if it was set, because it's in charge of setting it.

Hrm, I added this to distinguish the case of a newer kernel with an
older QEMU, but of course an older QEMU won't know the difference
either. I'll double-check that this works fine in the different
permutations.

+}
+
+static inline void kvm_s390_vcpu_clear_sigp_busy(struct kvm_vcpu *vcpu)
+{
+	atomic_set(&vcpu->arch.sigp_busy, 0);
+}
+
   static inline int kvm_is_ucontrol(struct kvm *kvm)
   {
   #ifdef CONFIG_KVM_S390_UCONTROL
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index 5ad3fb4619f1..a37496ea6dfa 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -276,6 +276,10 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code,
   	if (!dst_vcpu)
   		return SIGP_CC_NOT_OPERATIONAL;
   
+	if (kvm_s390_vcpu_is_sigp_busy(dst_vcpu)) {
+		return SIGP_CC_BUSY;
+	}

You can drop {}

Arg, I had some debug code in there which needed the braces, and of
course they're unnecessary now. Thanks.

+
   	switch (order_code) {
   	case SIGP_SENSE:
   		vcpu->stat.instruction_sigp_sense++;
@@ -411,6 +415,12 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu)
   	if (handle_sigp_order_in_user_space(vcpu, order_code, cpu_addr))
   		return -EOPNOTSUPP;
   
+	/* Check the current vcpu, if it was a target from another vcpu */
+	if (kvm_s390_vcpu_is_sigp_busy(vcpu)) {
+		kvm_s390_set_psw_cc(vcpu, SIGP_CC_BUSY);
+		return 0;
+	}

I don't think we need this. I think the above (checking the target of a
SIGP order) is sufficient. Or which situation do you have in mind?


Hrm... I think you're right. I was thinking of this:

VCPU 1 - SIGP STOP CPU 2
VCPU 2 - SIGP SENSE CPU 1

But of course either CPU2 is going to be marked "busy" first, and the
sense doesn't get processed until it's reset, or the sense arrives
first, and the busy/not-busy state doesn't matter. Let me double-check
my tests for the non-RFC version.


I do wonder if we want to make this a kvm_arch_vcpu_ioctl() instead,

In one of my original attempts between v1 and v2, I had put this there.
This reliably deadlocks my guest, because the caller (kvm_vcpu_ioctl())
tries to acquire vcpu->mutex, and racing SIGPs (via KVM_RUN) might
already be holding it. Thus, it's an async ioctl. I could fold it into
the existing interrupt ioctl, but as those are architected structs it
seems more natural to do it this way. Or have I misunderstood something
along the way?

essentially just providing a KVM_S390_SET_SIGP_BUSY *and* providing the
order. "order == 0" sets it to !busy.

I'd tried this too, since it provided some nice debuggability.
Unfortunately, I have a testcase (which I'll eventually get folded into
kvm-unit-tests :)) that picks a random order between 0 and 255, knowing
that there are only a couple of handfuls of valid orders, to check the
response. Zero is architecturally valid (POPS figure 4-29), even if it's
unassigned. The likelihood of it becoming assigned is probably quite
low, but I'm not sure that I like special-casing an order of zero in
this way.
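The deadlock described above can be illustrated with a rough pthreads model, assuming that `vcpu->mutex` is held for the duration of KVM_RUN and that the async ioctl path touches only the busy flag. A `pthread_mutex_t` stands in for `vcpu->mutex`, `pthread_mutex_trylock()` replaces a blocking lock so the model can report "would block" instead of hanging, and all `model_*` names are hypothetical.

```c
/*
 * Model of why the busy/not-busy ioctl cannot go through the path that
 * takes vcpu->mutex: KVM_RUN holds that mutex while the vCPU runs.
 * All model_* names are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_vcpu {
	pthread_mutex_t mutex; /* stands in for vcpu->mutex */
	atomic_int sigp_busy;
};

/* Like a regular vcpu ioctl: must own the mutex before touching state.
 * trylock is used so the model reports "would block" instead of hanging. */
static bool model_vcpu_ioctl_set_busy(struct model_vcpu *v, int busy)
{
	if (pthread_mutex_trylock(&v->mutex) != 0)
		return false;
	atomic_store(&v->sigp_busy, busy);
	pthread_mutex_unlock(&v->mutex);
	return true;
}

/* Like an async ioctl: only the atomic flag is touched, no mutex. */
static bool model_async_ioctl_set_busy(struct model_vcpu *v, int busy)
{
	atomic_store(&v->sigp_busy, busy);
	return true;
}
```

While the target vCPU sits in KVM_RUN (the mutex is held), only the lock-free path can update the flag in this model, which matches the reasoning for making it an async ioctl.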


Looking at the API I'd like to avoid having two IOCTLs

Since the order is a single byte, we could have the payload of an ioctl
say "0-255 is an order that we're busy processing, anything higher than
that resets the busy" or something. That would remove the need for a
second IOCTL.

and I'd love to
see some way to extend this without the need for a whole new IOCTL.


Do you mean zero IOCTLs? Because... I think the only way we can do that
is to get rid of USER_SIGP altogether, and handle everything in-kernel.
Some weeks ago I played with QEMU not enabling USER_SIGP, but I can't
say I've tried it since we went down this "mark a vcpu busy" path. If I
do that busy/not-busy tagging in the kernel for !USER_SIGP, that might
not be a bad thing anyway. But I don't know how we get the behavior
straightened out for USER_SIGP without some type of handshake.
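The single-byte suggestion above could be decoded roughly as follows, assuming a 32-bit payload where 0 to 255 names the order being processed and any larger value clears the busy state. Unlike an "order == 0 means not busy" encoding, this keeps order zero usable. The constants, struct and function names are hypothetical.

```c
/*
 * Sketch of a single-ioctl payload: 0..255 names the SIGP order being
 * processed (the order code is one byte), anything larger clears the
 * busy state. All model_* names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SIGP_ORDER_MAX 0xffu
#define MODEL_SIGP_NOT_BUSY  0x100u /* any value > 0xff would do */

struct model_busy_state {
	bool busy;
	uint8_t order; /* only meaningful while busy */
};

static void model_apply_busy_payload(struct model_busy_state *s, uint32_t payload)
{
	if (payload <= MODEL_SIGP_ORDER_MAX) {
		s->busy = true;
		s->order = (uint8_t)payload; /* order 0 remains a valid order */
	} else {
		s->busy = false;
	}
}
```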

I'd move over to a very small struct argument with a command and a flags
field, so we can extend the IOCTL at a later time without the need to
introduce a new IOCTL.

IMHO there's no real need to make this IOCTL as small as possible and
only handle an int as the argument with < 0 shenanigans. We should
rather focus on making this a nice and sane API if we have the option to
do so.
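A command-plus-flags argument might look like the sketch below. The struct name, command values, field sizes and padding are hypothetical, not an existing KVM UAPI; it assumes the kernel rejects unknown commands, non-zero flags and non-zero reserved bytes, so that those fields can be given a meaning later without a new IOCTL number.

```c
/*
 * Sketch of a small, extensible ioctl argument with a command and a
 * flags field. Names and layout are hypothetical, not an existing KVM
 * UAPI.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum {
	MODEL_SIGP_CMD_SET_BUSY = 1,
	MODEL_SIGP_CMD_CLEAR_BUSY = 2,
};

struct model_sigp_busy_args {
	uint32_t cmd;        /* MODEL_SIGP_CMD_* */
	uint32_t flags;      /* must be 0 for now; reserved for future use */
	uint8_t order;       /* SIGP order code, used with SET_BUSY */
	uint8_t reserved[7]; /* must be 0; pads the struct to 16 bytes */
};

_Static_assert(sizeof(struct model_sigp_busy_args) == 16, "unexpected size");

/* Rejecting unknown flags and non-zero reserved bytes now means no
 * existing user space can depend on them, so they can be given a
 * meaning later. */
static int model_check_sigp_busy_args(const struct model_sigp_busy_args *a)
{
	size_t i;

	if (a->flags != 0)
		return -EINVAL;
	for (i = 0; i < sizeof(a->reserved); i++)
		if (a->reserved[i] != 0)
			return -EINVAL;
	switch (a->cmd) {
	case MODEL_SIGP_CMD_SET_BUSY:
	case MODEL_SIGP_CMD_CLEAR_BUSY:
		return 0;
	default:
		return -EINVAL;
	}
}
```

Strict validation up front is what makes later extension safe: a future flag or command can be added without older user space having already passed garbage in those fields.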




Not that we would need the value right now, but who knows what we might
reuse that interface for in the future.

Thanks!