Re: [RFC PATCH v3 2/2] KVM: s390: Extend the USER_SIGP capability

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 19.11.21 21:20, Eric Farman wrote:
> On Wed, 2021-11-17 at 08:54 +0100, Christian Borntraeger wrote:
>> Am 11.11.21 um 20:05 schrieb Eric Farman:
>>> On Thu, 2021-11-11 at 19:29 +0100, David Hildenbrand wrote:
>>>> On 11.11.21 18:48, Eric Farman wrote:
>>>>> On Thu, 2021-11-11 at 17:13 +0100, Janosch Frank wrote:
>>>>>>
>>>>>> Looking at the API I'd like to avoid having two IOCTLs
>>>>>
>>>>> Since the order is a single byte, we could have the payload of
>>>>> an
>>>>> ioctl
>>>>> say "0-255 is an order that we're busy processing, anything
>>>>> higher
>>>>> than
>>>>> that resets the busy" or something. That would remove the need
>>>>> for
>>>>> a
>>>>> second IOCTL.
>>>>
>>>> Maybe just pass an int and treat a negative (or just -1) value as
>>>> clearing the order.
>>>>
>>>
>>> Right, that's exactly what I had at one point. I thought it was too
>>> cumbersome, but maybe not. Will dust it off, pending my question to
>>> Janosch about 0-vs-1 IOCTLs.
>>
>> As a totally different idea. Would a sync_reg value called SIGP_BUSY
>> work as well?
>>
> 
> Hrm... I'm not sure. I played with it a bit, and it's not looking
> great. I'm almost certainly missing some serialization, because I was
> frequently "losing" one of the toggles (busy/not-busy) when hammering
> CPUs with various SIGP orders on this interface and thus getting
> incorrect responses from the in-kernel orders.

You can only modify the destination CPU from the destination CPU thread,
after synchronizing the CPU state. This would be trivial with my QEMU proposal.

> 
> I also took a stab at David's idea of tying it to KVM_MP_STATE [1]. I
> still think it's a little odd, considering the existing states are all
> z/Arch-defined CPU states, but it does sound like the sort of thing
> we're trying to do (letting userspace announce what the CPU is up to).
> One flaw is that most of the rest of QEMU uses s390_cpu_set_state() for
> this, which returns the number of running CPUs instead of the return
> code from the MP_STATE ioctl (via kvm_s390_set_cpu_state()) that SIGP
> would be interested in. Even if I made the ioctl call directly, I still
> encounter some system problems that smell like ones I've addressed in
> v2 and v3. Possibly fixable, but I didn't pursue them far enough to be
> certain.

Well, we can essentially observe this special state of that CPU ("stopping"), so
it's not that weird. STOPPING is essentially OPERATING with the notion of
"the CPU is blocked for some actions.".

> 
> I ALSO took a stab at folding this into the S390 IRQ paths [2], similar
> to what was done with kvm_s390_stop_info. This worked reasonably well,
> except the QEMU interface kvm_s390_vcpu_interrupt() returns a void, and
> so wouldn't notice an error sent back by KVM. Not a deal breaker, but
> having not heard anything to this idea, I didn't go much farther.

We only care about SIGP STOP* handling so far, if anybody is aware of other issues
that need fixing, it would be helpful  to spell them out. I'll keep assuming that
only SIGP STOP*  needs fixing, as I already explained.

Whenever QEMU tells a CPU to stop asynchronously, it does so via a STOP IRQ from
the destination CPU thread via KVM_S390_SIGP_STOP. Then, we know the CPU is busy
... until we clear that interrupt, which happens via kvm_s390_clear_stop_irq().

Interestingly, we clear that interrupt via two paths:

1) kvm_s390_clear_local_irqs(), via KVM_S390_INITIAL_RESET and 
   KVM_S390_NORMAL_RESET. Here, we expect that new user space also sets  
   the CPU to stopped via KVM_MP_STATE_STOPPED. In fact, modern QEMU 
   properly sets the CPU stopped before triggering clearing of the 
   interrupts (s390_cpu_reset()).
2) kvm_s390_clear_stop_irq() via kvm_s390_vcpu_stop(), which is 
   triggered via:

a) STOP intercept (handle_stop()), KVM_S390_INITIAL_RESET and 
   KVM_S390_NORMAL_RESET with old user space -- 
   !kvm_s390_user_cpu_state_ctrl().

b) KVM_MP_STATE_STOPPED for modern user space.



Would the following solve our SIGP STOP * issue w.o. uapi changes?


a) Kernel

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c6257f625929..bd7ee1ea8aa8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4643,10 +4643,15 @@ int kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
                }
        }
 
-       /* SIGP STOP and SIGP STOP AND STORE STATUS has been fully processed */
+       /*
+        * SIGP STOP and SIGP STOP AND STORE STATUS have been fully
+        * processed. Clear the interrupt after setting the VCPU stopped,
+        * such that the VCPU remains busy for most SIGP orders until fully
+        * stopped.
+        */
+       kvm_s390_set_cpuflags(vcpu, CPUSTAT_STOPPED);
        kvm_s390_clear_stop_irq(vcpu);
 
-       kvm_s390_set_cpuflags(vcpu, CPUSTAT_STOPPED);
        __disable_ibs_on_vcpu(vcpu);
 
        for (i = 0; i < online_vcpus; i++) {
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index cf4de80bd541..e40f6412106d 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -276,6 +276,38 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code,
        if (!dst_vcpu)
                return SIGP_CC_NOT_OPERATIONAL;
 
+       /*
+        * SIGP STOP * orders are the only SIGP orders that are processed
+        * asynchronously, and can theoretically, never complete.
+        *
+        * Until the destination VCPU is stopped via kvm_s390_vcpu_stop(), we
+        * have a stop interrupt pending. While we have a pending stop
+        * interrupt, that CPU is busy for most SIGP orders.
+        *
+        * This is important, because otherwise a single VCPU could issue on an
+        * operating destination VCPU:
+        * 1) SIGP STOP $DEST
+        * 2) SIGP SENSE $DEST
+        * And have 2) not rejected with BUSY although the destination is still
+        * processing the pending SIGP STOP * order.
+        *
+        * Relevant code has to make sure to complete the SIGP STOP * action
+        * (e.g., setting the CPU stopped, storing the status) before clearing
+        * the STOP interrupt.
+        */
+       if (order_code != SIGP_INITIAL_CPU_RESET &&
+           order_code != SIGP_CPU_RESET) {
+               /*
+                * Lockless check. SIGP STOP / SIGP RE(START) properly
+                * synchronizes when processing these orders. In any other case,
+                * we don't particularly care about races, as the guest cannot
+                * observe the difference really when issuing orders from two
+                * differing VCPUs.
+                */
+               if (kvm_s390_is_stop_irq_pending(dst_vcpu))
+                       return SIGP_CC_BUSY;
+       }
+
        switch (order_code) {
        case SIGP_SENSE:
                vcpu->stat.instruction_sigp_sense++;

b) QEMU

diff --git a/target/s390x/sigp.c b/target/s390x/sigp.c
index 51c727834c..e97e3a60fd 100644
--- a/target/s390x/sigp.c
+++ b/target/s390x/sigp.c
@@ -479,13 +479,17 @@ void do_stop_interrupt(CPUS390XState *env)
 {
     S390CPU *cpu = env_archcpu(env);
 
-    if (s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu) == 0) {
-        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-    }
+    /*
+     * Complete the STOP operation before exposing the CPU as STOPPED to
+     * the system.
+     */
     if (cpu->env.sigp_order == SIGP_STOP_STORE_STATUS) {
         s390_store_status(cpu, S390_STORE_STATUS_DEF_ADDR, true);
     }
     env->sigp_order = 0;
+    if (s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu) == 0) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
     env->pending_int &= ~INTERRUPT_STOP;
 }
 


-- 
Thanks,

David / dhildenb




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux