On 19.11.21 21:20, Eric Farman wrote: > On Wed, 2021-11-17 at 08:54 +0100, Christian Borntraeger wrote: >> Am 11.11.21 um 20:05 schrieb Eric Farman: >>> On Thu, 2021-11-11 at 19:29 +0100, David Hildenbrand wrote: >>>> On 11.11.21 18:48, Eric Farman wrote: >>>>> On Thu, 2021-11-11 at 17:13 +0100, Janosch Frank wrote: >>>>>> >>>>>> Looking at the API I'd like to avoid having two IOCTLs >>>>> >>>>> Since the order is a single byte, we could have the payload of >>>>> an >>>>> ioctl >>>>> say "0-255 is an order that we're busy processing, anything >>>>> higher >>>>> than >>>>> that resets the busy" or something. That would remove the need >>>>> for >>>>> a >>>>> second IOCTL. >>>> >>>> Maybe just pass an int and treat a negative (or just -1) value as >>>> clearing the order. >>>> >>> >>> Right, that's exactly what I had at one point. I thought it was too >>> cumbersome, but maybe not. Will dust it off, pending my question to >>> Janosch about 0-vs-1 IOCTLs. >> >> As a totally different idea. Would a sync_reg value called SIGP_BUSY >> work as well? >> > > Hrm... I'm not sure. I played with it a bit, and it's not looking > great. I'm almost certainly missing some serialization, because I was > frequently "losing" one of the toggles (busy/not-busy) when hammering > CPUs with various SIGP orders on this interface and thus getting > incorrect responses from the in-kernel orders. You can only modify the destination CPU from the destination CPU thread, after synchronizing the CPU state. This would be trivial with my QEMU proposal. > > I also took a stab at David's idea of tying it to KVM_MP_STATE [1]. I > still think it's a little odd, considering the existing states are all > z/Arch-defined CPU states, but it does sound like the sort of thing > we're trying to do (letting userspace announce what the CPU is up to). > One flaw is that most of the rest of QEMU uses s390_cpu_set_state() for > this, which returns the number of running CPUs instead of the return > code from the MP_STATE ioctl (via kvm_s390_set_cpu_state()) that SIGP > would be interested in. Even if I made the ioctl call directly, I still > encounter some system problems that smell like ones I've addressed in > v2 and v3. Possibly fixable, but I didn't pursue them far enough to be > certain. Well, we can essentially observe this special state of that CPU ("stopping"), so it's not that weird. STOPPING is essentially OPERATING with the notion of "the CPU is blocked for some actions.". > > I ALSO took a stab at folding this into the S390 IRQ paths [2], similar > to what was done with kvm_s390_stop_info. This worked reasonably well, > except the QEMU interface kvm_s390_vcpu_interrupt() returns a void, and > so wouldn't notice an error sent back by KVM. Not a deal breaker, but > having not heard anything to this idea, I didn't go much farther. We only care about SIGP STOP* handling so far, if anybody is aware of other issues that need fixing, it would be helpful to spell them out. I'll keep assuming that only SIGP STOP* needs fixing, as I already explained. Whenever QEMU tells a CPU to stop asynchronously, it does so via a STOP IRQ from the destination CPU thread via KVM_S390_SIGP_STOP. Then, we know the CPU is busy ... until we clear that interrupt, which happens via kvm_s390_clear_stop_irq(). Interestingly, we clear that interrupt via two paths: 1) kvm_s390_clear_local_irqs(), via KVM_S390_INITIAL_RESET and KVM_S390_NORMAL_RESET. Here, we expect that new user space also sets the CPU to stopped via KVM_MP_STATE_STOPPED. In fact, modern QEMU properly sets the CPU stopped before triggering clearing of the interrupts (s390_cpu_reset()). 2) kvm_s390_clear_stop_irq() via kvm_s390_vcpu_stop(), which is triggered via: a) STOP intercept (handle_stop()), KVM_S390_INITIAL_RESET and KVM_S390_NORMAL_RESET with old user space -- !kvm_s390_user_cpu_state_ctrl(). b) KVM_MP_STATE_STOPPED for modern user space. Would the following solve our SIGP STOP * issue w.o. uapi changes? a) Kernel diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index c6257f625929..bd7ee1ea8aa8 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4643,10 +4643,15 @@ int kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu) } } - /* SIGP STOP and SIGP STOP AND STORE STATUS has been fully processed */ + /* + * SIGP STOP and SIGP STOP AND STORE STATUS have been fully + * processed. Clear the interrupt after setting the VCPU stopped, + * such that the VCPU remains busy for most SIGP orders until fully + * stopped. + */ + kvm_s390_set_cpuflags(vcpu, CPUSTAT_STOPPED); kvm_s390_clear_stop_irq(vcpu); - kvm_s390_set_cpuflags(vcpu, CPUSTAT_STOPPED); __disable_ibs_on_vcpu(vcpu); for (i = 0; i < online_vcpus; i++) { diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c index cf4de80bd541..e40f6412106d 100644 --- a/arch/s390/kvm/sigp.c +++ b/arch/s390/kvm/sigp.c @@ -276,6 +276,38 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code, if (!dst_vcpu) return SIGP_CC_NOT_OPERATIONAL; + /* + * SIGP STOP * orders are the only SIGP orders that are processed + * asynchronously, and can theoretically, never complete. + * + * Until the destination VCPU is stopped via kvm_s390_vcpu_stop(), we + * have a stop interrupt pending. While we have a pending stop + * interrupt, that CPU is busy for most SIGP orders. + * + * This is important, because otherwise a single VCPU could issue on an + * operating destination VCPU: + * 1) SIGP STOP $DEST + * 2) SIGP SENSE $DEST + * And have 2) not rejected with BUSY although the destination is still + * processing the pending SIGP STOP * order. + * + * Relevant code has to make sure to complete the SIGP STOP * action + * (e.g., setting the CPU stopped, storing the status) before clearing + * the STOP interrupt. + */ + if (order_code != SIGP_INITIAL_CPU_RESET && + order_code != SIGP_CPU_RESET) { + /* + * Lockless check. SIGP STOP / SIGP RE(START) properly + * synchronizes when processing these orders. In any other case, + * we don't particularly care about races, as the guest cannot + * observe the difference really when issuing orders from two + * differing VCPUs. + */ + if (kvm_s390_is_stop_irq_pending(dst_vcpu)) + return SIGP_CC_BUSY; + } + switch (order_code) { case SIGP_SENSE: vcpu->stat.instruction_sigp_sense++; b) QEMU diff --git a/target/s390x/sigp.c b/target/s390x/sigp.c index 51c727834c..e97e3a60fd 100644 --- a/target/s390x/sigp.c +++ b/target/s390x/sigp.c @@ -479,13 +479,17 @@ void do_stop_interrupt(CPUS390XState *env) { S390CPU *cpu = env_archcpu(env); - if (s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu) == 0) { - qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN); - } + /* + * Complete the STOP operation before exposing the CPU as STOPPED to + * the system. + */ if (cpu->env.sigp_order == SIGP_STOP_STORE_STATUS) { s390_store_status(cpu, S390_STORE_STATUS_DEF_ADDR, true); } env->sigp_order = 0; + if (s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu) == 0) { + qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN); + } env->pending_int &= ~INTERRUPT_STOP; } -- Thanks, David / dhildenb