>> >>> I ALSO took a stab at folding this into the S390 IRQ paths [2], >>> similar >>> to what was done with kvm_s390_stop_info. This worked reasonably >>> well, >>> except the QEMU interface kvm_s390_vcpu_interrupt() returns a void, >>> and >>> so wouldn't notice an error sent back by KVM. Not a deal breaker, >>> but >>> having not heard anything to this idea, I didn't go much farther. >> >> We only care about SIGP STOP* handling so far, if anybody is aware of >> other issues >> that need fixing, it would be helpful to spell them out. > > Yes, I keep using SIGP STOP* as an example because it offers (A) a > clear miscommunication with the status bits returned by SIGP SENSE, and > (B) has some special handling in QEMU that to my experience is still > incomplete. But I'm seeing issues with other orders, like SIGP RESTART > [1] [2] (where QEMU does a kvm_s390_vcpu_interrupt() with > KVM_S390_RESTART, and thus adds to pending_irq) and SIGP (INITIAL) CPU > RESET [2] (admittedly not greatly researched). Sorry I missed that discussion previously. Summarizing what the PoP says: Once we have a pending: * start (-> synchronous) * stop (-> asynchronous) * restart (-> asynchronous) * stop-and-store-status (-> asynchronous) * set-prefix (-> synchronous) * store-status-at-address (-> synchronous) * store-additional-status-at-address (-> synchronous) * initial-CPU-reset (-> synchronous) * CPU-reset order (-> synchronous) The following orders have to return busy (my take: only a must if the guest could observe the difference): * sense * external call * emergency signal * start * stop * restart * stop and store status * set prefix * store status at address * set architecture * set multithreading * store additional status at address We have to ask ourselves a) How could a single VCPU observe the difference if executing any other instruction immediately afterwards. My take would be that for synchronous orders we can't really. So we're left with: * stop (-> asynchronous) * restart (-> asynchronous) * stop-and-store-status (-> asynchronous) b) How could multiple VCPUs observe the difference that a single VCPU can't observe. That will require more thought, but I think it will be hardly possible. We know that SIGP STOP * needs care. SIGP RESTART is interesting. We inject it only for OPERATING cpus and it will only change the LC psw. What if we execute immediately afterwards: * sense -> does not affect any bits * external call -> has higher IRQ priority. There could be a difference if injected before or after the restart was delivered. Could be fixed in the kernel (check IRQ_PEND_RESTART). * emergency signal -> has higher IRQ priority. There could be a difference if injected before or after the restart was delivered. Could be fixed in the kernel (check IRQ_PEND_RESTART). * start -> CPU is already operating * stop -> restart is delivered first * restart -> I think the lowcore will look different if we inject two RESTARTs immediately afterwards instead of only a single one. Could be fixed in the kernel (double-deliver the interrupt). * stop and store status -> restart is delivered first * set prefix -> CPU is operating, not possible * store status at address -> CPU is operating, not possible * set architecture -> don't care * set multithreading -> don't care * store additional status at address -> CPU is operating, not possible * initial-CPU-reset -> clears local IRQ. LC will look different if RESTART was delivered or not. Could be fixed in the kernel quite easily (deliver restart first before clearing interrupts). * CPU-reset -> clears local IRQs. LC will look different if injected before vs. after. Could be fixed in the kernel quite easily (deliver restart first before clearing interrupts).. external call as handled by the SIGP interpretation facility will already also violate that description. We don't know that a SIGP restart is pending. We'd have to disable the SIGP interpretation facility temporarily. /me shivers This sounds like the kind of things we should happily not be fixing because nobody might really care :) > > The reason for why I have no spent a lot of time in the latter is that > I have also mentioned that POPS has lists of orders that will return > busy [3], and so something more general is perhaps warranted. The QEMU > RFC's don't handle anything further than SIGP STOP*, on account of it > makes sense to get the interface right first. Right. My take is to have a look what we actually have to fix -- where the guest can actually observe the difference. If the guest can't observe the difference there is no need to actually implement BUSY logic as instructed in the PoP. > >> I'll keep assuming that >> only SIGP STOP* needs fixing, as I already explained. >> >> Whenever QEMU tells a CPU to stop asynchronously, it does so via a >> STOP IRQ from >> the destination CPU thread via KVM_S390_SIGP_STOP. Then, we know the >> CPU is busy >> ... until we clear that interrupt, which happens via >> kvm_s390_clear_stop_irq(). >> >> Interestingly, we clear that interrupt via two paths: >> >> 1) kvm_s390_clear_local_irqs(), via KVM_S390_INITIAL_RESET and >> KVM_S390_NORMAL_RESET. Here, we expect that new user space also >> sets >> the CPU to stopped via KVM_MP_STATE_STOPPED. In fact, modern QEMU >> properly sets the CPU stopped before triggering clearing of the >> interrupts (s390_cpu_reset()). >> 2) kvm_s390_clear_stop_irq() via kvm_s390_vcpu_stop(), which is >> triggered via: >> >> a) STOP intercept (handle_stop()), KVM_S390_INITIAL_RESET and >> KVM_S390_NORMAL_RESET with old user space -- >> !kvm_s390_user_cpu_state_ctrl(). >> >> b) KVM_MP_STATE_STOPPED for modern user space. >> >> >> >> Would the following solve our SIGP STOP * issue w.o. uapi changes? >> > > A quick pass shows some promise, but I haven't the bandwidth to throw > the battery of stuff at it. I'll have to take a closer look after the > US Holiday to give a better answer. (Example: looking for > IRQ_PEND_SIGP_STOP || IRQ_PEND_RESTART is trivial.) Yes, extending to IRQ_PEND_RESTART would make sense. -- Thanks, David / dhildenb