On Tue, 2021-11-23 at 19:44 +0100, David Hildenbrand wrote: > > > > I ALSO took a stab at folding this into the S390 IRQ paths [2], > > > > similar > > > > to what was done with kvm_s390_stop_info. This worked > > > > reasonably > > > > well, > > > > except the QEMU interface kvm_s390_vcpu_interrupt() returns a > > > > void, > > > > and > > > > so wouldn't notice an error sent back by KVM. Not a deal > > > > breaker, > > > > but > > > > having not heard anything to this idea, I didn't go much > > > > farther. > > > > > > We only care about SIGP STOP* handling so far, if anybody is > > > aware of > > > other issues > > > that need fixing, it would be helpful to spell them out. > > > > Yes, I keep using SIGP STOP* as an example because it offers (A) a > > clear miscommunication with the status bits returned by SIGP SENSE, > > and > > (B) has some special handling in QEMU that to my experience is > > still > > incomplete. But I'm seeing issues with other orders, like SIGP > > RESTART > > [1] [2] (where QEMU does a kvm_s390_vcpu_interrupt() with > > KVM_S390_RESTART, and thus adds to pending_irq) and SIGP (INITIAL) > > CPU > > RESET [2] (admittedly not greatly researched). > > Sorry I missed that discussion previously. Summarizing what the PoP > says: > > Once we have a pending: > * start (-> synchronous) > * stop (-> asynchronous) > * restart (-> asynchronous) > * stop-and-store-status (-> asynchronous) > * set-prefix (-> synchronous) > * store-status-at-address (-> synchronous) > * store-additional-status-at-address (-> synchronous) > * initial-CPU-reset (-> synchronous) > * CPU-reset order (-> synchronous) I would argue that these last two are not necessarily synchronous. Looking at QEMU... handle_sigp_single_dst() sigp_(initial_)cpu_reset() s390_cpu_reset() >> If kvm_enabled(), call kvm_s390_reset_vcpu_initial() -or- kvm_s390_reset_vcpu_normal() kvm_s390_reset_vcpu() kvm_vcpu_ioctl() (Switch to KVM code) kvm_vcpu_ioctl() -- Acquire vcpu->mutex kvm_arch_vcpu_ioctl() kvm_arch_vcpu_ioctl_initial_reset() -and/or- kvm_arch_vcpu_ioctl_normal_reset() So, given that, couldn't it be possible for a SIGP SENSE to be sent to a CPU that is currently processing one of the RESET orders? The mutex acquired as part of the ioctl would end up gating one of them, when according to POPS a subsequent SENSE should get CC2 until the reset completes. Unlike STOP*/RESTART, there's no IRQ that we can key off of to know that the RESET is still in process, unless we define another IOCTL as in v2-v4 here. > > The following orders have to return busy (my take: only a must if the > guest could observe the difference): > * sense > * external call > * emergency signal These ones: > * start > * stop > * restart > * stop and store status > * set prefix > * store status at address > * set architecture > * set multithreading > * store additional status at address ... are handled in userspace, which QEMU serializes in handle_sigp() and returns CC2 today: if (qemu_mutex_trylock(&qemu_sigp_mutex)) { ret = SIGP_CC_BUSY; goto out; } > > We have to ask ourselves > > a) How could a single VCPU observe the difference if executing any > other > instruction immediately afterwards. My take would be that for > synchronous orders we can't really. So we're left with: > * stop (-> asynchronous) > * restart (-> asynchronous) > * stop-and-store-status (-> asynchronous) > > b) How could multiple VCPUs observe the difference that a single VCPU > can't observe. That will require more thought, but I think it will be > hardly possible. > > > We know that SIGP STOP * needs care. > > SIGP RESTART is interesting. We inject it only for OPERATING cpus and > it > will only change the LC psw. What if we execute immediately > afterwards: > > * sense -> does not affect any bits > * external call -> has higher IRQ priority. There could be a > difference > if injected before or after the restart was delivered. Could be > fixed > in the kernel (check IRQ_PEND_RESTART). > * emergency signal -> has higher IRQ priority. There could be a > difference if injected before or after the restart was delivered. > Could be fixed in the kernel (check IRQ_PEND_RESTART). > * start -> CPU is already operating > * stop -> restart is delivered first > * restart -> I think the lowcore will look different if we inject two > RESTARTs immediately afterwards instead of only a single > one. Could be fixed in the kernel (double-deliver the interrupt). > * stop and store status -> restart is delivered first > * set prefix -> CPU is operating, not possible > * store status at address -> CPU is operating, not possible > * set architecture -> don't care > * set multithreading -> don't care > * store additional status at address -> CPU is operating, not > possible > * initial-CPU-reset -> clears local IRQ. LC will look different if > RESTART was delivered or not. Could be fixed in the kernel quite > easily (deliver restart first before clearing interrupts). > * CPU-reset -> clears local IRQs. LC will look different if > injected before vs. after. Could be fixed in the kernel quite > easily (deliver restart first before clearing interrupts).. These might be of value. I don't yet have a clear order of events in these scenarios, but will keep this in mind as I am seeing some oddities there. > > external call as handled by the SIGP interpretation facility will > already also violate that description. We don't know that a SIGP > restart > is pending. We'd have to disable the SIGP interpretation facility > temporarily. > > /me shivers /me too > > This sounds like the kind of things we should happily not be fixing > because nobody might really care :) > > Hi, me again, really hoping I don't care about this aspect of it. :) > > > The reason for why I have no spent a lot of time in the latter is > > that > > I have also mentioned that POPS has lists of orders that will > > return > > busy [3], and so something more general is perhaps warranted. The > > QEMU > > RFC's don't handle anything further than SIGP STOP*, on account of > > it > > makes sense to get the interface right first. > > Right. My take is to have a look what we actually have to fix -- > where > the guest can actually observe the difference. If the guest can't > observe the difference there is no need to actually implement BUSY > logic > as instructed in the PoP. My concern is largely with SIGP SENSE giving a clear answer for the state of the cpu. Since most of the SIGP orders have some variation of "may not occur/be completed during the execution of SIGNAL PROCESSOR" in POPs, the SIGP SENSE is a quick mechanism (being defined as a "fast" order) to determine if another order has completed or is still in- process, and if the cpu was left in the expected state at the completion of the order. It does suggest that those synchronous orders could offer a misleading answer, but since there's no (obvious) loss of control in those paths, it's not as big a concern for me. > > > > I'll keep assuming that > > > only SIGP STOP* needs fixing, as I already explained. > > > > > > Whenever QEMU tells a CPU to stop asynchronously, it does so via > > > a > > > STOP IRQ from > > > the destination CPU thread via KVM_S390_SIGP_STOP. Then, we know > > > the > > > CPU is busy > > > ... until we clear that interrupt, which happens via > > > kvm_s390_clear_stop_irq(). > > > > > > Interestingly, we clear that interrupt via two paths: > > > > > > 1) kvm_s390_clear_local_irqs(), via KVM_S390_INITIAL_RESET and > > > KVM_S390_NORMAL_RESET. Here, we expect that new user space > > > also > > > sets > > > the CPU to stopped via KVM_MP_STATE_STOPPED. In fact, modern > > > QEMU > > > properly sets the CPU stopped before triggering clearing of > > > the > > > interrupts (s390_cpu_reset()). > > > 2) kvm_s390_clear_stop_irq() via kvm_s390_vcpu_stop(), which is > > > triggered via: > > > > > > a) STOP intercept (handle_stop()), KVM_S390_INITIAL_RESET and > > > KVM_S390_NORMAL_RESET with old user space -- > > > !kvm_s390_user_cpu_state_ctrl(). > > > > > > b) KVM_MP_STATE_STOPPED for modern user space. > > > > > > > > > > > > Would the following solve our SIGP STOP * issue w.o. uapi > > > changes? > > > > > > > A quick pass shows some promise, but I haven't the bandwidth to > > throw > > the battery of stuff at it. I'll have to take a closer look after > > the > > US Holiday to give a better answer. (Example: looking for > > IRQ_PEND_SIGP_STOP || IRQ_PEND_RESTART is trivial.) > > Yes, extending to IRQ_PEND_RESTART would make sense. > Running my stressors at this combined patch goes well. Going to work on some additional ones this week, with some debug on the reset orders. Thanks, Eric