On 2/7/19 3:51 AM, David Gibson wrote:
> On Wed, Feb 06, 2019 at 08:35:24AM +0100, Cédric Le Goater wrote:
>> On 2/6/19 2:18 AM, David Gibson wrote:
>>> On Wed, Feb 06, 2019 at 09:13:15AM +1100, Paul Mackerras wrote:
>>>> On Tue, Feb 05, 2019 at 12:31:28PM +0100, Cédric Le Goater wrote:
>>>>>>>> As for nesting, I suggest for the foreseeable future we stick to XICS
>>>>>>>> emulation in nested guests.
>>>>>>>
>>>>>>> ok. so no kernel_irqchip at all. hmm.
>>>>>
>>>>> I was confused with what Paul calls 'XICS emulation'. It's not the QEMU
>>>>> XICS emulated device but the XICS-over-XIVE KVM device, the KVM XICS
>>>>> device KVM uses when under a P9 processor.
>>>>
>>>> Actually there are two separate implementations of XICS emulation in
>>>> KVM. The first (older) one is almost entirely a software emulation
>>>> but does have some cases where it accesses an underlying XICS device
>>>> in order to make some things faster (IPIs and pass-through of a device
>>>> interrupt to a guest). The other, newer one is the XICS-on-XIVE
>>>> emulation that Ben wrote, which uses the XIVE hardware pretty heavily.
>>>> My patch was about making the the older code work when there is no
>>>> XICS available to the host.
>>>
>>> Ah, right. To clarify my earlier statements in light of this:
>>>
>>>  * We definitely want some sort of kernel-XICS available in a nested
>>>    guest. AIUI, this is now accomplished, so, Yay!
>>>
>>>  * Implementing the L2 XICS in terms of L1's PAPR-XIVE would be a
>>>    bonus, but it's a much lower priority.
>>
>> Yes. In this case, the L1 KVM-HV should not advertise KVM_CAP_PPC_IRQ_XIVE
>> to QEMU which will restrict CAS to the XICS only interrupt mode.
>
> Uh... no... we shouldn't change what's available to the guest based on
> host configuration only. We should just stop advertising the CAP
> saying that *KVM implemented* is available

yes. that is what I meant.

> so that qemu will fall back to userspace XIVE emulation.

even if kernel_irqchip is required ?
Today, QEMU just fails to start. With the dual mode, the interrupt mode is
negotiated at CAS time and, once that is merged, the KVM device will be
created at reset. In case of failure, QEMU will abort.

I am not saying it is not possible, but we will need some internal
infrastructure to dynamically handle the fallback to userspace emulation.

C.
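
PS: to make the fallback question concrete, here is a rough sketch of the
decision such infrastructure would have to take at reset. Every name in it
is made up for illustration and none of it is existing QEMU or KVM code. The
off/allowed/required policy is only a simplified model of kernel_irqchip,
and the callback stands in for whatever creates the KVM XIVE device.

```c
#include <stdbool.h>

/* Hypothetical model of the kernel_irqchip machine option. */
typedef enum {
    IRQCHIP_OFF,       /* userspace emulation only */
    IRQCHIP_ALLOWED,   /* try the KVM device, fall back if it is unavailable */
    IRQCHIP_REQUIRED,  /* the KVM device is mandatory */
} IrqchipPolicy;

/* Hypothetical result of the backend selection done at reset. */
typedef enum {
    BACKEND_KVM,       /* in-kernel XIVE device */
    BACKEND_EMULATED,  /* userspace XIVE emulation */
    BACKEND_NONE,      /* nothing usable: the caller has to abort */
} XiveBackend;

/* Stand-in for the attempt to create the KVM XIVE device. */
typedef bool (*KvmCreateFn)(void);

/* Stubs modelling a successful and a failed device creation. */
static bool kvm_create_ok(void)   { return true; }
static bool kvm_create_fail(void) { return false; }

/*
 * Pick the XIVE backend once the interrupt mode has been negotiated.
 * kvm_cap_xive models whether the host KVM advertises its XIVE capability.
 */
static XiveBackend xive_pick_backend(IrqchipPolicy policy, bool kvm_cap_xive,
                                     KvmCreateFn create)
{
    /* Only try the kernel device if it is both wanted and advertised. */
    if (policy != IRQCHIP_OFF && kvm_cap_xive && create()) {
        return BACKEND_KVM;
    }
    /* The user insisted on the kernel device: no fallback is possible. */
    if (policy == IRQCHIP_REQUIRED) {
        return BACKEND_NONE;
    }
    /* Otherwise fall back to userspace emulation instead of aborting. */
    return BACKEND_EMULATED;
}
```

In this model, the case I am asking about is IRQCHIP_REQUIRED without the
capability: it ends in BACKEND_NONE, so QEMU would still have to abort.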