On 2/6/19 2:23 AM, David Gibson wrote:
> On Tue, Feb 05, 2019 at 01:55:40PM +0100, Cédric Le Goater wrote:
>> On 2/5/19 6:28 AM, David Gibson wrote:
>>> On Mon, Feb 04, 2019 at 12:30:39PM +0100, Cédric Le Goater wrote:
>>>> On 2/4/19 5:45 AM, David Gibson wrote:
>>>>> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
>>>>>> This will let the guest create a memory mapping to expose the ESB MMIO
>>>>>> regions used to control the interrupt sources, to trigger events, to
>>>>>> EOI or to turn off the sources.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <clg@xxxxxxxx>
>>>>>> ---
>>>>>> arch/powerpc/include/uapi/asm/kvm.h | 4 ++
>>>>>> arch/powerpc/kvm/book3s_xive_native.c | 97 +++++++++++++++++++++++++++
>>>>>> 2 files changed, 101 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>>>>>> index 8c876c166ef2..6bb61ba141c2 100644
>>>>>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>>>>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>>>>>> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>>>>>> #define KVM_XICS_PRESENTED (1ULL << 43)
>>>>>> #define KVM_XICS_QUEUED (1ULL << 44)
>>>>>>
>>>>>> +/* POWER9 XIVE Native Interrupt Controller */
>>>>>> +#define KVM_DEV_XIVE_GRP_CTRL 1
>>>>>> +#define KVM_DEV_XIVE_GET_ESB_FD 1
>>>>>
>>>>> Introducing a new FD for ESB and TIMA seems overkill. Can't you get
>>>>> to both with an mmap() directly on the xive device fd? Using the
>>>>> offset to distinguish which one to map, obviously.
>>>>
>>>> The page offset would define some sort of user API. It seems feasible.
>>>> But I am not sure this would be practical in the future if we need to
>>>> tune the length.
>>>
>>> Um.. why not? I mean, yes the XIVE supports rather a lot of
>>> interrupts, but we have 64-bits of offset we can play with - we can
>>> leave room for billions of ESB slots and still have room for billions
>>> of VPs.
>>
>> So the first 4 pages could be the TIMA pages and then would come
>> the pages for the interrupt ESBs. I think that we can have different
>> vm_fault handler for each mapping.
>
> Um.. no, I'm saying you don't need to tightly pack them. So you could
> have the ESB pages at 0, the TIMA at, say offset 2^60.

Well, we know that the TIMA is 4 pages wide and is "directly" related
to the KVM interrupt device, so placing it at offset 0 seems a good
idea. The ESB segment, on the other hand, is of variable size,
depending on the number of IRQs, and it can come after the TIMA,
I think.

>> I wonder how this will work out with pass-through. As Paul said in
>> a previous email, it would be better to let QEMU request a new
>> mapping to handle the ESB pages of the device being passed through.
>> I guess this is not a special case, just another offset and length.
>
> Right, if we need multiple "chunks" of ESB pages we can given them
> each their own terabyte or several. No need to be stingy with address
> space.

You cannot put them just anywhere. They should map the same interrupt
range of ESB pages, overlapping the underlying segment of IPI ESB
pages.

C.
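
To make the layout discussed above concrete, here is a minimal sketch of how a page offset on the xive device fd could be decoded if the TIMA sat at offset 0 and the ESB segment followed it. This is only an illustration and not the KVM ABI: the macro and function names are hypothetical, and two ESB pages per interrupt source is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only (not the actual KVM ABI).
 * All values are page offsets into a mapping of the xive device fd.
 */
#define XIVE_TIMA_PGOFF         0ULL  /* TIMA placed at page offset 0 */
#define XIVE_TIMA_PAGES         4ULL  /* the TIMA is 4 pages wide */
#define XIVE_ESB_PGOFF          (XIVE_TIMA_PGOFF + XIVE_TIMA_PAGES)
#define XIVE_ESB_PAGES_PER_IRQ  2ULL  /* assumption: a pair of pages per source */

enum xive_region {
	XIVE_REGION_TIMA,
	XIVE_REGION_ESB,
	XIVE_REGION_NONE,
};

/* Decide which region a page offset falls in, given the number of IRQs. */
static enum xive_region xive_region_of(uint64_t pgoff, uint64_t nr_irqs)
{
	uint64_t esb_pages = nr_irqs * XIVE_ESB_PAGES_PER_IRQ;

	if (pgoff < XIVE_ESB_PGOFF)
		return XIVE_REGION_TIMA;
	if (pgoff - XIVE_ESB_PGOFF < esb_pages)
		return XIVE_REGION_ESB;
	return XIVE_REGION_NONE;
}

/* Page offset of the first ESB page of a given interrupt source. */
static uint64_t xive_esb_pgoff(uint64_t irq, uint64_t nr_irqs)
{
	assert(irq < nr_irqs);
	return XIVE_ESB_PGOFF + irq * XIVE_ESB_PAGES_PER_IRQ;
}
```

With this packing the size of the ESB segment depends on the number of IRQs, which is the "tune the length" concern raised earlier. The sparse alternative, with the ESB pages at 0 and the TIMA at a far offset such as 2^60, would only change the two base constants.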