On 2012-10-24 02:50, Paul Mackerras wrote:
> On Tue, Oct 23, 2012 at 12:48:28PM +0200, Jan Kiszka wrote:
>
>> The current irqchip API is like this:
>>
>> KVM_CREATE_IRQCHIP (without any parameters)
>> ...
>> KVM_CREATE_VCPU
>> KVM_SET_IRQCHIP (or the other way around)
>> ...
>> KVM_RUN
>>
>> The arguments you cannot pass via KVM_CREATE_IRQCHIP - which is more
>> like a "Hey, there will be an IRQ chip!" - could be passed via
>> KVM_SET_IRQCHIP (it has 512 bytes space). Provided there are sane
>> configuration defaults, at least after KVM_CREATE_VCPU, KVM_SET_IRQCHIP
>> becomes optional. Then you don't need a check on KVM_RUN.
>
> Interesting. How many times do you call KVM_CREATE_IRQCHIP per VM?
> Just once?

Yes, just once. The model is that we switch between user space and
kernel space emulation (or hardware-assisted virtualization) of the
irqchip(s).

>
>> What do we need in addition to this in any of the non-x86 archs?
>
> What we have with the XICS, and to some extent with the OpenPIC, is a
> separation between "source" and "presentation" controllers, with a
> message-passing fabric between them. The source controllers handle
> the details of some number of interrupt sources, such as the priority
> and destination of each interrupt source, and the presentation
> controllers handle the interface to the CPUs, so there is one
> presentation controller per CPU. The presentation controller for a
> CPU has registers for the CPU priority, IPI request priority, and
> pending interrupt status.
>
> So we could indeed use the existing KVM_CREATE_IRQCHIP to tell KVM to
> create a presentation controller per vcpu. But then how do we tell
> KVM how many source controllers we want and how many interrupts each
> source controller should handle? The source controllers are not tied
> to any particular vcpu, and a source controller could potentially have
> 100s of interrupts or more (particularly with MSIs).
> Configuration of each source fits into 64 bits, so if we tried to use
> KVM_SET_IRQCHIP for configuring a source controller we'd be limited
> to 64 interrupts per source controller.
>
> What I have in my patches to do XICS emulation in the kernel is a new
> KVM_CREATE_IRQCHIP_ARGS ioctl, which takes an argument struct with a
> type, and for source controllers, an identifying number ("bus unit ID"
> or BUID, since that's what the hardware calls it) and the number of
> sources. Then for saving/restoring the presentation controller state
> there's a register identifier for the KVM_GET/SET_ONE_REG ioctls, and
> for the source controllers there are new KVM_IRQCHIP_GET/SET_SOURCES
> ioctls that take an argument struct like this:
>
> struct kvm_irq_sources {
>         __u32 start_irq_number;
>         __u32 nr_irqs;
>         __u64 *irqbuf;
> };
>
> OpenPIC also can handle 100s or 1000s of interrupt sources and can
> have the sources divided up into blocks (which tends to make it
> desirable to have multiple source controllers). It also has per-CPU
> interrupt delivery registers and per-source interrupt source
> registers.
>
> So I think all this could be shoehorned into KVM_CREATE/GET/SET_IRQCHIP
> for small configurations, but it seems like it would run out of space
> for larger configurations.

Our architectures are not that different. We'll have the same challenge
on x86 one day as well, as there can be several IOAPICs (source
controllers), not just one as today. Those should be addressed via
chip_id of struct kvm_irqchip (we have a 32-bit address space there).
There, too, the question is when to instantiate the chips. Without
adding another IOCTL, they could be created on first SET_IRQCHIP. For
Power, the number of IRQ lines can become a set-once field in the source
controller state, i.e. it must never be written twice with different
values.

But, of course, some KVM_CREATE_IRQCHIP[2|_ARGS] that takes extra
arguments and specifies those details is also an option.
There the question is how often it should be called: once with a list of
all necessary parameters or multiple times as in your model. As I'd like
to see a new IOCTL being able to replace the old one (though we will
still support it for older user space, of course), I'm leaning more
toward the first option.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
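
To make two points from the discussion above concrete, here is a minimal
user-space sketch in C. It is only an illustration under the assumptions
stated in the thread (512 bytes of KVM_SET_IRQCHIP payload, 64 bits of
configuration per source). The names IRQCHIP_PAYLOAD_BYTES,
SOURCE_CONFIG_BYTES, struct src_ctrl, max_sources_in_payload() and
src_ctrl_set_nr_irqs() are hypothetical and not part of any KVM interface.
The first function shows where the 64-interrupt limit comes from; the
second models what a "set-once" number-of-IRQ-lines field could look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption from the thread: KVM_SET_IRQCHIP carries 512 bytes of state. */
#define IRQCHIP_PAYLOAD_BYTES 512u
/* Assumption from the thread: each source's configuration fits in 64 bits. */
#define SOURCE_CONFIG_BYTES   ((unsigned)sizeof(uint64_t))

/* Upper bound on sources if the whole payload held per-source config. */
unsigned max_sources_in_payload(void)
{
    return IRQCHIP_PAYLOAD_BYTES / SOURCE_CONFIG_BYTES;
}

/* Hypothetical model of a source controller's state. */
struct src_ctrl {
    uint32_t nr_irqs;   /* 0 means "not configured yet" */
};

/*
 * Set-once semantics: the first write fixes the value. Writing the same
 * value again is accepted; writing a different value is rejected.
 */
int src_ctrl_set_nr_irqs(struct src_ctrl *sc, uint32_t nr_irqs)
{
    assert(sc != NULL);

    if (nr_irqs == 0)
        return -EINVAL;             /* nothing to configure */
    if (sc->nr_irqs != 0 && sc->nr_irqs != nr_irqs)
        return -EBUSY;              /* already set to another value */

    sc->nr_irqs = nr_irqs;
    return 0;
}
```

Under these assumptions max_sources_in_payload() evaluates to 64, which
matches the limit mentioned above. Calling src_ctrl_set_nr_irqs() twice
with 256 succeeds both times, while a later call with 512 returns -EBUSY
and leaves the stored value at 256.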