On 12.08.2016 14:43, Paolo Bonzini wrote:
> On 12/08/2016 12:56, Alexander Popov wrote:
>> Maybe the name "paravirq" is not very good, I'll try to describe the idea.
>>
>> There is some kernel module for special interactions between guest VMs.
>> Currently it has to register a MSI-capable PCI device to handle interrupts
>> injected by the hypervisor. And the bare-metal hypervisor has to emulate
>> such a device for guest VMs.
>>
>> So I've implemented paravirq irq_domain to avoid this redundant emulation.
>> With it we can just call:
>>  - paravirq_alloc_irq() to allocate a LAPIC irq;
>>  - request_irq() for it;
>>  - irqd_cfg(irq_get_irq_data()) to get the corresponding interrupt vector
>>    and inform the hypervisor about it.
>> Now we happily handle the irq from the hypervisor when it injects this vector.
>>
>> The irq_mask/irq_unmask parameters of paravirq_init_chip() are the pointers
>> to the functions from the interaction module which ask the hypervisor to
>> start/stop injecting interrupts to the guest VM.
>>
>> Paravirq irq_domain allows to avoid the PCI device emulation in the hypervisor
>> and provides the ability to run slimmer Linux guests without precompiled
>> PCI and MSI support.
>>
>> Did I manage to answer your questions?
>
> It's a bit clearer. My doubt is that the caller of paravirq_init_chip
> has to provide irq_mask and irq_unmask, but it doesn't know who will
> call paravirq_alloc_irq. So there are two cases:
>
> 1) there is only one device, and then your solution doesn't scale well
> to multiple devices
>
> 2) there is some kind of commonality between all devices using
> paravirq_alloc_irq, and then it should be abstracted in a bus.
>
> The latter would be similar to what Xen and Hyper-V do, for example.
> Using PCI is more similar to the KVM approach.

Excuse me, I don't see the problem.
The caller of paravirq_init_chip() provides the irq_mask/irq_unmask function
pointers only once, and paravirq_init_chip() saves them in the
.irq_mask/.irq_unmask fields of struct irq_chip paravirq_chip.

When, for example, disable_irq() is later called for one of several irqs
allocated in the paravirq irq_domain, paravirq_chip.irq_mask() is called with
the struct irq_data *data argument corresponding to that particular irq.

I.e. our irq_mask()/irq_unmask() callbacks get the irq_data of the interrupt
which should be masked/unmasked and can ask the hypervisor to stop/start
injecting the vector of that particular interrupt.

>>>> Signed-off-by: Alexander Popov <alex.popov@xxxxxxxxx>
>>>> ---
>>>>  arch/x86/Kconfig                 |   8 +++
>>>>  arch/x86/include/asm/irqdomain.h |   6 ++
>>>>  arch/x86/include/asm/paravirq.h  |   9 +++
>>>>  arch/x86/kernel/apic/Makefile    |   2 +
>>>>  arch/x86/kernel/apic/paravirq.c  | 128 +++++++++++++++++++++++++++++++++++++++
>>>>  arch/x86/kernel/apic/vector.c    |   1 +
>>>>  6 files changed, 154 insertions(+)
>>>>  create mode 100644 arch/x86/include/asm/paravirq.h
>>>>  create mode 100644 arch/x86/kernel/apic/paravirq.c
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 5c6e747..209bd88 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -760,6 +760,14 @@ config PARAVIRT_TIME_ACCOUNTING
>>>>
>>>>  	  If in doubt, say N here.
>>>>
>>>> +config X86_PARAVIRQ
>>>> +	bool "Enable paravirq irq_domain"
>>>> +	depends on PARAVIRT && X86_LOCAL_APIC
>>>> +	default n
>>>> +	---help---
>>>> +	  This option enables paravirq irq_domain for interrupts injected
>>>> +	  by the hypervisor using Intel VT-x technology.
>>>> +
>>>>  config PARAVIRT_CLOCK
>>>>  	bool
>>>>
>>>> diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
>>>> index d26075b..e3192f6 100644
>>>> --- a/arch/x86/include/asm/irqdomain.h
>>>> +++ b/arch/x86/include/asm/irqdomain.h
>>>> @@ -60,4 +60,10 @@ extern void arch_init_htirq_domain(struct irq_domain *domain);
>>>>  static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
>>>>  #endif
>>>>
>>>> +#ifdef CONFIG_X86_PARAVIRQ
>>>> +extern void arch_init_paravirq_domain(struct irq_domain *domain);
>>>> +#else
>>>> +static inline void arch_init_paravirq_domain(struct irq_domain *domain) { }
>>>> +#endif
>>>> +
>>>>  #endif
>>>> diff --git a/arch/x86/include/asm/paravirq.h b/arch/x86/include/asm/paravirq.h
>>>> new file mode 100644
>>>> index 0000000..a137de2
>>>> --- /dev/null
>>>> +++ b/arch/x86/include/asm/paravirq.h
>>>> @@ -0,0 +1,9 @@
>>>> +#ifndef _ASM_X86_PARAVIRQ_H
>>>> +#define _ASM_X86_PARAVIRQ_H
>>>> +
>>>> +int paravirq_init_chip(void (*irq_mask)(struct irq_data *data),
>>>> +			void (*irq_unmask)(struct irq_data *data));
>>>> +int paravirq_alloc_irq(void);
>>>> +void paravirq_free_irq(unsigned int irq);
>>>> +
>>>> +#endif /* _ASM_X86_PARAVIRQ_H */
>>>> diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
>>>> index 8e63ebd..84f9ce0 100644
>>>> --- a/arch/x86/kernel/apic/Makefile
>>>> +++ b/arch/x86/kernel/apic/Makefile
>>>> @@ -28,3 +28,5 @@ obj-$(CONFIG_X86_BIGSMP)	+= bigsmp_32.o
>>>>
>>>>  # For 32bit, probe_32 need to be listed last
>>>>  obj-$(CONFIG_X86_LOCAL_APIC)	+= probe_$(BITS).o
>>>> +
>>>> +obj-$(CONFIG_X86_PARAVIRQ)	+= paravirq.o
>>>> diff --git a/arch/x86/kernel/apic/paravirq.c b/arch/x86/kernel/apic/paravirq.c
>>>> new file mode 100644
>>>> index 0000000..430b819
>>>> --- /dev/null
>>>> +++ b/arch/x86/kernel/apic/paravirq.c
>>>> @@ -0,0 +1,128 @@
>>>> +/*
>>>> + * An irq_domain for interrupts injected by the hypervisor using
>>>> + * Intel VT-x technology.
>>>> + *
>>>> + * Copyright (C) 2016 Alexander Popov <alex.popov@xxxxxxxxx>.
>>>> + *
>>>> + * This file is released under the GPLv2.
>>>> + */
>>>> +
>>>> +#include <linux/init.h>
>>>> +#include <linux/irq.h>
>>>> +#include <asm/irqdomain.h>
>>>> +#include <asm/paravirq.h>
>>>> +
>>>> +static struct irq_domain *paravirq_domain;
>>>> +
>>>> +static struct irq_chip paravirq_chip = {
>>>> +	.name			= "PARAVIRQ",
>>>> +	.irq_ack		= irq_chip_ack_parent,
>>>> +};
>>>> +
>>>> +static int paravirq_domain_alloc(struct irq_domain *domain,
>>>> +			unsigned int virq, unsigned int nr_irqs, void *arg)
>>>> +{
>>>> +	int ret = 0;
>>>> +
>>>> +	BUG_ON(domain != paravirq_domain);
>>>> +
>>>> +	if (nr_irqs != 1)
>>>> +		return -EINVAL;
>>>> +
>>>> +	ret = irq_domain_set_hwirq_and_chip(paravirq_domain,
>>>> +					virq, virq, &paravirq_chip, NULL);
>>>> +	if (ret) {
>>>> +		pr_warn("setting chip, hwirq for irq %u failed\n", virq);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	__irq_set_handler(virq, handle_edge_irq, 0, "edge");
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void paravirq_domain_free(struct irq_domain *domain,
>>>> +			unsigned int virq, unsigned int nr_irqs)
>>>> +{
>>>> +	struct irq_data *irq_data;
>>>> +
>>>> +	BUG_ON(domain != paravirq_domain);
>>>> +	BUG_ON(nr_irqs != 1);
>>>> +
>>>> +	irq_data = irq_domain_get_irq_data(paravirq_domain, virq);
>>>> +	if (irq_data)
>>>> +		irq_domain_reset_irq_data(irq_data);
>>>> +	else
>>>> +		pr_warn("irq %u is not in paravirq irq_domain\n", virq);
>>>> +}
>>>> +
>>>> +static const struct irq_domain_ops paravirq_domain_ops = {
>>>> +	.alloc	= paravirq_domain_alloc,
>>>> +	.free	= paravirq_domain_free,
>>>> +};
>>>> +
>>>> +int paravirq_alloc_irq(void)
>>>> +{
>>>> +	struct irq_alloc_info info;
>>>> +
>>>> +	if (!paravirq_domain)
>>>> +		return -ENODEV;
>>>> +
>>>> +	if (!paravirq_chip.irq_mask || !paravirq_chip.irq_unmask)
>>>> +		return -EINVAL;
>>>> +
>>>> +	init_irq_alloc_info(&info, NULL);
>>>> +
>>>> +	return irq_domain_alloc_irqs(paravirq_domain, 1, NUMA_NO_NODE, &info);
>>>> +}
>>>> +EXPORT_SYMBOL(paravirq_alloc_irq);
>>>> +
>>>> +void paravirq_free_irq(unsigned int virq)
>>>> +{
>>>> +	struct irq_data *irq_data;
>>>> +
>>>> +	if (!paravirq_domain) {
>>>> +		pr_warn("paravirq irq_domain is not initialized\n");
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	irq_data = irq_domain_get_irq_data(paravirq_domain, virq);
>>>> +	if (irq_data)
>>>> +		irq_domain_free_irqs(virq, 1);
>>>> +	else
>>>> +		pr_warn("irq %u is not in paravirq irq_domain\n", virq);
>>>> +}
>>>> +EXPORT_SYMBOL(paravirq_free_irq);
>>>> +
>>>> +int paravirq_init_chip(void (*irq_mask)(struct irq_data *data),
>>>> +			void (*irq_unmask)(struct irq_data *data))
>>>> +{
>>>> +	if (!paravirq_domain)
>>>> +		return -ENODEV;
>>>> +
>>>> +	if (paravirq_chip.irq_mask || paravirq_chip.irq_unmask)
>>>> +		return -EEXIST;
>>>> +
>>>> +	if (!irq_mask || !irq_unmask)
>>>> +		return -EINVAL;
>>>> +
>>>> +	paravirq_chip.irq_mask = irq_mask;
>>>> +	paravirq_chip.irq_unmask = irq_unmask;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(paravirq_init_chip);
>>>> +
>>>> +void arch_init_paravirq_domain(struct irq_domain *parent)
>>>> +{
>>>> +	paravirq_domain = irq_domain_add_tree(NULL, &paravirq_domain_ops, NULL);
>>>> +	if (!paravirq_domain) {
>>>> +		pr_warn("failed to initialize paravirq irq_domain\n");
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	paravirq_domain->name = paravirq_chip.name;
>>>> +	paravirq_domain->parent = parent;
>>>> +	paravirq_domain->flags |= IRQ_DOMAIN_FLAG_AUTO_RECURSIVE;
>>>> +}
>>>> +
>>>> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
>>>> index 6066d94..878b440 100644
>>>> --- a/arch/x86/kernel/apic/vector.c
>>>> +++ b/arch/x86/kernel/apic/vector.c
>>>> @@ -438,6 +438,7 @@ int __init arch_early_irq_init(void)
>>>>
>>>>  	arch_init_msi_domain(x86_vector_domain);
>>>>  	arch_init_htirq_domain(x86_vector_domain);
>>>> +	arch_init_paravirq_domain(x86_vector_domain);
>>>>
>>>>  	BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
>>>>  	BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
>>>> --
>>>> 2.5.5
>>>>
>>>>
>>
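
To illustrate the reply at the top of this mail: the mask/unmask callbacks are registered once, yet every call receives the data of the particular interrupt being masked or unmasked. Below is a minimal user-space sketch of that dispatch, not kernel code. The struct definitions are simplified stand-ins for the kernel's struct irq_data and struct irq_chip, and mock_init_chip(), mock_disable_irq(), mock_enable_irq(), demo_mask() and demo_unmask() are hypothetical names introduced only for this example. The "vector" and "masked" fields are assumptions as well; in the kernel the vector would be looked up via irqd_cfg().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct irq_data (assumption). */
struct irq_data {
	unsigned int irq;	/* Linux irq number */
	unsigned int vector;	/* vector reported to the hypervisor */
	int masked;		/* mock state, for demonstration only */
};

/* Simplified stand-in for the kernel's struct irq_chip (assumption). */
struct irq_chip {
	const char *name;
	void (*irq_mask)(struct irq_data *data);
	void (*irq_unmask)(struct irq_data *data);
};

static struct irq_chip paravirq_chip = { .name = "PARAVIRQ" };

/* Same checks, in the same order, as paravirq_init_chip() in the patch. */
static int mock_init_chip(void (*irq_mask)(struct irq_data *data),
			  void (*irq_unmask)(struct irq_data *data))
{
	if (paravirq_chip.irq_mask || paravirq_chip.irq_unmask)
		return -EEXIST;

	if (!irq_mask || !irq_unmask)
		return -EINVAL;

	paravirq_chip.irq_mask = irq_mask;
	paravirq_chip.irq_unmask = irq_unmask;

	return 0;
}

/* Hypothetical callbacks of the interaction module. */
static void demo_mask(struct irq_data *data)
{
	data->masked = 1;
	printf("ask hypervisor to stop injecting vector %u (irq %u)\n",
	       data->vector, data->irq);
}

static void demo_unmask(struct irq_data *data)
{
	data->masked = 0;
	printf("ask hypervisor to start injecting vector %u (irq %u)\n",
	       data->vector, data->irq);
}

/* Mock of what the irq core does on disable_irq()/enable_irq(). */
static void mock_disable_irq(struct irq_data *data)
{
	assert(paravirq_chip.irq_mask);
	paravirq_chip.irq_mask(data);
}

static void mock_enable_irq(struct irq_data *data)
{
	assert(paravirq_chip.irq_unmask);
	paravirq_chip.irq_unmask(data);
}
```

With two mock interrupts set up, calling mock_disable_irq() on the second one changes only that interrupt's state, although both share the single pair of callbacks registered through mock_init_chip().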