On 12/08/2016 12:56, Alexander Popov wrote:
> Maybe the name "paravirq" is not very good, I'll try to describe the idea.
>
> There is some kernel module for special interactions between guest VMs.
> Currently it has to register a MSI-capable PCI device to handle interrupts
> injected by the hypervisor. And the bare-metal hypervisor has to emulate
> such a device for guest VMs.
>
> So I've implemented paravirq irq_domain to avoid this redundant emulation.
> With it we can just call:
>  - paravirq_alloc_irq() to allocate a LAPIC irq;
>  - request_irq() for it;
>  - irqd_cfg(irq_get_irq_data()) to get the corresponding interrupt vector
>    and inform the hypervisor about it.
> Now we happily handle the irq from the hypervisor when it injects this vector.
>
> The irq_mask/irq_unmask parameters of paravirq_init_chip() are the pointers
> to the functions from the interaction module which ask the hypervisor to
> start/stop injecting interrupts to the guest VM.
>
> Paravirq irq_domain allows to avoid the PCI device emulation in the hypervisor
> and provides the ability to run slimmer Linux guests without precompiled
> PCI and MSI support.
>
> Did I manage to answer your questions?

It's a bit clearer.  My doubt is that the caller of paravirq_init_chip
has to provide irq_mask and irq_unmask, but it doesn't know who will
call paravirq_alloc_irq.  So there are two cases:

1) there is only one device, and then your solution doesn't scale well
to multiple devices

2) there is some kind of commonality between all devices using
paravirq_alloc_irq, and then it should be abstracted in a bus.

The latter would be similar to what Xen and Hyper-V do, for example.
Using PCI is more similar to the KVM approach.

Paolo

> Please correct me if the idea is wrong or there's a better way to do that.
> Thanks.
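To make case 1 concrete, here is a minimal user-space sketch of the registration checks in the quoted paravirq_init_chip(). It is only a model, not kernel code: struct irq_data is a stub, model_domain_ready stands in for the paravirq_domain pointer, and every model_* and dev_* name is hypothetical. The checks and their order are copied from the patch; nothing else is.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stub for illustration only; the real struct lives in <linux/irq.h>. */
struct irq_data { unsigned int irq; };

typedef void (*irq_cb)(struct irq_data *data);

/* One global chip, as in the patch: a single mask/unmask pair. */
static struct {
	irq_cb irq_mask;
	irq_cb irq_unmask;
} model_chip;

/* Hypothetical stand-in for "paravirq_domain != NULL". */
static int model_domain_ready = 1;

/* Same checks, in the same order, as the quoted paravirq_init_chip(). */
int model_init_chip(irq_cb irq_mask, irq_cb irq_unmask)
{
	if (!model_domain_ready)
		return -ENODEV;
	if (model_chip.irq_mask || model_chip.irq_unmask)
		return -EEXIST;
	if (!irq_mask || !irq_unmask)
		return -EINVAL;

	model_chip.irq_mask = irq_mask;
	model_chip.irq_unmask = irq_unmask;
	return 0;
}

/* Two hypothetical devices, each with its own hypervisor callbacks. */
static void dev_a_mask(struct irq_data *d)   { (void)d; }
static void dev_a_unmask(struct irq_data *d) { (void)d; }
static void dev_b_mask(struct irq_data *d)   { (void)d; }
static void dev_b_unmask(struct irq_data *d) { (void)d; }

/* Registers two devices in turn and returns what the second caller gets. */
int model_register_two_devices(void)
{
	int first;

	memset(&model_chip, 0, sizeof(model_chip));  /* start from a clean chip */
	first = model_init_chip(dev_a_mask, dev_a_unmask);
	assert(first == 0);                           /* first caller wins */

	return model_init_chip(dev_b_mask, dev_b_unmask);
}
```

Called from a main(), model_register_two_devices() returns -EEXIST: the second device cannot install its own callbacks, and any irq it then allocates would be masked and unmasked through the first device's functions. That is the sense in which a single global chip does not scale past one device.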
>
>>> Signed-off-by: Alexander Popov <alex.popov@xxxxxxxxx>
>>> ---
>>>  arch/x86/Kconfig                 |   8 +++
>>>  arch/x86/include/asm/irqdomain.h |   6 ++
>>>  arch/x86/include/asm/paravirq.h  |   9 +++
>>>  arch/x86/kernel/apic/Makefile    |   2 +
>>>  arch/x86/kernel/apic/paravirq.c  | 128 +++++++++++++++++++++++++++++++++++++++
>>>  arch/x86/kernel/apic/vector.c    |   1 +
>>>  6 files changed, 154 insertions(+)
>>>  create mode 100644 arch/x86/include/asm/paravirq.h
>>>  create mode 100644 arch/x86/kernel/apic/paravirq.c
>>>
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index 5c6e747..209bd88 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -760,6 +760,14 @@ config PARAVIRT_TIME_ACCOUNTING
>>>
>>>  	  If in doubt, say N here.
>>>
>>> +config X86_PARAVIRQ
>>> +	bool "Enable paravirq irq_domain"
>>> +	depends on PARAVIRT && X86_LOCAL_APIC
>>> +	default n
>>> +	---help---
>>> +	  This option enables paravirq irq_domain for interrupts injected
>>> +	  by the hypervisor using Intel VT-x technology.
>>> +
>>>  config PARAVIRT_CLOCK
>>>  	bool
>>>
>>> diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
>>> index d26075b..e3192f6 100644
>>> --- a/arch/x86/include/asm/irqdomain.h
>>> +++ b/arch/x86/include/asm/irqdomain.h
>>> @@ -60,4 +60,10 @@ extern void arch_init_htirq_domain(struct irq_domain *domain);
>>>  static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
>>>  #endif
>>>
>>> +#ifdef CONFIG_X86_PARAVIRQ
>>> +extern void arch_init_paravirq_domain(struct irq_domain *domain);
>>> +#else
>>> +static inline void arch_init_paravirq_domain(struct irq_domain *domain) { }
>>> +#endif
>>> +
>>>  #endif
>>> diff --git a/arch/x86/include/asm/paravirq.h b/arch/x86/include/asm/paravirq.h
>>> new file mode 100644
>>> index 0000000..a137de2
>>> --- /dev/null
>>> +++ b/arch/x86/include/asm/paravirq.h
>>> @@ -0,0 +1,9 @@
>>> +#ifndef _ASM_X86_PARAVIRQ_H
>>> +#define _ASM_X86_PARAVIRQ_H
>>> +
>>> +int paravirq_init_chip(void (*irq_mask)(struct irq_data *data),
>>> +		void (*irq_unmask)(struct irq_data *data));
>>> +int paravirq_alloc_irq(void);
>>> +void paravirq_free_irq(unsigned int irq);
>>> +
>>> +#endif /* _ASM_X86_PARAVIRQ_H */
>>> diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
>>> index 8e63ebd..84f9ce0 100644
>>> --- a/arch/x86/kernel/apic/Makefile
>>> +++ b/arch/x86/kernel/apic/Makefile
>>> @@ -28,3 +28,5 @@ obj-$(CONFIG_X86_BIGSMP)	+= bigsmp_32.o
>>>
>>>  # For 32bit, probe_32 need to be listed last
>>>  obj-$(CONFIG_X86_LOCAL_APIC)	+= probe_$(BITS).o
>>> +
>>> +obj-$(CONFIG_X86_PARAVIRQ)	+= paravirq.o
>>> diff --git a/arch/x86/kernel/apic/paravirq.c b/arch/x86/kernel/apic/paravirq.c
>>> new file mode 100644
>>> index 0000000..430b819
>>> --- /dev/null
>>> +++ b/arch/x86/kernel/apic/paravirq.c
>>> @@ -0,0 +1,128 @@
>>> +/*
>>> + * An irq_domain for interrupts injected by the hypervisor using
>>> + * Intel VT-x technology.
>>> + *
>>> + * Copyright (C) 2016 Alexander Popov <alex.popov@xxxxxxxxx>.
>>> + *
>>> + * This file is released under the GPLv2.
>>> + */
>>> +
>>> +#include <linux/init.h>
>>> +#include <linux/irq.h>
>>> +#include <asm/irqdomain.h>
>>> +#include <asm/paravirq.h>
>>> +
>>> +static struct irq_domain *paravirq_domain;
>>> +
>>> +static struct irq_chip paravirq_chip = {
>>> +	.name = "PARAVIRQ",
>>> +	.irq_ack = irq_chip_ack_parent,
>>> +};
>>> +
>>> +static int paravirq_domain_alloc(struct irq_domain *domain,
>>> +		unsigned int virq, unsigned int nr_irqs, void *arg)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	BUG_ON(domain != paravirq_domain);
>>> +
>>> +	if (nr_irqs != 1)
>>> +		return -EINVAL;
>>> +
>>> +	ret = irq_domain_set_hwirq_and_chip(paravirq_domain,
>>> +			virq, virq, &paravirq_chip, NULL);
>>> +	if (ret) {
>>> +		pr_warn("setting chip, hwirq for irq %u failed\n", virq);
>>> +		return ret;
>>> +	}
>>> +
>>> +	__irq_set_handler(virq, handle_edge_irq, 0, "edge");
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void paravirq_domain_free(struct irq_domain *domain,
>>> +		unsigned int virq, unsigned int nr_irqs)
>>> +{
>>> +	struct irq_data *irq_data;
>>> +
>>> +	BUG_ON(domain != paravirq_domain);
>>> +	BUG_ON(nr_irqs != 1);
>>> +
>>> +	irq_data = irq_domain_get_irq_data(paravirq_domain, virq);
>>> +	if (irq_data)
>>> +		irq_domain_reset_irq_data(irq_data);
>>> +	else
>>> +		pr_warn("irq %u is not in paravirq irq_domain\n", virq);
>>> +}
>>> +
>>> +static const struct irq_domain_ops paravirq_domain_ops = {
>>> +	.alloc = paravirq_domain_alloc,
>>> +	.free = paravirq_domain_free,
>>> +};
>>> +
>>> +int paravirq_alloc_irq(void)
>>> +{
>>> +	struct irq_alloc_info info;
>>> +
>>> +	if (!paravirq_domain)
>>> +		return -ENODEV;
>>> +
>>> +	if (!paravirq_chip.irq_mask || !paravirq_chip.irq_unmask)
>>> +		return -EINVAL;
>>> +
>>> +	init_irq_alloc_info(&info, NULL);
>>> +
>>> +	return irq_domain_alloc_irqs(paravirq_domain, 1, NUMA_NO_NODE, &info);
>>> +}
>>> +EXPORT_SYMBOL(paravirq_alloc_irq);
>>> +
>>> +void paravirq_free_irq(unsigned int virq)
>>> +{
>>> +	struct irq_data *irq_data;
>>> +
>>> +	if (!paravirq_domain) {
>>> +		pr_warn("paravirq irq_domain is not initialized\n");
>>> +		return;
>>> +	}
>>> +
>>> +	irq_data = irq_domain_get_irq_data(paravirq_domain, virq);
>>> +	if (irq_data)
>>> +		irq_domain_free_irqs(virq, 1);
>>> +	else
>>> +		pr_warn("irq %u is not in paravirq irq_domain\n", virq);
>>> +}
>>> +EXPORT_SYMBOL(paravirq_free_irq);
>>> +
>>> +int paravirq_init_chip(void (*irq_mask)(struct irq_data *data),
>>> +		void (*irq_unmask)(struct irq_data *data))
>>> +{
>>> +	if (!paravirq_domain)
>>> +		return -ENODEV;
>>> +
>>> +	if (paravirq_chip.irq_mask || paravirq_chip.irq_unmask)
>>> +		return -EEXIST;
>>> +
>>> +	if (!irq_mask || !irq_unmask)
>>> +		return -EINVAL;
>>> +
>>> +	paravirq_chip.irq_mask = irq_mask;
>>> +	paravirq_chip.irq_unmask = irq_unmask;
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(paravirq_init_chip);
>>> +
>>> +void arch_init_paravirq_domain(struct irq_domain *parent)
>>> +{
>>> +	paravirq_domain = irq_domain_add_tree(NULL, &paravirq_domain_ops, NULL);
>>> +	if (!paravirq_domain) {
>>> +		pr_warn("failed to initialize paravirq irq_domain\n");
>>> +		return;
>>> +	}
>>> +
>>> +	paravirq_domain->name = paravirq_chip.name;
>>> +	paravirq_domain->parent = parent;
>>> +	paravirq_domain->flags |= IRQ_DOMAIN_FLAG_AUTO_RECURSIVE;
>>> +}
>>> +
>>> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
>>> index 6066d94..878b440 100644
>>> --- a/arch/x86/kernel/apic/vector.c
>>> +++ b/arch/x86/kernel/apic/vector.c
>>> @@ -438,6 +438,7 @@ int __init arch_early_irq_init(void)
>>>
>>>  	arch_init_msi_domain(x86_vector_domain);
>>>  	arch_init_htirq_domain(x86_vector_domain);
>>> +	arch_init_paravirq_domain(x86_vector_domain);
>>>
>>>  	BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
>>>  	BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
>>> --
>>> 2.5.5