On Thu, 24 Sep 2015, Qais Yousef wrote:
> On 09/23/2015 05:54 PM, Jiang Liu wrote:
> > Thanks for doing this, but the change is a little bigger than
> > my expectation. Could we achieve this by:
> > 1) extend irq_chip to support send_ipi operation
> > 2) reuse existing irqdomain allocation interfaces to allocate IPI IRQ
> > 3) arch code to create an IPI domain for IPI allocations
> > 4) IRQ core provides some helpers to help arch code to implement IPI
> >    irqdomain

That's not sufficient as IPIs are different from normal interrupts
because we need an interface to actually send them.

> Can you be more specific about 2 please? I tried to reuse the hierarchy
> irqdomain alloc function. One major difference when allocating IPI than a
> normal irq is that it's dynamic. The caller doesn't know what hwirq number it
> needs. It actually shouldn't.

Right. But we have the same behaviour with e.g. MSI. The caller does
not know a hardware irq number because it is dynamically assigned.

> The idea is for the user to just say 'I want an IPI to a CPUAFFINITY' from DT
> and get a virq in return to send an IPI to the target CPU(s). Also I think we
> need to accommodate the possibility of having more than 1 IPI controller.

Having more than one IPI controller is not a problem. It's going to be
a separate IPI domain, which you select from DT or other means.

These IPI domains are implemented like the MSI domain, as child domains
of the underlying irq domain:

    [IPI domain] ---> [GIC domain]

like we have on x86:

    [MSI domain] ---> [Vector domain]

So you need some infrastructure, which allows you to:

- allocate IPI(s)

  Requests IPI(s) from an IPI domain. That might be the default IPI
  domain, or one that is matched via OF against a list of registered
  domains, or one which is known to the caller by other means.

  Now that allocation interface does:

  1) Allocate irq descriptor

     This is required even for IPIs which are targeted to coprocessors
     and cannot be requested from Linux.
     In that case the only purpose is to store the irq chip and the irq
     domain specific data for that virq/hwirq mapping, and the irq is
     marked as NOREQUEST.

  2) Allocate the vector/hwirq number block from the IPI domain

     Part of the allocation request info is a pointer to the target cpu
     mask. The weight of the target cpu mask is the number of hwirqs you
     need to allocate from the underlying domain.

     For a normal Linux IPI, this will be the number of possible CPUs.

     For a coprocessor IPI, this will be a single hwirq.

     We also store that target cpu mask in the irq descriptor data for
     runtime validation and other usage. We can actually reuse the
     existing affinity mask for that.

     Now how these hwirqs are allocated is a domain/architecture
     specific issue.

     x86 will just find a vector which is available on all target cpus
     and mark it as used. That's a single hw irq number.

     mips and others, which implement IPIs as regular hw interrupt
     numbers, will allocate these (consecutive) hw interrupt numbers
     either from a reserved region or just from the regular space.
     That's a bunch of hw irq numbers and we need to come up with a
     proper storage format in the irqdata for that. That might be

         struct ipi_mapping {
             unsigned int nr_hwirqs;
             unsigned int cpumap[NR_CPUS];
         };

     or some other appropriate storage format like:

         struct ipi_mapping {
             unsigned int hwirq_base;
             unsigned int cpu_offset;
             unsigned int nr_hwirqs;
         };

     which is less space consuming, but restricted to consecutive
     hwirqs which can be mapped to the cpu number linearly:

         hwirq = hwirq_base + cpu - cpu_offset;

  The result of this is a single virq number, which has all the
  necessary information stored in the associated irq descriptor and
  the domain specific hierarchical irq_data.

  For normal Linux IPIs that irq is marked as a per cpu irq and can be
  requested via request_percpu_irq() and enabled/disabled via
  enable_percpu_irq()/disable_percpu_irq() on CPU hot[un]plug.
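To make the two storage formats concrete, here is a small userspace C model. It is only a sketch under stated assumptions: the SKETCH_* constants, the struct names and the table_init()/table_hwirq()/linear_hwirq() helpers are hypothetical illustrations, not kernel interfaces, and a plain unsigned long bitmask stands in for struct cpumask. table_init() also shows the point above that the weight of the target mask equals the number of hwirqs allocated.

```c
#include <assert.h>

/* Hypothetical stand-ins for NR_CPUS and an "invalid hwirq" marker. */
#define SKETCH_NR_CPUS       8u
#define SKETCH_INVALID_HWIRQ (~0u)

/* Format 1: explicit per-cpu table; cpus not in the mask are gaps. */
struct ipi_map_table {
    unsigned int nr_hwirqs;
    unsigned int cpumap[SKETCH_NR_CPUS];
};

/* Format 2: consecutive hwirqs mapped linearly onto a cpu range. */
struct ipi_map_linear {
    unsigned int hwirq_base;
    unsigned int cpu_offset;
    unsigned int nr_hwirqs;
};

/*
 * Hand out consecutive hwirqs starting at hwirq_base to every cpu set
 * in the target mask. Returns the number allocated, which equals the
 * weight of the mask.
 */
static unsigned int table_init(struct ipi_map_table *map,
                               unsigned long mask, unsigned int hwirq_base)
{
    unsigned int cpu;

    map->nr_hwirqs = 0;
    for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (mask & (1ul << cpu))
            map->cpumap[cpu] = hwirq_base + map->nr_hwirqs++;
        else
            map->cpumap[cpu] = SKETCH_INVALID_HWIRQ;
    }
    return map->nr_hwirqs;
}

/* Table lookup: one array access per cpu, gaps return the marker. */
static unsigned int table_hwirq(const struct ipi_map_table *map,
                                unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return map->cpumap[cpu];
}

/* Linear lookup: hwirq = hwirq_base + cpu - cpu_offset, if in range. */
static unsigned int linear_hwirq(const struct ipi_map_linear *map,
                                 unsigned int cpu)
{
    if (cpu < map->cpu_offset || cpu - map->cpu_offset >= map->nr_hwirqs)
        return SKETCH_INVALID_HWIRQ;
    return map->hwirq_base + cpu - map->cpu_offset;
}
```

For example, a target mask of cpus 0, 1 and 3 with a base of 100 yields hwirqs 100, 101 and 102, with cpu 2 left as a gap. The table format copes with such sparse masks at the cost of per-cpu storage; the linear format needs three integers but only covers a contiguous cpu range.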
- A function to send an IPI to a virq number

  That function takes the virq number and a target cpumask as
  arguments. Actually we want two functions, where the one which takes
  a virq number is a wrapper around the other, which takes an irq
  descriptor pointer. The one which takes the virq number can be
  exported to drivers; the other one is a core/arch code only
  interface. The reason for this is that we want to avoid the irq
  descriptor lookup for regular IPIs, but handing out irq descriptor
  pointers to drivers is a NONO.

    int irq_send_ipi(int virq, const struct cpumask *mask)
    {
        struct irq_desc *desc = irq_to_desc(virq);

        if (!desc)
            return -EINVAL;
        return irq_desc_send_ipi(desc, mask);
    }

  Along with a version which sends an IPI to all cpus in the target
  mask:

    int irq_send_ipi_all(int virq)
    {
        struct irq_desc *desc = irq_to_desc(virq);
        struct irq_data *data;

        if (!desc)
            return -EINVAL;
        data = irq_desc_get_irq_data(desc);
        return irq_desc_send_ipi(desc, irq_data_get_affinity_mask(data));
    }

  And the internal function:

    int irq_desc_send_ipi(struct irq_desc *desc, const struct cpumask *mask)
    {
        struct irq_data *data = irq_desc_get_irq_data(desc);
        struct irq_chip *chip = irq_data_get_irq_chip(data);

        if (!chip || !chip->send_ipi)
            return -EINVAL;

        /*
         * Do not validate the mask for IPIs marked global. These are
         * regular IPIs so we can avoid the operation as their target
         * mask is the cpu_possible_mask.
         */
        if (!irqd_is_global_ipi(data)) {
            if (!cpumask_subset(mask, irq_data_get_affinity_mask(data)))
                return -EINVAL;
        }

        chip->send_ipi(data, mask);
        return 0;
    }

  So now the chip specific send_ipi function will deal with the
  underlying implementation details.

  On x86 it uses the selected APIC implementation and sends the IPI for
  the vector stored in the hw irq number to all CPUs which are in the
  mask.

  On mips and others it's a bit different, as you need to figure out
  the effective hwirq number for the cpus set in the target mask from
  the stored mapping in the hierarchical irq data.

  We certainly can create common helpers for this.
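The validation path of irq_desc_send_ipi() and irq_send_ipi_all() can be exercised outside the kernel with a small model. This is a sketch, not the proposed implementation: struct sketch_ipi_data folds irq_data and irq_chip into one hypothetical struct, an unsigned long bitmask replaces struct cpumask, and mask_subset() mimics what cpumask_subset() checks. The per-cpu hwirq resolution done by the chip callback on mips-like hardware is not modeled here.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_ipi_data;

/* Hypothetical chip callback type, standing in for irq_chip::send_ipi. */
typedef void (*sketch_send_fn)(struct sketch_ipi_data *d, unsigned long mask);

struct sketch_ipi_data {
    unsigned long affinity;   /* target mask stored at allocation time */
    int global;               /* regular Linux IPI: skip mask validation */
    sketch_send_fn send_ipi;  /* chip callback, may be NULL */
    unsigned long last_sent;  /* records the mask the callback received */
};

/* True if every cpu in sub is also in super (like cpumask_subset()). */
static int mask_subset(unsigned long sub, unsigned long super)
{
    return (sub & ~super) == 0;
}

/* Model of irq_desc_send_ipi(): reject masks outside the target mask. */
static int sketch_send_ipi(struct sketch_ipi_data *d, unsigned long mask)
{
    if (!d || !d->send_ipi)
        return -EINVAL;
    if (!d->global && !mask_subset(mask, d->affinity))
        return -EINVAL;
    d->send_ipi(d, mask);
    return 0;
}

/* Model of irq_send_ipi_all(): send to the whole stored target mask. */
static int sketch_send_ipi_all(struct sketch_ipi_data *d)
{
    if (!d)
        return -EINVAL;
    return sketch_send_ipi(d, d->affinity);
}

/* Example chip callback that only records what it was asked to send. */
static void record_send(struct sketch_ipi_data *d, unsigned long mask)
{
    d->last_sent = mask;
}
```

With a stored target mask of cpus 1 and 2, sending to cpu 1 succeeds, while sending to cpus 0 and 1 fails with -EINVAL because cpu 0 is outside the target mask. Setting the global flag skips that check, matching the comment in irq_desc_send_ipi().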
  Assume the simple mapping format:

    struct ipi_mapping {
        unsigned int nr_hwirqs;
        unsigned int cpumap[];
    };

  then a helper function for the IPI domain irq chip would be:

    void irq_chip_send_ipi(struct irq_data *data, const struct cpumask *mask)
    {
        struct ipi_mapping *map = irq_data_get_irq_chip_data(data);
        struct irq_data *parent = data->parent_data;
        unsigned int cpu, hwirq;

        for_each_cpu(cpu, mask) {
            hwirq = map->cpumap[cpu];
            /* Deal with gaps */
            if (hwirq == INVALID_HWIRQ)
                continue;
            parent->chip->send_ipi(parent, cpumask_of(cpu));
        }
    }

No linked lists, no magic other stuff. Just a natural extension to the
existing hierarchical irq domain code, which can be reused by all
architectures.

Thanks,

	tglx