On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:

Cc+ device tree people.

> + The code binds this new interrupt domain to the same device-tree node as
> + the main interrupt domain. The main interrupt controller has its
> + interrupts described by one argument in device-tree
> + (#interrupt-cells = <1>), i.e.:
> +
> +   interrupts-extended = <&mpic 8>;
> +
> + Because of backwards compatibility we cannot change this number of
> + arguments, and so the SoC Error interrupts must also be described by
> + this one number.
> +
> + Thus, to describe a SoC Error interrupt, one has to add an offset to
> + the SoC Error interrupt number. Offset 0x400 was chosen because the
> + main controller supports at most 1024 interrupts (in theory; in
> + practice it seems to be 116 interrupts on all supported platforms).
> + An example of describing a SoC Error interrupt is
> +
> +   interrupts-extended = <&mpic 0x404>;

This looks like a horrible hack and I don't understand why this can't be
a separate interrupt controller, which it is in the hardware. That
controller utilizes interrupt 4 from the MPIC.

But then my DT foo is limited, so I let the DT folks comment on that.

> +static int mpic_soc_err_irq_set_affinity(struct irq_data *d,
> +					  const struct cpumask *mask, bool force)
> +{
> +	unsigned int cpu;
> +
> +	/*
> +	 * TODO: The mpic->per_cpu region accesses CPU Local IC registers for
> +	 * CPU n when accessed from CPU n. Thus if we want to access these
> +	 * registers from another CPU, we need to request a function to be
> +	 * executed on CPU n. This is what we do here by calling
> +	 * smp_call_on_cpu().
> +	 *
> +	 * Instead, we could access CPU Local IC registers by having the CPU
> +	 * Local region of each CPU mapped in the MPIC private data structure.
> +	 * We could do this either by extending the register resource in the
> +	 * device-tree, or by computing the physical base address of those
> +	 * regions relative to the main MPIC base address.

That requires locking for those registers obviously.

> +	 */
> +
> +	cpus_read_lock();

This code was clearly never tested with any debug enabled.

set_affinity() is invoked with interrupts disabled and irq_desc::lock
held. cpus_read_lock() can sleep... The mandatory debug options would
have told you loud and clearly.

> +	/* First, disable the ERR IRQ on all cores */
> +	for_each_online_cpu(cpu)
> +		smp_call_on_cpu(cpu, mpic_soc_err_irq_mask_on_cpu, d, true);

Again. smp_call_on_cpu() invokes wait_for_completion(), which obviously
can sleep.

Also why do you want to do that on _ALL_ CPUs if there is only one you
pick from the effective affinity mask?

> +	/* Then enable on one online core from the affinity mask */
> +	cpu = cpumask_any_and(mask, cpu_online_mask);
> +	smp_call_on_cpu(cpu, mpic_soc_err_irq_unmask_on_cpu, d, true);

Ditto.

So you really want to map the registers so that they are accessible
cross CPU, including the required locking. Alternatively pin the error
interrupts to CPU0, which cannot be unplugged, and be done with it.
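Something like the completely untested sketch below, where
mpic->per_cpu_base[], mpic->soc_err_lock and MPIC_SOC_ERR_MASK are made
up for illustration (a single 32-bit per-CPU mask register is assumed)
and do not exist in this patch:

static int mpic_soc_err_irq_set_affinity(struct irq_data *d,
					 const struct cpumask *mask, bool force)
{
	struct mpic *mpic = irq_data_get_irq_chip_data(d);
	unsigned int prev, target;
	u32 val;

	/* Runs with interrupts disabled, so raw locking only, no sleeping */
	raw_spin_lock(&mpic->soc_err_lock);

	/* Mask the interrupt on the CPU it is currently routed to, if any */
	prev = cpumask_first(irq_data_get_effective_affinity_mask(d));
	if (prev < nr_cpu_ids) {
		val = readl(mpic->per_cpu_base[prev] + MPIC_SOC_ERR_MASK);
		writel(val & ~BIT(d->hwirq),
		       mpic->per_cpu_base[prev] + MPIC_SOC_ERR_MASK);
	}

	/* Unmask it on one online CPU from the requested mask */
	target = cpumask_any_and(mask, cpu_online_mask);
	if (target >= nr_cpu_ids) {
		raw_spin_unlock(&mpic->soc_err_lock);
		return -EINVAL;
	}
	val = readl(mpic->per_cpu_base[target] + MPIC_SOC_ERR_MASK);
	writel(val | BIT(d->hwirq),
	       mpic->per_cpu_base[target] + MPIC_SOC_ERR_MASK);

	raw_spin_unlock(&mpic->soc_err_lock);

	irq_data_update_effective_affinity(d, cpumask_of(target));
	return IRQ_SET_MASK_OK;
}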
> +static int mpic_soc_err_irq_map(struct irq_domain *domain, unsigned int virq,
> +				 irq_hw_number_t hwirq)
> +{
> +	struct mpic *mpic = domain->host_data;
> +
> +	irq_set_chip_data(virq, mpic);
> +
> +	mpic_soc_err_irq_mask(irq_get_irq_data(virq));

What for? It should be masked if it's not mapped, no?

> +	irq_set_status_flags(virq, IRQ_LEVEL);
> +	irq_set_chip_and_handler(virq, &mpic_soc_err_irq_chip, handle_level_irq);
> +	irq_set_probe(virq);
> +
> +	return 0;
> +}

> +static int mpic_soc_err_xlate(struct irq_domain *domain, struct device_node *np,
> +			      const u32 *spec, unsigned int count,
> +			      unsigned long *hwirq, unsigned int *type)
> +{
> +	int err = irq_domain_xlate_onecell(domain, np, spec, count, hwirq, type);
> +
> +	if (err)
> +		return err;
> +
> +	*hwirq -= MPIC_SOC_ERR_IRQS_OFFSET;
> +	return 0;
> +}

> +static int __init mpic_soc_err_init(struct mpic *mpic, struct device_node *np)
> +{
> +	unsigned int nr_irqs;
> +
> +	if (of_machine_is_compatible("marvell,armada-370-xp"))
> +		nr_irqs = 32;
> +	else
> +		nr_irqs = 64;
> +
> +	mpic->soc_err_domain = irq_domain_add_hierarchy(mpic->domain, 0, nr_irqs,
> +							np, &mpic_soc_err_irq_ops,
> +							mpic);

Why is this a hierarchical domain? That does not make any sense at all.

The MPIC domain provides only the demultiplexing interrupt and is not
involved in a domain hierarchy.

Hierarchical domains are required when the top level domain depends on
resources from the parent domain which need to be allocated when mapping
an interrupt, e.g. on x86:

    vector - remap - PCI/MSI

So the top level PCI/MSI domain requires resources from the remap domain
and the remap domain requires a vector from the vector domain. These
resources are per interrupt.

During runtime there are also irq_*() callbacks which utilize callbacks
from the parent domains, i.e. mask() invokes mask(parent)...

But that has nothing to do with demultiplexed interrupts because the
underlying demultiplex interrupt is already there and the same for all
interrupts in the demultiplexed domain. There is no callback dependency
and no resource dependency at all.

Thanks,

        tglx
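P.S.: For contrast, the usual shape of such a demultiplexed setup is a
plain linear domain plus a chained handler on the parent interrupt.
Completely untested sketch; MPIC_SOC_ERR_CAUSE, mpic->base, parent_irq
and the single 32-bit cause register are made up for illustration (the
64-interrupt variants would need a second cause bank):

static void mpic_soc_err_handle_cascade(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);
	struct mpic *mpic = irq_desc_get_handler_data(desc);
	unsigned long cause;
	unsigned int bit;

	chained_irq_enter(chip, desc);

	cause = readl(mpic->base + MPIC_SOC_ERR_CAUSE);
	for_each_set_bit(bit, &cause, 32)
		generic_handle_domain_irq(mpic->soc_err_domain, bit);

	chained_irq_exit(chip, desc);
}

static int __init mpic_soc_err_init(struct mpic *mpic, struct device_node *np,
				    unsigned int parent_irq)
{
	unsigned int nr_irqs;

	nr_irqs = of_machine_is_compatible("marvell,armada-370-xp") ? 32 : 64;

	/* A plain linear domain: no per-interrupt parent resources needed */
	mpic->soc_err_domain = irq_domain_add_linear(np, nr_irqs,
						     &mpic_soc_err_irq_ops, mpic);
	if (!mpic->soc_err_domain)
		return -ENOMEM;

	/* The MPIC merely demultiplexes: hook the cascade on its interrupt */
	irq_set_chained_handler_and_data(parent_irq,
					 mpic_soc_err_handle_cascade, mpic);
	return 0;
}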