On Tue, Mar 4, 2025 at 5:46 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 04 2025 at 13:48, Tsai Sung-Fu wrote:
> > On Mon, Mar 3, 2025 at 5:10 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> > +	hwirq = ctrl * MAX_MSI_IRQS_PER_CTRL;
> >> > +	end = hwirq + MAX_MSI_IRQS_PER_CTRL;
> >> > +	for_each_set_bit_from(hwirq, pp->msi_irq_in_use, end) {
> >> > +		if (hwirq == hwirq_to_check)
> >> > +			continue;
> >> > +		virq = irq_find_mapping(pp->irq_domain, hwirq);
> >> > +		if (!virq)
> >> > +			continue;
> >> > +		mask = irq_get_affinity_mask(virq);
> >>
> >> What protects @mask against a concurrent modification?
> >
> > We hold the &pp->lock in the dw_pci_msi_set_affinity before calling
> > this function
> > so that should help to protect against concurrent modification ?
>
> So that's the same domain, right?
>

Yes, they are all from pp->irq_domain, which the driver allocates by
calling irq_domain_create_linear(). I will add the lockdep_assert_held()
check.

> >> > +	/*
> >> > +	 * Update all the irq_data's effective mask
> >> > +	 * bind to this MSI controller, so the correct
> >> > +	 * affinity would reflect on
> >> > +	 * /proc/irq/XXX/effective_affinity
> >> > +	 */
> >> > +	hwirq = ctrl * MAX_MSI_IRQS_PER_CTRL;
> >> > +	end = hwirq + MAX_MSI_IRQS_PER_CTRL;
> >> > +	for_each_set_bit_from(hwirq, pp->msi_irq_in_use, end) {
> >> > +		virq_downstream = irq_find_mapping(pp->irq_domain, hwirq);
> >> > +		if (!virq_downstream)
> >> > +			continue;
> >> > +		desc_downstream = irq_to_desc(virq_downstream);
> >> > +		irq_data_update_effective_affinity(&desc_downstream->irq_data,
> >> > +						   effective_mask);
> >>
> >> Same here.
> >
> > We hold the &pp->lock in the dw_pci_msi_set_affinity before calling
> > here, so that could help I think ?
>
> A comment would be helpful for the casual reader.

Sure, I will add a comment and the lockdep_assert_held() check there as
well.

>
> >>
> >> > +	}
> >> > +}
> >> > +
> >> > +static int dw_pci_msi_set_affinity(struct irq_data *d,
> >> > +				   const struct cpumask *mask, bool force)
> >> > +{
> >> > +	struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
> >> > +	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> >> > +	int ret;
> >> > +	int virq_parent;
> >> > +	unsigned long hwirq = d->hwirq;
> >> > +	unsigned long flags, ctrl;
> >> > +	struct irq_desc *desc_parent;
> >> > +	const struct cpumask *effective_mask;
> >> > +	cpumask_var_t mask_result;
> >> > +
> >> > +	ctrl = hwirq / MAX_MSI_IRQS_PER_CTRL;
> >> > +	if (!alloc_cpumask_var(&mask_result, GFP_ATOMIC))
> >> > +		return -ENOMEM;
> >>
> >> This does not work on a RT enabled kernel. Allocations with a raw spin
> >> lock held are not possible.
> >>
> >
> > Even with the GFP_ATOMIC flag ? I thought that means it would be safe
> > to do allocation in the atomic context ?
> > Didn't work on the RT kernel before, so please enlighten me if I am
> > wrong and if you don't mind.
>
> https://www.kernel.org/doc/html/latest/locking/locktypes.html#preempt-rt-caveats
> https://www.kernel.org/doc/html/latest/locking/locktypes.html#raw-spinlock-t-on-rt
>

I will remove the allocation and move the cpumask to the device
structure. Thanks for the doc links.
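Something along these lines is what I have in mind for the next version
(an untested sketch: msi_scratch_mask is just a placeholder name for a
new cpumask_var_t member in struct dw_pcie_rp, and the intersection
check only stands in for the rest of the helper):

	/*
	 * e.g. in dw_pcie_msi_host_init(): preallocate the scratch mask so
	 * the affinity path never allocates under the raw pp->lock.
	 */
	if (!zalloc_cpumask_var(&pp->msi_scratch_mask, GFP_KERNEL))
		return -ENOMEM;

static int dw_pci_check_mask_compatibility(struct dw_pcie_rp *pp,
					   unsigned long ctrl,
					   unsigned long hwirq_to_check,
					   struct cpumask *mask_to_check)
{
	unsigned long end, hwirq = ctrl * MAX_MSI_IRQS_PER_CTRL;
	const struct cpumask *mask;
	unsigned int virq;

	/*
	 * All vectors hang off pp->irq_domain and every affinity update
	 * funnels through dw_pci_msi_set_affinity(), which takes pp->lock
	 * before calling us, so the sibling masks cannot change here.
	 */
	lockdep_assert_held(&pp->lock);

	end = hwirq + MAX_MSI_IRQS_PER_CTRL;
	for_each_set_bit_from(hwirq, pp->msi_irq_in_use, end) {
		if (hwirq == hwirq_to_check)
			continue;
		virq = irq_find_mapping(pp->irq_domain, hwirq);
		if (!virq)
			continue;
		mask = irq_get_affinity_mask(virq);
		/* Stand-in for the actual check: the requested mask must
		 * intersect every sibling's mask on this controller. */
		if (!cpumask_and(mask_to_check, mask, mask_to_check))
			return -EINVAL;
	}

	return 0;
}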
> >> > +	/*
> >> > +	 * Loop through all possible MSI vector to check if the
> >> > +	 * requested one is compatible with all of them
> >> > +	 */
> >> > +	raw_spin_lock_irqsave(&pp->lock, flags);
> >> > +	cpumask_copy(mask_result, mask);
> >> > +	ret = dw_pci_check_mask_compatibility(pp, ctrl, hwirq, mask_result);
> >> > +	if (ret) {
> >> > +		dev_dbg(pci->dev, "Incompatible mask, request %*pbl, irq num %u\n",
> >> > +			cpumask_pr_args(mask), d->irq);
> >> > +		goto unlock;
> >> > +	}
> >> > +
> >> > +	dev_dbg(pci->dev, "Final mask, request %*pbl, irq num %u\n",
> >> > +		cpumask_pr_args(mask_result), d->irq);
> >> > +
> >> > +	virq_parent = pp->msi_irq[ctrl];
> >> > +	desc_parent = irq_to_desc(virq_parent);
> >> > +	ret = desc_parent->irq_data.chip->irq_set_affinity(&desc_parent->irq_data,
> >> > +							   mask_result, force);
> >>
> >> Again. Completely unserialized.
> >>
> >
> > The reason why we remove the desc_parent->lock protection is because
> > the chained IRQ structure didn't export parent IRQ to the user space, so we
> > think this call path should be the only caller. And since we have &pp->lock hold
> > at the beginning, so that should help to protect against concurrent
> > modification here ?
>
> "Should be the only caller" is not really a technical argument. If you
> make assumptions like this, then you have to come up with a proper
> explanation why this is correct under all circumstances and with all
> possible variants of parent interrupt chips.
>
> Aside of that fiddling with the internals of interrupt descriptors is
> not really justified here. What's wrong with using the existing
> irq_set_affinity() mechanism?
>
> Thanks,
>
> tglx
>

Using irq_set_affinity() gives us a circular locking dependency warning
at probe time. The irq_startup() path holds the global mask_lock taken
in irq_setup_affinity(), so a deadlock scenario like the one below can
happen: irq_set_affinity() tries to acquire &irq_desc_lock_class while
dw_pci_msi_set_affinity() already holds &pp->lock.

To resolve that, we would like to reuse the idea of replacing that
global lock with a per-CPU structure, following this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64b6d1d7a84538de34c22a6fc92a7dcc2b196b64
Just wanted to check whether this looks feasible to you?

Chain exists of:
  &irq_desc_lock_class --> mask_lock --> &pp->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&pp->lock);
                               lock(mask_lock);
                               lock(&pp->lock);
  lock(&irq_desc_lock_class);

  *** DEADLOCK ***

Thanks
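P.S. To make the inversion concrete, this is roughly the shape the
callback takes if it simply calls irq_set_affinity() on the parent
vector (a simplified sketch; the mask-compatibility and
effective-affinity handling from the patch are elided):

static int dw_pci_msi_set_affinity(struct irq_data *d,
				   const struct cpumask *mask, bool force)
{
	struct dw_pcie_rp *pp = irq_data_get_irq_chip_data(d);
	unsigned long ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
	unsigned long flags;
	int ret;

	raw_spin_lock_irqsave(&pp->lock, flags);	/* takes &pp->lock */

	/*
	 * irq_set_affinity() acquires the parent's desc->lock
	 * (&irq_desc_lock_class) while &pp->lock is held. That is the
	 * reverse of the irq_startup() ordering
	 * desc->lock -> mask_lock -> &pp->lock established through
	 * irq_setup_affinity(), hence the splat above.
	 */
	ret = irq_set_affinity(pp->msi_irq[ctrl], mask);

	raw_spin_unlock_irqrestore(&pp->lock, flags);

	return ret;
}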