> On May 31, 2023, at 1:11 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 31 2023 at 15:06, Chuck Lever III wrote:
>>> On May 31, 2023, at 10:43 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>>
>>> mlx5_irq_alloc(af_desc)
>>>   pci_msix_alloc_irq_at(af_desc)
>>>     msi_domain_alloc_irq_at(af_desc)
>>>       __msi_domain_alloc_irqs(af_desc)
>>> 1)      msidesc->affinity = kmemdup(af_desc);
>>>         __irq_domain_alloc_irqs(aff=msidesc->affinity)
>>>           irq_domain_alloc_irqs_locked(aff)
>>>             irq_domain_alloc_descs(aff)
>>>               alloc_desc(mask=&aff->mask)
>>>                 desc_smp_init(mask)
>>> 2)                cpumask_copy(desc->irq_common_data.affinity, mask);
>>>             irq_domain_alloc_irqs_hierarchy()
>>>               msi_domain_alloc()
>>>                 intel_irq_remapping_alloc()
>>>                   x86_vector_alloc_irqs()
>>
>> It is x86_vector_alloc_irqs() where the struct irq_data is
>> fabricated that ends up in irq_matrix_reserve_managed().
>
> Kinda fabricated :)
>
>       irqd = irq_domain_get_irq_data(domain, virq + i);
>
> That's finding the irq_data which is associated with the vector
> domain; it was allocated earlier. The affinity mask is retrieved via:
>
>       const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd);
>
> which does:
>
>       return irqd->common->affinity;
>
> irqd->common points to desc->irq_common_data. The affinity there was
> copied in #2 above.
>
>>> This also ends up in the wrong place. That mlx code does:
>>>
>>>     af_desc.is_managed = false;
>>>
>>> but the allocation ends up allocating a managed vector.
>>
>> That line was changed in 6.4-rc4 to address another bug, and it
>> avoids the crash by not calling into the misbehaving code. It
>> doesn't address the mlx5_core initialization issue, though, because
>> as I said before, execution continues and crashes in a similar
>> scenario later on.
>
> Ok.
>
>> On my system, I've reverted that fix:
>>
>> -       af_desc.is_managed = false;
>> +       af_desc.is_managed = 1;
>>
>> so that we can continue debugging this crash.
>
> Ah.
>
>>> Can you please instrument this along the call chain so we can see
>>> where, or at least when, this gets corrupted? Please print the
>>> relevant pointer addresses too so we can see whether they are
>>> consistent or not.
>>
>> I will continue working on this today.
>>
>>> But that's just the symptom, not the root cause. That code is
>>> perfectly fine when all callers use the proper cpumask functions.
>>
>> Agreed: we're crashing here because of the extra bits in the
>> affinity mask, but those bits should not be set in the first place.
>
> Correct.
>
>> I wasn't sure whether for_each_cpu() was supposed to iterate into
>> non-present CPUs -- and I guess the answer is yes: it will iterate
>> the full length of the mask. The caller is responsible for ensuring
>> the mask is valid.
>
> Yes, that's the assumption of this constant optimization for the
> small number of CPUs case. All other cases use nr_cpu_ids as the
> limit and won't go into non-possible CPUs. I didn't spot it last
> night either.
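To convince myself of that last point, I put together a quick
userspace approximation of the small-NR_CPUS case. The types and
names here are stand-ins of my own, not the kernel's:

#include <stdio.h>
#include <string.h>

#define FAKE_NR_CPUS	64		/* compile-time mask width */
static const int fake_nr_cpu_ids = 4;	/* CPUs actually possible */

int main(void)
{
	unsigned long long mask;

	memset(&mask, 0xa5, sizeof(mask));	/* stale stack garbage */
	mask |= 1ULL << 2;			/* "cpumask_set_cpu(2, ...)" */

	/* Bounded by the mask width, like the small-NR_CPUS for_each_cpu(): */
	for (int cpu = 0; cpu < FAKE_NR_CPUS; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;
		if (cpu >= fake_nr_cpu_ids)
			printf("iterated into non-possible CPU %d\n", cpu);
	}
	return 0;
}

With garbage bits seeded past fake_nr_cpu_ids, the loop visits every
one of them, which matches the crash signature we're seeing.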
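Looking back at mlx5_irqs_request_vectors() with that in mind: af_desc
lives on the stack, and if I read the code correctly nothing clears
af_desc.mask before the first cpumask_set_cpu(), so the very first
request can carry stale stack bits into the copies at #1 and #2 above.
A second sketch of just that ordering problem, again with made-up
stand-ins for the kernel API:

#include <stdio.h>
#include <string.h>

struct fake_mask { unsigned long long bits; };

static void fake_set_cpu(int cpu, struct fake_mask *m) { m->bits |= 1ULL << cpu; }
static void fake_clear(struct fake_mask *m) { m->bits = 0; }

static void fake_request(int i, const struct fake_mask *m)
{
	printf("request %d sees mask %#llx\n", i, m->bits);
}

int main(void)
{
	struct fake_mask mask;
	const int cpus[] = { 2, 5 };

	memset(&mask, 0xff, sizeof(mask));	/* uninitialized stack */
	for (int i = 0; i < 2; i++) {		/* current ordering */
		fake_set_cpu(cpus[i], &mask);
		fake_request(i, &mask);		/* request 0 sees garbage */
		fake_clear(&mask);
	}

	memset(&mask, 0xff, sizeof(mask));
	for (int i = 0; i < 2; i++) {		/* clear-before-set */
		fake_clear(&mask);
		fake_set_cpu(cpus[i], &mask);
		fake_request(i, &mask);		/* exactly one bit each */
	}
	return 0;
}

Swapping the clear and the set, as the patch below does, makes every
iteration start from an empty mask.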
This addresses the problem for me with both is_managed = 1 and
is_managed = false:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index db5687d9fec9..bcf5df316c8f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -570,11 +570,11 @@ int mlx5_irqs_request_vectors(struct mlx5_core_dev *dev, u16 *cpus, int nirqs,
 	af_desc.is_managed = false;
 	for (i = 0; i < nirqs; i++) {
+		cpumask_clear(&af_desc.mask);
 		cpumask_set_cpu(cpus[i], &af_desc.mask);
 		irq = mlx5_irq_request(dev, i + 1, &af_desc, rmap);
 		if (IS_ERR(irq))
 			break;
-		cpumask_clear(&af_desc.mask);
 		irqs[i] = irq;
 	}

If you agree this looks reasonable, I can package it with a proper
patch description and send it to Eli and Saeed.

--
Chuck Lever