Re: [PATCH 2/6] irqchip/armada-370-xp: Implement SoC Error interrupts

Marc Zyngier <maz@xxxxxxxxxx> · Sat, 07 May 2022 10:01:52 +0100

On Fri, 06 May 2022 19:55:46 +0100,
Pali Rohár <pali@xxxxxxxxxx> wrote:
> 
> On Friday 06 May 2022 19:47:25 Marc Zyngier wrote:
> > On Fri, 06 May 2022 19:30:51 +0100,
> > Pali Rohár <pali@xxxxxxxxxx> wrote:
> > > 
> > > On Friday 06 May 2022 19:19:46 Marc Zyngier wrote:
> > > > On Fri, 06 May 2022 14:40:25 +0100,
> > > > Pali Rohár <pali@xxxxxxxxxx> wrote:
> > > > > 
> > > > > +static void armada_370_xp_soc_err_irq_unmask(struct irq_data *d);
> > > > > +
> > > > >  static inline bool is_percpu_irq(irq_hw_number_t irq)
> > > > >  {
> > > > >  	if (irq <= ARMADA_370_XP_MAX_PER_CPU_IRQS)
> > > > > @@ -509,6 +517,27 @@ static void armada_xp_mpic_reenable_percpu(void)
> > > > >  		armada_370_xp_irq_unmask(data);
> > > > >  	}
> > > > >  
> > > > > +	/* Re-enable per-CPU SoC Error interrupts that were enabled before suspend */
> > > > > +	for (irq = 0; irq < soc_err_irq_num_regs * 32; irq++) {
> > > > > +		struct irq_data *data;
> > > > > +		int virq;
> > > > > +
> > > > > +		virq = irq_linear_revmap(armada_370_xp_soc_err_domain, irq);
> > > > > +		if (virq == 0)
> > > > > +			continue;
> > > > > +
> > > > > +		data = irq_get_irq_data(virq);
> > > > > +
> > > > > +		if (!irq_percpu_is_enabled(virq))
> > > > > +			continue;
> > > > > +
> > > > > +		armada_370_xp_soc_err_irq_unmask(data);
> > > > > +	}
> > > > 
> > > > So you do this loop and all these lookups, both here and in the resume
> > > > function (duplicated code!) just to be able to call the unmask
> > > > function?  This would be better served by two straight writes of the
> > > > mask register, which you'd conveniently save on suspend.
> > > > 
> > > > Yes, you have only duplicated the existing logic. But surely there is
> > > > something better to do.
> > > 
> > > Yes, I just used existing logic.
> > > 
> > > I'm not rewriting driver or doing big refactor of it, as this is not in
> > > the scope of the PCIe AER interrupt support.
> > 
> > Fair enough. By the same logic, I'm not taking any change to the
> > driver until it is put in a better shape. Your call.
> 
> If you are maintainer of this code then it is expected from _you_ to
> move the current code into _better shape_ as you wrote and expect. And
> then show us exactly, how new changes in this driver should look like,
> in examples.

Sorry, but that's not how this works. You are the one willing to
change a sub-par piece of code, you get to make it better. You
obviously have the means (the HW) and the incentive (these patches).
But you don't get to make something even more unmaintainable because
you're unwilling to do some extra work.

If you're unhappy with my position, that's fine. I suggest you take it
with Thomas, and maybe even Linus. As I suggested before, you can also
post a patch removing me as the irqchip maintainer. I'm sure that will
spark an interesting discussion.

> > > > > +static int armada_xp_soc_err_irq_set_affinity(struct irq_data *d,
> > > > > +					      const struct cpumask *mask,
> > > > > +					      bool force)
> > > > > +{
> > > > > +	unsigned int cpu;
> > > > > +
> > > > > +	cpus_read_lock();
> > > > > +
> > > > > +	/* First disable IRQ on all cores */
> > > > > +	for_each_online_cpu(cpu)
> > > > > +		smp_call_on_cpu(cpu, armada_370_xp_soc_err_irq_mask_on_cpu, d, true);
> > > > > +
> > > > > +	/* Select a single core from the affinity mask which is online */
> > > > > +	cpu = cpumask_any_and(mask, cpu_online_mask);
> > > > > +	smp_call_on_cpu(cpu, armada_370_xp_soc_err_irq_unmask_on_cpu, d, true);
> > > > > +
> > > > > +	cpus_read_unlock();
> > > > > +
> > > > > +	irq_data_update_effective_affinity(d, cpumask_of(cpu));
> > > > > +
> > > > > +	return IRQ_SET_MASK_OK;
> > > > > +}
> > > > 
> > > > Aren't these per-CPU interrupts anyway? What does it mean to set their
> > > > affinity? /me rolls eyes...
> > > 
> > > Yes, they are per-CPU interrupts. But to mask or unmask particular
> > > interrupt for specific CPU is possible only from that CPU. CPU 0 just
> > > cannot move interrupt from CPU 0 to CPU 1. CPU 0 can only mask that
> > > interrupt and CPU 1 has to unmask it.
> > 
> > And that's no different form other per-CPU interrupts that have the
> > exact same requirements. NAK to this sort of hacks.
> 
> You forgot to mention in your previous email how to do it, right? So we
> are waiting...

I didn't forget. I explained that it should be handled just like any
other per-CPU interrupt. There is plenty of example of how to do that
in the tree (timers, for example), and if you had even looked at it,
you'd have seen that your approach most probably results in an
arbitrary pointer dereference on anything but CPU0 because the
requesting driver knows nothing about per-CPU interrupts.

But you're obviously trying to make a very different point here. I'll
let you play that game for as long as you want, no skin off my nose.
Maybe in the future, you'll be more interested in actively
collaborating on the kernel code instead of throwing your toys out of
the pram.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.