Re: [PATCH v2 3/4] iio: adc: ad_sigma_delta: Add support for reading irq status using a GPIO

Nuno Sá <noname.nuno@xxxxxxxxx> · Tue, 05 Nov 2024 11:30:58 +0100

On Tue, 2024-11-05 at 10:20 +0100, Uwe Kleine-König wrote:
> Hello Nuno,
> 
> On Mon, Nov 04, 2024 at 02:15:49PM +0100, Nuno Sá wrote:
> > On Mon, 2024-11-04 at 13:49 +0100, Uwe Kleine-König wrote:
> > > On Thu, Oct 31, 2024 at 01:05:21PM +0100, Nuno Sá wrote:
> > > > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote:
> > > > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote:
> > > > > > On Wed, 30 Oct 2024 14:04:58 +0100
> > > > > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote:
> > > > > > > Regarding this, I do share some of the concerns already raised by
> > > > > > > Jonathan.
> > > > > > > I fear
> > > > > > > that we're papering around an issue with the IRQ controller rather
> > > > > > > than
> > > > > > > being an
> > > > > > > issue with the device. When I look at irq_disable() docs [1], it
> > > > > > > feels that
> > > > > > > we're
> > > > > > > already doing what we're supposed to do. IOW, we disable the lazy
> > > > > > > approach
> > > > > > > so we
> > > > > > > *should* not get any pending IRQ.
> > > > > 
> > > > > I think this is wrong and you always have to be prepared to see an irq
> > > > > triggering that became pending while masked.
> > > 
> > > I did some research, here are my findings:
> > > 
> > > https://www.kernel.org/doc/html/v6.12-rc6/core-api/genericirq.html#delayed-interrupt-disable
> > > reads:
> > > 
> > > 	The interrupt is kept enabled and is masked in the flow handler
> > > 	when an interrupt event happens. This prevents losing edge
> > > 	interrupts on hardware which does not store an edge interrupt
> > > 	event while the interrupt is disabled at the hardware level.
> > > 
> > > This suggests that lazy disabling is needed for some controllers that
> > > stop their event detection when disabled. I read that as: *Normally* an
> > > irq event gets pending in hardware while the irq is disabled.
> > 
> > I might be wrong, but I think that the lazy approach is the one for
> > optimization.
> 
> It's both. Needed for some controllers *and* an optimisation (that
> isn't beneficial in some corner cases).
> 
> > I think the reasoning is that __normally__ no more IRQs will come so
> > no need to access the HW. But also thinking more on the subject and
> > looking at what the lazy approach is doing, I take back what I said in
> > previous emails. I *think* the expectation for a received IRQ while
> > the line is masked (or disabled?!), is to keep it as pending (both on
> > HW - when possible - and in SW).
> > 
> > > The lazy disable approach is expected to work fine always, the reason to
> > > implement non-lazy disabling is "only" a performance optimisation. See
> > > commit e9849777d0e27cdd2902805be51da73e7c79578c.
> > 
> > Not sure If I understood you correctly, but I think is the other way
> > around? 
> > Also, as said in the commit, I think it also prevents the same interrupt
> > from
> > happening twice (in some cases).
> 
> The conversation thread isn't complete on lore.kernel.org, so I don't
> know for sure, but the way I understand it is: Normally while you handle
> an irq no new irq comes in and so it's sensible to do lazy disabling.
> Approximately: In 99.9 % of the cases you save 1 µs by not masking and
> in the remaining 0.1% you get another hard irq that costs you say
> 500 µs. So on average you save: 0.999 * 1 µs + 0.001 * (1 - 500) µs = 0.5 µs.
> 
> However if for a certain device it's normal that another irq comes in
> the "improvement" degrades to: In 20 % of the cases you save 1 µs and in
> the remaining 80 % you get a penalty of 500 µs. So in this case it's not
> an expected win anymore and you can better stop doing lazy disabling and
> invest the time to mask the irq improving the average cost from
> - 0.2 * 1 µs + 0.8 * 499 µs = 399.4 µs to 1 µs.
> 
> The interrupt happening twice is not a problem for correctness as the
> second one is not given to the device driver but caught in the irq
> subsystem that only then disables the irq in hardware and marking it
> pending for later consumption. It "only" costs cpu time. (And maybe it's
> given to the driver twice after enable_irq is called?)

Yeah, enable_irq() was what I meant. So, in the commit message, it's stated:

"Unfortunately there are devices which do not allow the interrupt to be
disabled easily at the device level. They are forced to use
disable_irq_nosync(). This can result in taking each interrupt twice."

And the taking twice part was something that I was not getting it. Still not
100% sure but I think that what is meant is that when we enable the IRQ we might
get it through [1] and then afterwards through the IRQ controller for devices
that latch the events (as soon as you unmask the line, the event should
trigger). On [1], there's a retrigger path that goes through HW and I'm not sure
if that one is problematic. But I would expect the SW one to be...

[1]: https://elixir.bootlin.com/linux/v6.11.6/source/kernel/irq/chip.c#L283

- Nuno Sá
> 
> Looking at the problem at hand with the shared DOUT/̅R̅D̅Y line: It's
> (nearly?) 100% of the cases that the line toggles while the irq is
> logically disabled/masked. So it makes sense to do
> 
> 	irq_set_status_flags(sigma_delta->irq_line, IRQ_DISABLE_UNLAZY);
> 
> which however should only have an effect iff the irq controller doesn't
> miss an edge while the irq is disabled.
> 
> So assuming my understanding is correct, there is something to do about
> the raspberry pi gpio controller to prevent setting IRQ_DISABLE_UNLAZY
> have an effect, because that one looses events.
> 
> > > However that makes me wonder what is the difference between the
> > > irq_mask() and irq_disable() callbacks defined in struct irq_chip.
> > 
> > Wondering the same...
> > 
> > Thanks for digging into this. This has been a long standing thing with sigma
> > delta
> > ADCs (I'm fairly sure this discussion about being an issue on the IRQ
> > controller or
> > not already happened before).
> 
> I keep that paragraph in my reply because the question is still open
> even though I don't add new infos here.
> 
> Best regards
> Uwe