On Tue, 2024-11-05 at 10:20 +0100, Uwe Kleine-König wrote: > Hello Nuno, > > On Mon, Nov 04, 2024 at 02:15:49PM +0100, Nuno Sá wrote: > > On Mon, 2024-11-04 at 13:49 +0100, Uwe Kleine-König wrote: > > > On Thu, Oct 31, 2024 at 01:05:21PM +0100, Nuno Sá wrote: > > > > On Thu, 2024-10-31 at 11:40 +0100, Uwe Kleine-König wrote: > > > > > On Wed, Oct 30, 2024 at 08:44:29PM +0000, Jonathan Cameron wrote: > > > > > > On Wed, 30 Oct 2024 14:04:58 +0100 > > > > > > Nuno Sá <noname.nuno@xxxxxxxxx> wrote: > > > > > > > Regarding this, I do share some of the concerns already raised by > > > > > > > Jonathan. > > > > > > > I fear > > > > > > > that we're papering around an issue with the IRQ controller rather > > > > > > > than > > > > > > > being an > > > > > > > issue with the device. When I look at irq_disable() docs [1], it > > > > > > > feels that > > > > > > > we're > > > > > > > already doing what we're supposed to do. IOW, we disable the lazy > > > > > > > approach > > > > > > > so we > > > > > > > *should* not get any pending IRQ. > > > > > > > > > > I think this is wrong and you always have to be prepared to see an irq > > > > > triggering that became pending while masked. > > > > > > I did some research, here are my findings: > > > > > > https://www.kernel.org/doc/html/v6.12-rc6/core-api/genericirq.html#delayed-interrupt-disable > > > reads: > > > > > > The interrupt is kept enabled and is masked in the flow handler > > > when an interrupt event happens. This prevents losing edge > > > interrupts on hardware which does not store an edge interrupt > > > event while the interrupt is disabled at the hardware level. > > > > > > This suggests that lazy disabling is needed for some controllers that > > > stop their event detection when disabled. I read that as: *Normally* an > > > irq event gets pending in hardware while the irq is disabled. > > > > I might be wrong, but I think that the lazy approach is the one for > > optimization. > > It's both. Needed for some controllers *and* an optimisation (that > isn't beneficial in some corner cases). > > > I think the reasoning is that __normally__ no more IRQs will come so > > no need to access the HW. But also thinking more on the subject and > > looking at what the lazy approach is doing, I take back what I said in > > previous emails. I *think* the expectation for a received IRQ while > > the line is masked (or disabled?!), is to keep it as pending (both on > > HW - when possible - and in SW). > > > > > The lazy disable approach is expected to work fine always, the reason to > > > implement non-lazy disabling is "only" a performance optimisation. See > > > commit e9849777d0e27cdd2902805be51da73e7c79578c. > > > > Not sure If I understood you correctly, but I think is the other way > > around? > > Also, as said in the commit, I think it also prevents the same interrupt > > from > > happening twice (in some cases). > > The conversation thread isn't complete on lore.kernel.org, so I don't > know for sure, but the way I understand it is: Normally while you handle > an irq no new irq comes in and so it's sensible to do lazy disabling. > Approximately: In 99.9 % of the cases you save 1 µs by not masking and > in the remaining 0.1% you get another hard irq that costs you say > 500 µs. So on average you save: 0.999 * 1 µs + 0.001 * (1 - 500) µs = 0.5 µs. > > However if for a certain device it's normal that another irq comes in > the "improvement" degrades to: In 20 % of the cases you save 1 µs and in > the remaining 80 % you get a penalty of 500 µs. So in this case it's not > an expected win anymore and you can better stop doing lazy disabling and > invest the time to mask the irq improving the average cost from > - 0.2 * 1 µs + 0.8 * 499 µs = 399.4 µs to 1 µs. > > The interrupt happening twice is not a problem for correctness as the > second one is not given to the device driver but caught in the irq > subsystem that only then disables the irq in hardware and marking it > pending for later consumption. It "only" costs cpu time. (And maybe it's > given to the driver twice after enable_irq is called?) Yeah, enable_irq() was what I meant. So, in the commit message, it's stated: "Unfortunately there are devices which do not allow the interrupt to be disabled easily at the device level. They are forced to use disable_irq_nosync(). This can result in taking each interrupt twice." And the taking twice part was something that I was not getting it. Still not 100% sure but I think that what is meant is that when we enable the IRQ we might get it through [1] and then afterwards through the IRQ controller for devices that latch the events (as soon as you unmask the line, the event should trigger). On [1], there's a retrigger path that goes through HW and I'm not sure if that one is problematic. But I would expect the SW one to be... [1]: https://elixir.bootlin.com/linux/v6.11.6/source/kernel/irq/chip.c#L283 - Nuno Sá > > Looking at the problem at hand with the shared DOUT/̅R̅D̅Y line: It's > (nearly?) 100% of the cases that the line toggles while the irq is > logically disabled/masked. So it makes sense to do > > irq_set_status_flags(sigma_delta->irq_line, IRQ_DISABLE_UNLAZY); > > which however should only have an effect iff the irq controller doesn't > miss an edge while the irq is disabled. > > So assuming my understanding is correct, there is something to do about > the raspberry pi gpio controller to prevent setting IRQ_DISABLE_UNLAZY > have an effect, because that one looses events. > > > > However that makes me wonder what is the difference between the > > > irq_mask() and irq_disable() callbacks defined in struct irq_chip. > > > > Wondering the same... > > > > Thanks for digging into this. This has been a long standing thing with sigma > > delta > > ADCs (I'm fairly sure this discussion about being an issue on the IRQ > > controller or > > not already happened before). > > I keep that paragraph in my reply because the question is still open > even though I don't add new infos here. > > Best regards > Uwe