On Wed, Feb 21, 2024 at 04:41:20PM +0100, Thomas Gleixner wrote:
> On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> > On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
> >> >> In scenarios where there is no need to keep track of IRQs handled, convert
> >> >> it back to IRQ_HANDLED.
> >> >
> >> > That's not really workable as you'd have to update tons of drivers just
> >> > to deal with that corner case. That's error prone and just extra
> >> > complexity all over the place.
> > 
> > I agree, that's a downside of this implementation.
> 
> A serious one which is not really workable. See below.
> 
> > I agree the above may be able to solve the issue, but it would make 2 extra
> > atomic ops necessary in the thread handling the IRQ, as well as one extra
> > atomic operation in note_interrupt(), which could increase latency on this
> > IRQ deferring the handler to a thread.
> > 
> > I mean, yes, the cpu running note_interrupt() would probably already have
> > exclusiveness for this cacheline, but it further increases cacheline
> > bouncing and also adds the memory barriers that atomic operations incur,
> > even if we use an extra bit from threads_handled instead of allocating a
> > new field for threads_running.
> 
> I think that's a strawman. Atomic operations can of course be more
> expensive than non-atomic ones, but they only start to make a difference
> when the cache line is contended. That's not the case here for the
> normal operations.
> 
> Interrupts and their threads are strictly targeted to a single CPU and
> the cache line is already hot and had to be made exclusive because of
> other write operations to it.
> 
> There is usually no concurrency at all, except for administrative
> operations like enable/disable or affinity changes. Those administrative
> operations are not high frequency and the resulting cache line bouncing
> is unavoidable even without that change. But does it matter in the
> larger picture? I don't think so.
That's a fair point, but there are use cases that run CPU isolation on top
of PREEMPT_RT in order to reduce interference on a CPU running an RT
workload. In those cases, IIRC, the handler will run on a different
(housekeeping) CPU when the IRQs originate on an isolated CPU, so the
cacheline bouncing described above is expected.

> > On top of that, let's think of a scenario where the threaded handler
> > resolves a lot of requests, but does not necessarily spend a lot of time
> > doing so. This allows the thread to run for little time while resolving
> > a lot of requests.
> > 
> > In this scenario, note_interrupt() could return without incrementing
> > irqs_unhandled for those IRQs that happen while the brief thread is
> > running, but every other IRQ would cause note_interrupt() to increase
> > irqs_unhandled, which would cause the bug to still reproduce.
> 
> In theory yes. Does it happen in practice?
> 
> But that exposes a flaw in the actual detection code. The code is
> unconditionally accumulating if there is an unhandled interrupt within
> 100ms after the last unhandled one. IOW, if there is a periodic
> unhandled one every 50ms, the interrupt will be shut down after 100000 *
> 50ms = 5000s ~= 83.3m ~= 1.4h. And it neither cares about the number of
> actually handled interrupts.
> 
> The spurious detector is really about runaway interrupts which hog a CPU
> completely, but the above is not what we want to protect against.

Now it makes a lot more sense to me. Thanks!

> > I understand my suggested change increases irq_return complexity, but it
> > should not add much overhead to either IRQ deferring or IRQ thread
> > handling. Also, since it accounts for every IRQ actually handled, it
> > does not matter how long the handler thread runs; it still avoids the
> > bug.
> 
> I'm not worried about the extra complexity in note_interrupt(). That's
> fine and I don't have a strong opinion whether to use your patch 2/4 or
> the simpler one I suggested.
> 
> > As you previously noted, the main issue in my suggestion is that it
> > changes drivers' code. But while this change is optional, I wonder how
> > many drivers handle IRQs that:
> > - use edge type interrupts, and
> 
> It's not up to the driver to decide that. That's a platform property and
> the driver does not even know, and it should not care, because all that
> matters for the driver is that it gets the interrupts and does not lose
> anything.
> 
> > - can trigger really fast, many many times, and
> 
> That's a hardware or emulation property and I don't know how many
> devices would expose this behaviour.
> 
> > - can run in force-thread mode, and
> 
> Almost all drivers.
> 
> > - have handlers that are able to deal with multiple IRQs per call.
> 
> Pretty much every driver which has to deal with queues, FIFOs,
> etc. That's a lot of them.
> 
> > About solving it directly in generic code, I agree it would be the ideal
> > scenario. That's why I suggested that in RFCv1: if the thread handles a
> > single IRQ, we reset irqs_unhandled to zero. That's a good pointer on the
> > thread being stuck, and TBH I don't think this is too far away from the
> > current 100/100k if we consider some of those handlers can handle
> > multiple IRQs at once.
> 
> It has the downside that a scenario where there is a spurious interrupt
> flood with an occasional one inbetween which is handled by the thread
> will not be detected anymore. The accumulation and time period tracking
> are there for a reason.

Ok, fair point. If we disable it, we end up losing even those few
successes. But now that I understand the point is to keep the CPU from
being hogged, it makes sense: we prefer disabling the interrupt to
compromising the whole system. It also produces a warning that may help
track down the driver/device issue.

> But as I pointed out above the detection logic is flawed due to the
> unconditional accumulation. Can you give the uncompiled below a test
> ride with your scenario?
> 
> Thanks,
> 
>         tglx
> ---
>  include/linux/irqdesc.h |    2 ++
>  kernel/irq/spurious.c   |   33 +++++++++++++++++++++++++++++----
>  2 files changed, 31 insertions(+), 4 deletions(-)
> 
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -29,6 +29,7 @@ struct pt_regs;
>   * @wake_depth:		enable depth, for multiple irq_set_irq_wake() callers
>   * @tot_count:		stats field for non-percpu irqs
>   * @irq_count:		stats field to detect stalled irqs
> + * @first_unhandled:	Timestamp of the first unhandled interrupt
>   * @last_unhandled:	aging timer for unhandled count
>   * @irqs_unhandled:	stats field for spurious unhandled interrupts
>   * @threads_handled:	stats field for deferred spurious detection of threaded handlers
> @@ -64,6 +65,7 @@ struct irq_desc {
>  	unsigned int		wake_depth;	/* nested wake enables */
>  	unsigned int		tot_count;
>  	unsigned int		irq_count;	/* For detecting broken IRQs */
> +	unsigned long		first_unhandled;
>  	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
>  	unsigned int		irqs_unhandled;
>  	atomic_t		threads_handled;
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -390,9 +390,14 @@ void note_interrupt(struct irq_desc *des
>  		 * working systems
>  		 */
>  		if (time_after(jiffies, desc->last_unhandled + HZ/10))
> -			desc->irqs_unhandled = 1;
> -		else
> -			desc->irqs_unhandled++;
> +			desc->irqs_unhandled = 0;
> +
> +		if (!desc->irqs_unhandled) {
> +			desc->irq_count = 0;
> +			desc->first_unhandled = jiffies;
> +		}
> +
> +		desc->irqs_unhandled++;
>  		desc->last_unhandled = jiffies;
>  	}
> 
> @@ -411,9 +416,27 @@ void note_interrupt(struct irq_desc *des
>  	if (likely(desc->irq_count < 100000))
>  		return;
> 
> -	desc->irq_count = 0;
>  	if (unlikely(desc->irqs_unhandled > 99900)) {
>  		/*
> +		 * Validate that this is actually an interrupt storm, which
> +		 * happens to:
> +		 *
> +		 * - increment the unhandled count to ~100k within 10 seconds
> +		 *   which means there is a spurious interrupt every 50us
> +		 * - have an unhandled to handled ratio > 2
> +		 *
> +		 * Otherwise reset the detector and start over. This also
> +		 * covers the case where a threaded handler consumes
> +		 * several hard interrupts in one go which would otherwise
> +		 * accumulate to 99900 over time and trigger a false positive.
> +		 */
> +		unsigned long td = desc->last_unhandled - desc->first_unhandled;
> +		unsigned int handled = desc->irq_count - desc->irqs_unhandled;
> +
> +		if (td > 5 * HZ || desc->irqs_unhandled / handled < 3)
> +			goto out;
> +
> +		/*
>  		 * The interrupt is stuck
>  		 */
>  		__report_bad_irq(desc, action_ret);
> @@ -428,7 +451,9 @@ void note_interrupt(struct irq_desc *des
>  		mod_timer(&poll_spurious_irq_timer,
>  			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
>  	}
> +out:
>  	desc->irqs_unhandled = 0;
> +	desc->irq_count = 0;
>  }
> 
>  bool noirqdebug __read_mostly;

Sure, I will give it a try.

Now that I have a better understanding of what is expected here, I will
also try to experiment a little.

Thank you!
Leo