Re: [RFC PATCH v2 3/4] irq: Introduce IRQ_HANDLED_MANY

Leonardo Bras <leobras@xxxxxxxxxx> · Wed, 21 Feb 2024 02:39:25 -0300

On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
> On Mon, Feb 19 2024 at 10:59, Thomas Gleixner wrote:

Hi Thomas, thanks for reviewing!

> > On Fri, Feb 16 2024 at 04:59, Leonardo Bras wrote:
> >
> >> In threaded IRQs, some irq handlers are able to handle many requests at a
> >> single run, but this is only accounted as a single IRQ_HANDLED when
> >> increasing threads_handled.
> >>
> >> In order to fix this, introduce IRQ_HANDLED_MANY, so the returned value of
> >> those IRQ handlers are able to signal that many IRQs got handled at that
> >> run.
> >>
> >> Is scenarios where there is no need to keep track of IRQ handled, convert
> >> it back to IRQ_HANDLED.
> >
> > That's not really workable as you'd have to update tons of drivers just
> > to deal with that corner case. That's error prone and just extra
> > complexity all over the place.

I agree, that's a downside of this implementation. 

> >
> > This really needs to be solved in the core code.
> 
> Something like the uncompiled below should do the trick.
> 
> Thanks,
> 
>         tglx
> ---
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -38,7 +38,8 @@ struct pt_regs;
>   * @affinity_notify:	context for notification of affinity changes
>   * @pending_mask:	pending rebalanced interrupts
>   * @threads_oneshot:	bitfield to handle shared oneshot threads
> - * @threads_active:	number of irqaction threads currently running
> + * @threads_active:	number of irqaction threads currently activated
> + * @threads_running:	number of irqaction threads currently running
>   * @wait_for_threads:	wait queue for sync_irq to wait for threaded handlers
>   * @nr_actions:		number of installed actions on this descriptor
>   * @no_suspend_depth:	number of irqactions on a irq descriptor with
> @@ -80,6 +81,7 @@ struct irq_desc {
>  #endif
>  	unsigned long		threads_oneshot;
>  	atomic_t		threads_active;
> +	atomic_t		threads_running;
>  	wait_queue_head_t       wait_for_threads;
>  #ifdef CONFIG_PM_SLEEP
>  	unsigned int		nr_actions;
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1194,9 +1194,11 @@ irq_forced_thread_fn(struct irq_desc *de
>  	local_bh_disable();
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
>  		local_irq_disable();
> +	atomic_inc(&desc->threads_running);
>  	ret = action->thread_fn(action->irq, action->dev_id);
>  	if (ret == IRQ_HANDLED)
>  		atomic_inc(&desc->threads_handled);
> +	atomic_dec(&desc->threads_running);
>  
>  	irq_finalize_oneshot(desc, action);
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -350,6 +350,12 @@ void note_interrupt(struct irq_desc *des
>  				desc->threads_handled_last = handled;
>  			} else {
>  				/*
> +				 * Avoid false positives when there is
> +				 * actually a thread running.
> +				 */
> +				if (atomic_read(&desc->threads_running))
> +					return;
> +				/*
>  				 * None of the threaded handlers felt
>  				 * responsible for the last interrupt
>  				 *
> 

I agree the above may be able to solve the issue, but it would make 2 extra 
atomic ops necessary in the thread handling the IRQ, as well as one extra 
atomic operation in note_interrupt(), which could increase latency on this 
IRQ deferring the handler to a thread.

I mean, yes, the cpu running note_interrupt() would probably already have 
exclusiveness for this cacheline, but it further increases cacheline 
bouncing and also adds the mem barriers that incur on atomic operations, 
even if we use an extra bit from threads_handled instead of allocate a new 
field for threads_running. 

On top of that, let's think on a scenario where the threaded handler will 
solve a lot of requests, but not necessarily spend a lot of time doing so.
This allows the thread to run for little time while solving a lot of 
requests.

In this scenario, note_interrupt() could return without incrementing 
irqs_unhandled for those IRQ that happen while the brief thread is running, 
but every other IRQ would cause note_interrupt() to increase 
irqs_unhandled, which would cause the bug to still reproduce.

I understand my suggested change increases irq_return complexity, but it
should not increase much of the overhead in both IRQ deferring and IRQ 
thread handling. Also, since it accounts for every IRQ actually handled, it 
does not matter how long the handler thread runs, it still avoids the bug.

As you previously noted, the main issue in my suggestion is about changing 
drivers' code. But while this change is optional, I wonder how many 
drivers handle IRQs that:
- use edge type interrupts, and
- can trigger really fast, many many times, and
- can run in force-thread mode, and
- have handlers that are able to deal with multiple IRQs per call.

If there aren't many that met this requirement, I could try to tackle them 
as they become a problem.

About solving it directly in generic code, I agree it would be the ideal 
scenario. That's why I suggested that in RFCv1: if the thread handles a 
single IRQ, we reset irqs_unhandled to zero. That's a good pointer on the 
thread being stuck, and TBH I don't think this is too far away from the 
current 100/100k if we consider some of those handlers can handle multiple 
IRQs at once.

What are your thoughts on that?

Thanks!
Leo