Re: [PATCH v2] sched/all: Change all BUG_ON() instances in the scheduler to WARN_ON_ONCE()

Mel Gorman <mgorman@xxxxxxx> · Mon, 15 Aug 2022 15:41:43 +0100

On Fri, Aug 12, 2022 at 11:29:18AM +0200, Ingo Molnar wrote:
> From: Ingo Molnar <mingo@xxxxxxxxxx>
> Date: Thu, 11 Aug 2022 08:54:52 +0200
> Subject: [PATCH] sched/all: Change all BUG_ON() instances in the scheduler to WARN_ON_ONCE()
> 
> There's no good reason to crash a user's system with a BUG_ON(),
> chances are high that they'll never even see the crash message on
> Xorg, and it won't make it into the syslog either.
> 
> By using a WARN_ON_ONCE() we at least give the user a chance to report
> any bugs triggered here - instead of getting silent hangs.
> 
> None of these WARN_ON_ONCE()s are supposed to trigger, ever - so we ignore
> cases where a NULL check is done via a BUG_ON() and we let a NULL
> pointer through after a WARN_ON_ONCE().
> 
> There's one exception: WARN_ON_ONCE() arguments with side-effects,
> such as locking - in this case we use the return value of the
> WARN_ON_ONCE(), such as in:
> 
>  -       BUG_ON(!lock_task_sighand(p, &flags));
>  +       if (WARN_ON_ONCE(!lock_task_sighand(p, &flags)))
>  +               return;
> 
> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/YvSsKcAXISmshtHo@xxxxxxxxx
> ---
>  kernel/sched/autogroup.c |  3 ++-
>  kernel/sched/core.c      |  2 +-
>  kernel/sched/cpupri.c    |  2 +-
>  kernel/sched/deadline.c  | 26 +++++++++++++-------------
>  kernel/sched/fair.c      | 10 +++++-----
>  kernel/sched/rt.c        |  2 +-
>  kernel/sched/sched.h     |  6 +++---
>  7 files changed, 26 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index fa9ce9d83683..a286e726eb4b 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -147,7 +147,7 @@ int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p,
>  	int task_pri = convert_prio(p->prio);
>  	int idx, cpu;
>  
> -	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
> +	WARN_ON_ONCE(task_pri >= CPUPRI_NR_PRIORITIES);
>  
>  	for (idx = 0; idx < task_pri; idx++) {
>  

Should the return value be used here to clamp task_pri to
CPUPRI_NR_PRIORITIES? task_pri is used for index which in __cpupri_find
then accesses beyond the end of an array and the future behaviour will be
very unpredictable.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0ab79d819a0d..962b169b05cf 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2017,7 +2017,7 @@ static struct task_struct *pick_task_dl(struct rq *rq)
>  		return NULL;
>  
>  	dl_se = pick_next_dl_entity(dl_rq);
> -	BUG_ON(!dl_se);
> +	WARN_ON_ONCE(!dl_se);
>  	p = dl_task_of(dl_se);
>  
>  	return p;

It's a somewhat redundant check, it'll NULL pointer dereference shortly
afterwards but it'll be a bit more obvious.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 914096c5b1ae..28f10dccd194 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2600,7 +2600,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
>  	if (!join)
>  		return;
>  
> -	BUG_ON(irqs_disabled());
> +	WARN_ON_ONCE(irqs_disabled());
>  	double_lock_irq(&my_grp->lock, &grp->lock);
>  
>  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {

Recoverable with a goto no_join. It'll be a terrible recovery because
there is no way IRQs should be disabled here. Something else incredibly
bad happened before this would fire.

> @@ -7279,7 +7279,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>  		return;
>  
>  	find_matching_se(&se, &pse);
> -	BUG_ON(!pse);
> +	WARN_ON_ONCE(!pse);
>  
>  	cse_is_idle = se_is_idle(se);
>  	pse_is_idle = se_is_idle(pse);

Similar to pick_task_dl.

> @@ -10134,7 +10134,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  		goto out_balanced;
>  	}
>  
> -	BUG_ON(busiest == env.dst_rq);
> +	WARN_ON_ONCE(busiest == env.dst_rq);
>  
>  	schedstat_add(sd->lb_imbalance[idle], env.imbalance);
>  

goto out if it triggers? It'll just continue to be unbalanced.

> @@ -10430,7 +10430,7 @@ static int active_load_balance_cpu_stop(void *data)
>  	 * we need to fix it. Originally reported by
>  	 * Bjorn Helgaas on a 128-CPU setup.
>  	 */
> -	BUG_ON(busiest_rq == target_rq);
> +	WARN_ON_ONCE(busiest_rq == target_rq);
>  
>  	/* Search for an sd spanning us and the target CPU. */
>  	rcu_read_lock();

goto out_unlock if it fires?

For the rest, I didn't see obvious recovery paths that would allow the
system to run predictably. Any of them firing will have unpredictable
consequences (e.g. move_queued_task firing would be fun if it was a per-cpu
kthread). Depending on which warning triggers, the remaining life of the
system may be very short but maybe long enough to be logged even if system
locks up shortly afterwards.

-- 
Mel Gorman
SUSE Labs