Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Wed, 17 Mar 2021 14:12:41 +0100

On Wed, Mar 17, 2021 at 12:38:21PM -0000, tip-bot2 for Waiman Long wrote:
> The following commit has been merged into the locking/urgent branch of tip:
> 
> Commit-ID:     b058f2e4d0a70c060e21ed122b264e9649cad57f
> Gitweb:        https://git.kernel.org/tip/b058f2e4d0a70c060e21ed122b264e9649cad57f
> Author:        Waiman Long <longman@xxxxxxxxxx>
> AuthorDate:    Tue, 16 Mar 2021 11:31:18 -04:00
> Committer:     Ingo Molnar <mingo@xxxxxxxxxx>
> CommitterDate: Wed, 17 Mar 2021 09:56:46 +01:00
> 
> locking/ww_mutex: Treat ww_mutex_lock() like a trylock
> 
> It was found that running the ww_mutex_lock-torture test produced the
> following lockdep splat almost immediately:
> 
> [  103.892638] ======================================================
> [  103.892639] WARNING: possible circular locking dependency detected
> [  103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S      W
> [  103.892643] ------------------------------------------------------
> [  103.892643] lock_torture_wr/3234 is trying to acquire lock:
> [  103.892646] ffffffffc0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892660]
> [  103.892660] but task is already holding lock:
> [  103.892661] ffffffffc0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
> [  103.892669]
> [  103.892669] which lock already depends on the new lock.
> [  103.892669]
> [  103.892670]
> [  103.892670] the existing dependency chain (in reverse order) is:
> [  103.892671]
> [  103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
> [  103.892675]        lock_acquire+0x1c5/0x830
> [  103.892682]        __ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [  103.892687]        ww_mutex_lock+0x4b/0x180
> [  103.892690]        torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892694]        lock_torture_writer+0x142/0x3a0 [locktorture]
> [  103.892698]        kthread+0x35f/0x430
> [  103.892701]        ret_from_fork+0x1f/0x30
> [  103.892706]
> [  103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
> [  103.892709]        lock_acquire+0x1c5/0x830
> [  103.892712]        __ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [  103.892715]        ww_mutex_lock+0x4b/0x180
> [  103.892717]        torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892721]        lock_torture_writer+0x142/0x3a0 [locktorture]
> [  103.892725]        kthread+0x35f/0x430
> [  103.892727]        ret_from_fork+0x1f/0x30
> [  103.892730]
> [  103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
> [  103.892733]        check_prevs_add+0x3fd/0x2470
> [  103.892736]        __lock_acquire+0x2602/0x3100
> [  103.892738]        lock_acquire+0x1c5/0x830
> [  103.892740]        __ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [  103.892743]        ww_mutex_lock+0x4b/0x180
> [  103.892746]        torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [  103.892749]        lock_torture_writer+0x142/0x3a0 [locktorture]
> [  103.892753]        kthread+0x35f/0x430
> [  103.892755]        ret_from_fork+0x1f/0x30
> [  103.892757]
> [  103.892757] other info that might help us debug this:
> [  103.892757]
> [  103.892758] Chain exists of:
> [  103.892758]   torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> torture_ww_mutex_0.base
> [  103.892758]
> [  103.892763]  Possible unsafe locking scenario:
> [  103.892763]
> [  103.892764]        CPU0                    CPU1
> [  103.892765]        ----                    ----
> [  103.892765]   lock(torture_ww_mutex_0.base);
> [  103.892767] 				      lock(torture_ww_mutex_1.base);
> [  103.892770] 				      lock(torture_ww_mutex_0.base);
> [  103.892772]   lock(torture_ww_mutex_2.base);
> [  103.892774]
> [  103.892774]  *** DEADLOCK ***
> 
> Since ww_mutex is supposed to be deadlock-proof if used properly, such
> a deadlock scenario should not happen. To avoid this false-positive
> splat, treat ww_mutex_lock() like a trylock().
> 
> After applying this patch, the locktorture test can run for a long time
> without triggering the circular locking dependency splat.
> 
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Acked-by: Davidlohr Bueso <dbueso@xxxxxxx>
> Link: https://lore.kernel.org/r/20210316153119.13802-4-longman@xxxxxxxxxx
> ---
>  kernel/locking/mutex.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 622ebdf..bb89393 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	}
>  
>  	preempt_disable();
> -	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
> +	/*
> +	 * Treat as trylock for ww_mutex.
> +	 */
> +	mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);

I'm confused... why isn't nest_lock working here?

For ww_mutex, we're supposed to have ctx->dep_map as the nest_lock, and
lockdep should accept every lock acquisition made while that nest lock
is held. Afaict the change above is just plain wrong.
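
To make the nest_lock expectation concrete, here is a toy userspace
model of the rule. It is not kernel lockdep code: every toy_* name and
the TOY_MAX_LOCKS limit are made up for illustration, and it only
detects direct A->B / B->A inversions, not longer chains like the one in
the splat. The assumption it encodes is that acquisitions made under a
held nest lock are exempt from ordering checks, because the nest lock
(the ww acquire context, in the ww_mutex case) is what resolves
conflicts between them.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_LOCKS 8	/* toy limit, not a kernel constant */

/* Toy dependency checker; none of these names exist in the kernel. */
struct toy_checker {
	/* before[a][b]: some task held a while acquiring b */
	bool before[TOY_MAX_LOCKS][TOY_MAX_LOCKS];
	int held[TOY_MAX_LOCKS];	/* locks held by the current task */
	int nr_held;
};

static void toy_init(struct toy_checker *c)
{
	memset(c, 0, sizeof(*c));
}

static bool toy_is_held(const struct toy_checker *c, int lock)
{
	for (int i = 0; i < c->nr_held; i++)
		if (c->held[i] == lock)
			return true;
	return false;
}

/*
 * Record that the current task acquires @lock. @nest is the id of a
 * nest lock, or -1 for none. Returns true if a direct ordering
 * inversion against a previously recorded dependency is found.
 */
static bool toy_acquire(struct toy_checker *c, int lock, int nest)
{
	bool inversion = false;
	/* Nested only counts if the nest lock is really held. */
	bool nested = nest >= 0 && toy_is_held(c, nest);

	if (lock < 0 || lock >= TOY_MAX_LOCKS || c->nr_held >= TOY_MAX_LOCKS)
		return false;	/* out of range for this toy; ignore */

	if (!nested) {
		for (int i = 0; i < c->nr_held; i++) {
			int prev = c->held[i];

			if (c->before[lock][prev])
				inversion = true;	/* seen in the other order */
			c->before[prev][lock] = true;
		}
	}

	c->held[c->nr_held++] = lock;
	return inversion;
}

/* The current task drops everything; recorded dependencies persist. */
static void toy_release_all(struct toy_checker *c)
{
	c->nr_held = 0;
}
```

With locks 0 and 1 and no nest lock, taking 0 then 1 in one pass and
1 then 0 in the next makes the second toy_acquire(&c, 0, -1) report an
inversion. If both passes first take lock 2 as a stand-in for the
acquire context and pass it as @nest, neither order is reported, which
is the behaviour I'd expect from the real nest_lock annotation.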