Re: [PATCH] dax/bus.c: replace WARN_ON_ONCE() with lockdep asserts

Dan Williams <dan.j.williams@xxxxxxxxx> · Wed, 3 Apr 2024 19:33:04 -0700

Vishal Verma wrote:
> In [1], Dan points out that all of the WARN_ON_ONCE() usage in the
> referenced patch should be replaced with lockdep_assert_held(_write)().
> 
> Replace those, and additionally, replace a couple of other
> WARN_ON_ONCE() introduced in the same patch for actual failure
> cases (i.e. when acquiring a semaphore fails in a remove / unregister
> path) with dev_WARN_ONCE() as is the precedent here.

/me goes to look why failures are warned vs bubbling up the error...

> 
> Recall that previously, unregistration paths was implicitly protected by
> overloading the device lock, which the patch in [1] sought to remove.
> This meant adding a semaphore acquisition in these unregistration paths.
> Since that can fail, and it doesn't make sense to return errors from
> these paths, retain the two instances of (now) dev_WARN_ONCE().
> 
> Link: https://lore.kernel.org/r/65f0b5ef41817_aa222941a@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.notmuch [1]
> Fixes: c05ae9d85b47 ("dax/bus.c: replace driver-core lock usage by a local rwsem")
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Reported-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> ---
>  drivers/dax/bus.c | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 797e1ebff299..d89c8c3b62f7 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
[..]
> @@ -726,7 +728,8 @@ static void unregister_dax_mapping(void *data)

...and realizes that is hunk is confused.

>  	if (rwsem_is_locked(&dax_region_rwsem))
>  		return __unregister_dax_mapping(data);

This is bug. Either it is safe to call this without the lock held, or it
must be safe to always acquire the lock. Anything else means the caller
needs to be refactored. Conditional locking is an indicator of a deeper
problem with code organization.

Yes, this was not part of the fixup, but the changelog led me here
because no WARNs should remain at the end of this effort.

I think the confusion results from replace *all* device_lock() when some
device_lock() is still needed.

> -	if (WARN_ON_ONCE(down_write_killable(&dax_region_rwsem) != 0))
> +	if (dev_WARN_ONCE(data, down_write_killable(&dax_region_rwsem) != 0,
> +			  "unable to acquire region rwsem\n"))

In a context like this that cannot return an error directly to the user
process making the request, like in a sysfs operation handler, then the
locking cannot worry about signals. This would become an uncoditional
down_write() lock. So down_write_killable() is typically for request
contexts where there is a user process waiting for a result. For
contexts like async driver probing where we might be in a kernel thread,
and definitely in functions that return 'void', it is a bug to use
fallible locks.