Re: RFC: dma_resv_unlock() and ww_acquire_ctx scope

Christian König <christian.koenig@xxxxxxx> · Thu, 19 Jan 2023 12:44:35 +0100

Am 19.01.23 um 12:12 schrieb Daniel Vetter:
On Thu, Jan 19, 2023 at 11:37:39AM +0100, Maarten Lankhorst wrote:
On 2023-01-19 11:24, Paul Cercueil wrote:
Hi,

Just a reflexion I have after an intensive (and intense) debugging
session.

I had the following code:

int my_dma_resv_lock(struct dma_buf *dmabuf)
{
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &reservation_ww_class);

	ret = dma_resv_lock_interruptible(dmabuf->resv, &ctx);
	if (ret) {
		if (ret != -EDEADLK)
			return ret;

		ret = dma_resv_lock_slow_interruptible(dmabuf->resv,
						       &ctx);
	}

	return ret;
}

Then I would eventually unlock the dma_resv object in the caller
function. What made me think this was okay, was that the API itself
suggests it's okay - as dma_resv_unlock() does not take the
"ww_acquire_ctx" object as argument, my assumption was that after the
dma_resv was locked, the variable could go out of scope.

I wonder if it would be possible to change the API a little bit, so
that it is less prone to errors like this. Maybe by doing a struct copy
of the initialized ctx into the dma_resv object (if at all possible),
so that the original can actually go out of scope, or maybe having
dma_resv_unlock() take the ww_acquire_ctx as argument, even if it is
not actually used in the function body - just to make it obvious that
it is needed all the way to the point where the dma_resv is unlocked.

This email doesn't have to result in anything, I just thought I'd share
one point where I feel the API could be made less error-prone.
Hey,

This example code will fail eventually. If you have DEBUG_LOCK_ALLOC
enabled, a fake lock is inited in ww_acquire_init. If you don't free it
using ww_acquire_fini(), lockdep will see that you free a live lock that was
never released. PROVE_LOCKING will also complain that you never unlocked the
ctx fake lock.

If you do call ww_acquire_fini, you will get a loud complain if you never
released all locks, because ctx->acquired is non-zero.

The problem isn't that dma_resv_unlock() doesn't need the ctx, the 
problem is that in this example the ctx object isn't properly cleaned up 
and used.

As long as you only need to grab one lock the ctx should be NULL. Only 
when you grab multiple locks the ctx is used to make sure that you can 
detect and handle deadlocks between those different locks.

So using the ctx correctly also makes the lifetime of that object much 
more clear since it needs to stay around at least until all locks are taken.

Try with PROVE_LOCKING, your example will receive a lockdep splat. :)
Also CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y any time you change ww code please.
Otherwise it's just not fully tested.

Yeah, completely agree to both.

Christian.

-Daniel