Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings

Danilo Krummrich <dakr@xxxxxxxxxx> · Thu, 6 Jul 2023 17:06:08 +0200

Hi Boris,

On 6/30/23 10:02, Boris Brezillon wrote:
Hi Danilo,

On Fri, 30 Jun 2023 00:25:18 +0200
Danilo Krummrich <dakr@xxxxxxxxxx> wrote:

+ *	int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx)
+ *	{
+ *		struct driver_context *ctx = __ctx;
+ *
+ *		drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op->remap);
+ *
+ *		drm_gpuva_unlink(op->remap.unmap->va);
+ *		kfree(op->remap.unmap->va);
+ *
+ *		if (op->remap.prev) {
+ *			drm_gpuva_link(ctx->prev_va);

I ended up switching to dma_resv-based locking for the GEMs and I
wonder what the locking is supposed to look like in the async-mapping
case, where we insert/remove the VA nodes in the drm_sched::run_job()
path.

If you decide to pick the interface where you just call 
drm_gpuva_sm_[un]map() and receive a callback for each operation it 
takes to fulfill the request, you probably do this because you want to 
do everything one shot, updating the VA space, link/unlink GPUVAs 
to/from its corresponding backing GEMs, do the actual GPU mappings.

This has a few advantages over generating a list of operations when the 
job is submitted. You've pointed out one of them, when you noticed that 
with a list of operations one can't sneak in a synchronous job between 
already queued up asynchronous jobs.

However, for the asynchronous path it has the limitation that the 
dma-resv lock can't be used to link/unlink GPUVAs to/from its 
corresponding backing GEMs, since this would happen in the fence 
signalling critical path and we're not allowed to hold the dma-resv lock 
there. Hence, as we discussed I added the option for drivers to provide 
an external lock for that, just to be able to keep some lockdep checks.

What I have right now is something like:

	dma_resv_lock(vm->resv);

	// split done in drm_gpuva_sm_map(), each iteration
	// of the loop is a call to the driver ->[re,un]map()
	// hook
	for_each_sub_op() {

		// Private BOs have their resv field pointing to the
		// VM resv and we take the VM resv lock before calling
		// drm_gpuva_sm_map()
		if (vm->resv != gem->resv)
			dma_resv_lock(gem->resv);

		drm_gpuva_[un]link(va);
		gem_[un]pin(gem);

		if (vm->resv != gem->resv)
			dma_resv_unlock(gem->resv);
	}

	dma_resv_unlock(vm->resv);

I'm not sure I get this code right, reading "for_each_sub_op()" and 
"drm_gpuva_sm_map()" looks a bit like things are mixed up?

Or do you mean to represent the sum of all callbacks with 
"for_each_sub_op()"? In this case I assume this code runs in 
drm_sched::run_job() and hence isn't allowed to take the dma-resv lock.

In practice, I don't expect things to deadlock, because the VM resv is
not supposed to be taken outside the VM context and the locking order
is always the same (VM lock first, and then each shared BO
taken/released independently), but I'm not super thrilled by this
nested lock, and I'm wondering if we shouldn't have a pass collecting
locks in a drm_exec context first, and then have
the operations executed. IOW, something like that:

	drm_exec_init(exec, DRM_EXEC_IGNORE_DUPLICATES)
	drm_exec_until_all_locked(exec) {
		// Dummy GEM is the dummy GEM object I use to make the VM
		// participate in the locking without having to teach
		// drm_exec how to deal with raw dma_resv objects.
		ret = drm_exec_lock_obj(exec, vm->dummy_gem);
		drm_exec_retry_on_contention(exec);
		if (ret)
			return ret;

		// Could take the form of drm_gpuva_sm_[un]map_acquire_locks()
		// helpers
		for_each_sub_op() {
			ret = drm_exec_lock_obj(exec, gem);
			if (ret)
				return ret;
		}
	}

	// each iteration of the loop is a call to the driver
	// ->[re,un]map() hook
	for_each_sub_op() {
		...
		gem_[un]pin_locked(gem);
		drm_gpuva_[un]link(va);
		...
	}

	drm_exec_fini(exec);

I have a follow-up patch (still WIP) in the queue to generalize dma-resv 
handling, fence handling and GEM validation within the GPUVA manager as 
optional helper functions: 
https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/a5fc29f3b1edbf3f96fb5a21b858ffe00a3f2584

This was suggested by Matt Brost.

- Danilo

Don't know if I got this right, or if I'm just confused again by how
the drm_gpuva API is supposed to be used.

Regards,

Boris

+ *			ctx->prev_va = NULL;
+ *		}
+ *
+ *		if (op->remap.next) {
+ *			drm_gpuva_link(ctx->next_va);
+ *			ctx->next_va = NULL;
+ *		}
+ *
+ *		return 0;
+ *	}