On Thu, Jan 19, 2023 at 11:25:51PM +0100, Danilo Krummrich wrote: > On 1/19/23 22:47, Matthew Brost wrote: > > On Thu, Jan 19, 2023 at 06:46:30PM +0100, Danilo Krummrich wrote: > > > > > > > > > On 1/19/23 17:38, Matthew Brost wrote: > > > > On Thu, Jan 19, 2023 at 04:36:43PM +0100, Danilo Krummrich wrote: > > > > > On 1/19/23 05:58, Matthew Brost wrote: > > > > > > On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote: > > > > > > > On 1/18/23 21:37, Thomas Hellström (Intel) wrote: > > > > > > > > > > > > > > > > On 1/18/23 07:12, Danilo Krummrich wrote: > > > > > > > > > This commit provides the implementation for the new uapi motivated by the > > > > > > > > > Vulkan API. It allows user mode drivers (UMDs) to: > > > > > > > > > > > > > > > > > > 1) Initialize a GPU virtual address (VA) space via the new > > > > > > > > > DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA > > > > > > > > > space managed by the kernel and userspace, respectively. > > > > > > > > > > > > > > > > > > 2) Allocate and free a VA space region as well as bind and unbind memory > > > > > > > > > to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl. > > > > > > > > > UMDs can request the named operations to be processed either > > > > > > > > > synchronously or asynchronously. It supports DRM syncobjs > > > > > > > > > (incl. timelines) as synchronization mechanism. The management of the > > > > > > > > > GPU VA mappings is implemented with the DRM GPU VA manager. > > > > > > > > > > > > > > > > > > 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The > > > > > > > > > execution happens asynchronously. It supports DRM syncobj (incl. > > > > > > > > > timelines) as synchronization mechanism. DRM GEM object locking is > > > > > > > > > handled with drm_exec. > > > > > > > > > > > > > > > > > > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM > > > > > > > > > GPU scheduler for the asynchronous paths. > > > > > > > > > > > > > > > > > > Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx> > > > > > > > > > --- > > > > > > > > > Documentation/gpu/driver-uapi.rst | 3 + > > > > > > > > > drivers/gpu/drm/nouveau/Kbuild | 2 + > > > > > > > > > drivers/gpu/drm/nouveau/Kconfig | 2 + > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_abi16.c | 16 + > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_abi16.h | 1 + > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_drm.c | 23 +- > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_drv.h | 9 +- > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_exec.c | 310 ++++++++++ > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_exec.h | 55 ++ > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_sched.c | 780 ++++++++++++++++++++++++ > > > > > > > > > drivers/gpu/drm/nouveau/nouveau_sched.h | 98 +++ > > > > > > > > > 11 files changed, 1295 insertions(+), 4 deletions(-) > > > > > > > > > create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c > > > > > > > > > create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h > > > > > > > > > create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c > > > > > > > > > create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h > > > > > > > > ... > > > > > > > > > > > > > > > > > > +static struct dma_fence * > > > > > > > > > +nouveau_bind_job_run(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); > > > > > > > > > + struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli); > > > > > > > > > + struct bind_job_op *op; > > > > > > > > > + int ret = 0; > > > > > > > > > + > > > > > > > > > > > > > > > > I was looking at how nouveau does the async binding compared to how xe > > > > > > > > does it. > > > > > > > > It looks to me that this function being a scheduler run_job callback is > > > > > > > > the main part of the VM_BIND dma-fence signalling critical section for > > > > > > > > the job's done_fence and if so, needs to be annotated as such? > > > > > > > > > > > > > > Yes, that's the case. > > > > > > > > > > > > > > > > > > > > > > > For example nouveau_uvma_region_new allocates memory, which is not > > > > > > > > allowed if in a dma_fence signalling critical section and the locking > > > > > > > > also looks suspicious? > > > > > > > > > > > > > > Thanks for pointing this out, I missed that somehow. > > > > > > > > > > > > > > I will change it to pre-allocate new regions, mappings and page tables > > > > > > > within the job's submit() function. > > > > > > > > > > > > > > > > > > > Yea that what we basically do in Xe, in the IOCTL step allocate all the > > > > > > backing store for new page tables, populate new page tables (these are > > > > > > not yet visible in the page table structure), and in last step which is > > > > > > executed after all the dependencies are satified program all the leaf > > > > > > entires making the new binding visible. > > > > > > > > > > > > We screwed have this up by defering most of the IOCTL to a worker but > > > > > > will fix this fix this one way or another soon - get rid of worker or > > > > > > introduce a type of sync that is signaled after the worker + publish the > > > > > > dma-fence in the worker. I'd like to close on this one soon. > > > > > > > For the ops structures the drm_gpuva_manager allocates for reporting the > > > > > > > split/merge steps back to the driver I have ideas to entirely avoid > > > > > > > allocations, which also is a good thing in respect of Christians feedback > > > > > > > regarding the huge amount of mapping requests some applications seem to > > > > > > > generate. > > > > > > > > > > > > > > > > > > > It should be fine to have allocations to report the split/merge step as > > > > > > this step should be before a dma-fence is published, but yea if possible > > > > > > to avoid extra allocs as that is always better. > > > > > > > > > > I think we can't really ask for the split/merge steps before actually > > > > > running the job, since it requires the particular VA space not to change > > > > > while performing those operations. > > > > > > > > > > E.g. if we'd run the split/merge steps at job submit() time the underlying > > > > > VA space could be changed by other bind jobs executing before this one, > > > > > which would make the calculated split/merge steps obsolete and wrong. > > > > > > > > > > > > > Hmm, maybe I'm not understanding this implementation, admittedly I > > > > haven't studied the gpuva manager code in detail. > > > > > > The limitation I mentioned above doesn't really come from the > > > drm_gpuva_manager, but from how the driver executes the jobs. > > > > > > > > > > > Let me explain what we are doing in Xe. > > > > > > > > Map 0x0000 - 0x3000 -> this resolves into 1 bind operation and 1 VMA > > > > Unmap 0x1000-0x2000 -> this resolves into 1 unbind and 2 rebind operations > > > > > > > > 1. unbind 0x0000-0x3000 -> destroy old VMA > > > > 2. rebind 0x0000-0x1000 -> new VMA > > > > 3. rebind 0x2000-0x3000 -> new VMA > > > > > > > > All of the above steps resolving the operations can be done in the IOCTL > > > > phase and VM's VMA structure is also updated. When the dependencies > > > > are resolved the actual bindings are done on the GPU. We use the BO's > > > > dma-resv slots to ensure there is never a window 0x0000-0x1000 and > > > > 0x2000-0x3000 are never mapped with respect to execs (I forget the exact > > > > details of how we do this but if you want to know I'll explain further). > > > > > > Ok, so you're not only generating the split/merge steps without updating the > > > view of the VA space (which would cause the issue I described) but also > > > already change the view of the VA space in the IOCTL, before the actual page > > > table update happens later on, right? > > > > > > > Yes, we generate the operations + update the view VA space in the IOCTL > > while the actual page table update on the GPU occurs later if there are > > dependencies. > > > > > Currently, in nouveau I do both, the actual page table update and the range > > > allocator update, in run_job(), such that walking the allocator always > > > represents the actual page table layout. > > > > > > > In Xe the VA view is always if all submited bind / unbind ops has > > completed even if some are pending. > > If they're completed, yes. But in the time frame from the VA space update > until the GPU actually updated the page tables it isn't? Not that I think > it's a problem, just curious. > There is a window where the VA space and actual page tables are not in sync. This is definitely is fine as we've tested Xe quite thoroughly. Essentially we are just pipelining the VA space in step with the actual GPU page tables update in the another. The GPU page table update step doesn't look of VA space rather the VA space step provides all the information (statically) so the GPU page table can occur. > > > > Also you may want to take a look at generic page walker Thomas Hellstrom > > wrote for Xe which we use to program the page tables. It is pretty slick > > and probably could use in Nouveau: > > https://patchwork.freedesktop.org/patch/515856/?series=112188&rev=1 > > > > In Xe xe_pt.c is the wrapper for this, it can map, unmap, and invalidate > > VA: > > https://cgit.freedesktop.org/drm/drm-xe/tree/drivers/gpu/drm/xe/xe_pt.c?h=drm-xe-next > > > > > How do you handle map/unmap on BO eviction? > > > > > > > We use the dma-resv slots to order all of this. > > > > Basically exec/map/unmap wait on pending BOs (I believe moves are in the > > KERNEL slot) and BOs moves wait on pending exec/map/unmaps (I believe > > these are BOOKKEEP slot). > > > > I think more details are in the Xe kernel doc, see 'dma-resv usage': > > https://cgit.freedesktop.org/drm/drm-xe/tree/drivers/gpu/drm/xe/xe_vm_doc.h?h=drm-xe-next > > > > This might be a little stale, we dropped the idea of > > DMA_RESV_USAGE_PREEMPT_FENCE and is now just in the BOOKKEEP but I think > > everything else is corret. > > If a BO is currently eviceted, did you consider to just add it to your > rebind list rather than calling ttm_bo_validate() and creating the actual > mapping right away? > No, but that is good idea. We probably should look at this optimization. > I was thinking about this kind of "lazy mapping" approach for cases like > partial unbind requests, where I wouldn't expect the application to need the > mapping right away, but also for normal bind requests. I mean, when there is > already memory pressure, why making the situation worse until an actual EXEC > is requested? > > Also, if I see it correctly unmaps caused by xe_bo_move_notify() happen on > the CPU, right? > I'm not exactly sure what you asking here, but we have 3 cases here. 1. Fault mode, we just blow away the page tables, will get a fault on next use, and then trigger a rebind. 2. Compute mode, trigger the preempt fences (kicks everything using the BO off the hardware before the move), the rebind worker will issue a rebind and resume execution. 3. dma-fence mode, the BO move is scheduled behind any pending execs, the next exec IOCTL will issue the rebind. In all case before the rebind the BO validated again perhap triggering another move, all rebinds are scheduled behind the move. > > > > > > > > > > Can we not use drm_gpuvs_manager in a similar manner to generate the > > > > ops + update the VM's VMA structure early? Again maybe I missing > > > > something here as I haven't fully studied the drm_gpuva_manager. > > > > > > You can use the drm_gpuvs_manager in exactly the way you just described. > > > Though, in your concrete example it would generate just 1 unbind and 1 bind, > > > which it would combine in a re-bind operation. A re-bind operation always > > > has 1 unbind and up to 2 (but a minimum of 1) bind (sub-)operations. > > > > > > > Cool this is what I wanted to hear. Hopefully we can get around to > > building our VM / VMA management on top the drm_gpuvs_manager soon. > > I will re-work the drm_gpuva_manager memory allocation parts first, since > those changes will influence it's API. How the internal allocator works, if > it's drm_mm or something else, shouldn't matter to users of the > drm_gpuva_manager from an API PoV. > Definitely work out the API changes first as we can work on / optimize the internals later. If Christian can provide the benchmarks / WL that do tons of binds we may be able to run these on Xe to help find the best solution for the internals. VK is mostly working for Xe. Matt > - Danilo > > > > > Matt > > > Rebind: > > > 1. unbind 0x0000-0x3000 > > > 2. NULL > > > 3. bind 0x1000-0x3000 > > > > > > It's then up to the driver to remove the old gpuva entry and add a new one. > > > With the given re-bind operation the driver can conclude to just do a > > > partial page table update from 0x0000-0x1000. > > > > > > - Danilo > > > > > > > > > > > Matt > > > > > > > > > Anyway, I should be able to get rid of all the allocations to make this > > > > > safe. > > > > > > > > > > > > > > > > > Also BTW, great work on drm_gpuva_manager too. We will almost likely > > > > > > pick this up in Xe rather than open coding all of this as we currently > > > > > > do. We should probably start the port to this soon so we can contribute > > > > > > to the implementation and get both of our drivers upstream sooner. > > > > > > > > > > Sounds great! > > > > > > > > > > > > Regarding the locking, anything specific that makes it look suspicious to > > > > > > > you? > > > > > > > > > > > > > > > > > > > I haven't looked into this too but almost certainly Thomas is suggesting > > > > > > that if you allocate memory anywhere under the nouveau_uvmm_lock then > > > > > > you can't use this lock in the run_job() callback as this in the > > > > > > dma-fencing path. > > > > > > > > > > Oh, sure. I already checked that, luckily there aren't any further > > > > > allocations under this lock, so this should be safe once I changed to > > > > > run_job() parts to pre-allocation in submit(). > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Thomas > > > > > > > > > > > > > > > > > > > > > > > > > + nouveau_uvmm_lock(uvmm); > > > > > > > > > + list_for_each_op(op, &bind_job->ops) { > > > > > > > > > + switch (op->op) { > > > > > > > > > + case OP_ALLOC: { > > > > > > > > > + bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE; > > > > > > > > > + > > > > > > > > > + ret = nouveau_uvma_region_new(uvmm, > > > > > > > > > + op->va.addr, > > > > > > > > > + op->va.range, > > > > > > > > > + sparse); > > > > > > > > > + if (ret) > > > > > > > > > + goto out_unlock; > > > > > > > > > + break; > > > > > > > > > + } > > > > > > > > > + case OP_FREE: > > > > > > > > > + ret = nouveau_uvma_region_destroy(uvmm, > > > > > > > > > + op->va.addr, > > > > > > > > > + op->va.range); > > > > > > > > > + if (ret) > > > > > > > > > + goto out_unlock; > > > > > > > > > + break; > > > > > > > > > + case OP_MAP: > > > > > > > > > + ret = nouveau_uvmm_sm_map(uvmm, > > > > > > > > > + op->va.addr, op->va.range, > > > > > > > > > + op->gem.obj, op->gem.offset, > > > > > > > > > + op->flags && 0xff); > > > > > > > > > + if (ret) > > > > > > > > > + goto out_unlock; > > > > > > > > > + break; > > > > > > > > > + case OP_UNMAP: > > > > > > > > > + ret = nouveau_uvmm_sm_unmap(uvmm, > > > > > > > > > + op->va.addr, > > > > > > > > > + op->va.range); > > > > > > > > > + if (ret) > > > > > > > > > + goto out_unlock; > > > > > > > > > + break; > > > > > > > > > + } > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > +out_unlock: > > > > > > > > > + nouveau_uvmm_unlock(uvmm); > > > > > > > > > + if (ret) > > > > > > > > > + NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret); > > > > > > > > > + return ERR_PTR(ret); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static void > > > > > > > > > +nouveau_bind_job_free(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); > > > > > > > > > + struct bind_job_op *op, *next; > > > > > > > > > + > > > > > > > > > + list_for_each_op_safe(op, next, &bind_job->ops) { > > > > > > > > > + struct drm_gem_object *obj = op->gem.obj; > > > > > > > > > + > > > > > > > > > + if (obj) > > > > > > > > > + drm_gem_object_put(obj); > > > > > > > > > + > > > > > > > > > + list_del(&op->entry); > > > > > > > > > + kfree(op); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + nouveau_base_job_free(job); > > > > > > > > > + kfree(bind_job); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static struct nouveau_job_ops nouveau_bind_job_ops = { > > > > > > > > > + .submit = nouveau_bind_job_submit, > > > > > > > > > + .run = nouveau_bind_job_run, > > > > > > > > > + .free = nouveau_bind_job_free, > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +static int > > > > > > > > > +bind_job_op_from_uop(struct bind_job_op **pop, > > > > > > > > > + struct drm_nouveau_vm_bind_op *uop) > > > > > > > > > +{ > > > > > > > > > + struct bind_job_op *op; > > > > > > > > > + > > > > > > > > > + op = *pop = kzalloc(sizeof(*op), GFP_KERNEL); > > > > > > > > > + if (!op) > > > > > > > > > + return -ENOMEM; > > > > > > > > > + > > > > > > > > > + op->op = uop->op; > > > > > > > > > + op->flags = uop->flags; > > > > > > > > > + op->va.addr = uop->addr; > > > > > > > > > + op->va.range = uop->range; > > > > > > > > > + > > > > > > > > > + if (op->op == DRM_NOUVEAU_VM_BIND_OP_MAP) { > > > > > > > > > + op->gem.handle = uop->handle; > > > > > > > > > + op->gem.offset = uop->bo_offset; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static void > > > > > > > > > +bind_job_ops_free(struct list_head *ops) > > > > > > > > > +{ > > > > > > > > > + struct bind_job_op *op, *next; > > > > > > > > > + > > > > > > > > > + list_for_each_op_safe(op, next, ops) { > > > > > > > > > + list_del(&op->entry); > > > > > > > > > + kfree(op); > > > > > > > > > + } > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +int > > > > > > > > > +nouveau_bind_job_init(struct nouveau_bind_job **pjob, > > > > > > > > > + struct nouveau_exec_bind *bind) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_bind_job *job; > > > > > > > > > + struct bind_job_op *op; > > > > > > > > > + int i, ret; > > > > > > > > > + > > > > > > > > > + job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL); > > > > > > > > > + if (!job) > > > > > > > > > + return -ENOMEM; > > > > > > > > > + > > > > > > > > > + INIT_LIST_HEAD(&job->ops); > > > > > > > > > + > > > > > > > > > + for (i = 0; i < bind->op.count; i++) { > > > > > > > > > + ret = bind_job_op_from_uop(&op, &bind->op.s[i]); > > > > > > > > > + if (ret) > > > > > > > > > + goto err_free; > > > > > > > > > + > > > > > > > > > + list_add_tail(&op->entry, &job->ops); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + job->base.sync = !(bind->flags & DRM_NOUVEAU_VM_BIND_RUN_ASYNC); > > > > > > > > > + job->base.ops = &nouveau_bind_job_ops; > > > > > > > > > + > > > > > > > > > + ret = nouveau_base_job_init(&job->base, &bind->base); > > > > > > > > > + if (ret) > > > > > > > > > + goto err_free; > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > + > > > > > > > > > +err_free: > > > > > > > > > + bind_job_ops_free(&job->ops); > > > > > > > > > + kfree(job); > > > > > > > > > + *pjob = NULL; > > > > > > > > > + > > > > > > > > > + return ret; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static int > > > > > > > > > +sync_find_fence(struct nouveau_job *job, > > > > > > > > > + struct drm_nouveau_sync *sync, > > > > > > > > > + struct dma_fence **fence) > > > > > > > > > +{ > > > > > > > > > + u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK; > > > > > > > > > + u64 point = 0; > > > > > > > > > + int ret; > > > > > > > > > + > > > > > > > > > + if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ && > > > > > > > > > + stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) > > > > > > > > > + return -EOPNOTSUPP; > > > > > > > > > + > > > > > > > > > + if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) > > > > > > > > > + point = sync->timeline_value; > > > > > > > > > + > > > > > > > > > + ret = drm_syncobj_find_fence(job->file_priv, > > > > > > > > > + sync->handle, point, > > > > > > > > > + sync->flags, fence); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static int > > > > > > > > > +exec_job_binds_wait(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job); > > > > > > > > > + struct nouveau_cli *cli = exec_job->base.cli; > > > > > > > > > + struct nouveau_sched_entity *bind_entity = &cli->sched_entity; > > > > > > > > > + signed long ret; > > > > > > > > > + int i; > > > > > > > > > + > > > > > > > > > + for (i = 0; i < job->in_sync.count; i++) { > > > > > > > > > + struct nouveau_job *it; > > > > > > > > > + struct drm_nouveau_sync *sync = &job->in_sync.s[i]; > > > > > > > > > + struct dma_fence *fence; > > > > > > > > > + bool found; > > > > > > > > > + > > > > > > > > > + ret = sync_find_fence(job, sync, &fence); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + > > > > > > > > > + mutex_lock(&bind_entity->job.mutex); > > > > > > > > > + found = false; > > > > > > > > > + list_for_each_entry(it, &bind_entity->job.list, head) { > > > > > > > > > + if (fence == it->done_fence) { > > > > > > > > > + found = true; > > > > > > > > > + break; > > > > > > > > > + } > > > > > > > > > + } > > > > > > > > > + mutex_unlock(&bind_entity->job.mutex); > > > > > > > > > + > > > > > > > > > + /* If the fence is not from a VM_BIND job, don't wait for it. */ > > > > > > > > > + if (!found) > > > > > > > > > + continue; > > > > > > > > > + > > > > > > > > > + ret = dma_fence_wait_timeout(fence, true, > > > > > > > > > + msecs_to_jiffies(500)); > > > > > > > > > + if (ret < 0) > > > > > > > > > + return ret; > > > > > > > > > + else if (ret == 0) > > > > > > > > > + return -ETIMEDOUT; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +int > > > > > > > > > +nouveau_exec_job_submit(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job); > > > > > > > > > + struct nouveau_cli *cli = exec_job->base.cli; > > > > > > > > > + struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli); > > > > > > > > > + struct drm_exec *exec = &job->exec; > > > > > > > > > + struct drm_gem_object *obj; > > > > > > > > > + unsigned long index; > > > > > > > > > + int ret; > > > > > > > > > + > > > > > > > > > + ret = exec_job_binds_wait(job); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + > > > > > > > > > + nouveau_uvmm_lock(uvmm); > > > > > > > > > + drm_exec_while_not_all_locked(exec) { > > > > > > > > > + struct drm_gpuva *va; > > > > > > > > > + > > > > > > > > > + drm_gpuva_for_each_va(va, &uvmm->umgr) { > > > > > > > > > + ret = drm_exec_prepare_obj(exec, va->gem.obj, 1); > > > > > > > > > + drm_exec_break_on_contention(exec); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + } > > > > > > > > > + } > > > > > > > > > + nouveau_uvmm_unlock(uvmm); > > > > > > > > > + > > > > > > > > > + drm_exec_for_each_locked_object(exec, index, obj) { > > > > > > > > > + struct dma_resv *resv = obj->resv; > > > > > > > > > + struct nouveau_bo *nvbo = nouveau_gem_object(obj); > > > > > > > > > + > > > > > > > > > + ret = nouveau_bo_validate(nvbo, true, false); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + > > > > > > > > > + dma_resv_add_fence(resv, job->done_fence, DMA_RESV_USAGE_WRITE); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static struct dma_fence * > > > > > > > > > +nouveau_exec_job_run(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job); > > > > > > > > > + struct nouveau_fence *fence; > > > > > > > > > + int i, ret; > > > > > > > > > + > > > > > > > > > + ret = nouveau_dma_wait(job->chan, exec_job->push.count + 1, 16); > > > > > > > > > + if (ret) { > > > > > > > > > + NV_PRINTK(err, job->cli, "nv50cal_space: %d\n", ret); > > > > > > > > > + return ERR_PTR(ret); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + for (i = 0; i < exec_job->push.count; i++) { > > > > > > > > > + nv50_dma_push(job->chan, exec_job->push.s[i].va, > > > > > > > > > + exec_job->push.s[i].va_len); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + ret = nouveau_fence_new(job->chan, false, &fence); > > > > > > > > > + if (ret) { > > > > > > > > > + NV_PRINTK(err, job->cli, "error fencing pushbuf: %d\n", ret); > > > > > > > > > + WIND_RING(job->chan); > > > > > > > > > + return ERR_PTR(ret); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return &fence->base; > > > > > > > > > +} > > > > > > > > > +static void > > > > > > > > > +nouveau_exec_job_free(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job); > > > > > > > > > + > > > > > > > > > + nouveau_base_job_free(job); > > > > > > > > > + > > > > > > > > > + kfree(exec_job->push.s); > > > > > > > > > + kfree(exec_job); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static struct nouveau_job_ops nouveau_exec_job_ops = { > > > > > > > > > + .submit = nouveau_exec_job_submit, > > > > > > > > > + .run = nouveau_exec_job_run, > > > > > > > > > + .free = nouveau_exec_job_free, > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +int > > > > > > > > > +nouveau_exec_job_init(struct nouveau_exec_job **pjob, > > > > > > > > > + struct nouveau_exec *exec) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_exec_job *job; > > > > > > > > > + int ret; > > > > > > > > > + > > > > > > > > > + job = *pjob = kzalloc(sizeof(*job), GFP_KERNEL); > > > > > > > > > + if (!job) > > > > > > > > > + return -ENOMEM; > > > > > > > > > + > > > > > > > > > + job->push.count = exec->push.count; > > > > > > > > > + job->push.s = kmemdup(exec->push.s, > > > > > > > > > + sizeof(*exec->push.s) * > > > > > > > > > + exec->push.count, > > > > > > > > > + GFP_KERNEL); > > > > > > > > > + if (!job->push.s) { > > > > > > > > > + ret = -ENOMEM; > > > > > > > > > + goto err_free_job; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + job->base.ops = &nouveau_exec_job_ops; > > > > > > > > > + ret = nouveau_base_job_init(&job->base, &exec->base); > > > > > > > > > + if (ret) > > > > > > > > > + goto err_free_pushs; > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > + > > > > > > > > > +err_free_pushs: > > > > > > > > > + kfree(job->push.s); > > > > > > > > > +err_free_job: > > > > > > > > > + kfree(job); > > > > > > > > > + *pjob = NULL; > > > > > > > > > + > > > > > > > > > + return ret; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +void nouveau_job_fini(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + dma_fence_put(job->done_fence); > > > > > > > > > + drm_sched_job_cleanup(&job->base); > > > > > > > > > + job->ops->free(job); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static int > > > > > > > > > +nouveau_job_add_deps(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct dma_fence *in_fence = NULL; > > > > > > > > > + int ret, i; > > > > > > > > > + > > > > > > > > > + for (i = 0; i < job->in_sync.count; i++) { > > > > > > > > > + struct drm_nouveau_sync *sync = &job->in_sync.s[i]; > > > > > > > > > + > > > > > > > > > + ret = sync_find_fence(job, sync, &in_fence); > > > > > > > > > + if (ret) { > > > > > > > > > + NV_PRINTK(warn, job->cli, > > > > > > > > > + "Failed to find syncobj (-> in): handle=%d\n", > > > > > > > > > + sync->handle); > > > > > > > > > + return ret; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + ret = drm_sched_job_add_dependency(&job->base, in_fence); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static int > > > > > > > > > +nouveau_job_fence_attach(struct nouveau_job *job, struct dma_fence > > > > > > > > > *fence) > > > > > > > > > +{ > > > > > > > > > + struct drm_syncobj *out_sync; > > > > > > > > > + int i; > > > > > > > > > + > > > > > > > > > + for (i = 0; i < job->out_sync.count; i++) { > > > > > > > > > + struct drm_nouveau_sync *sync = &job->out_sync.s[i]; > > > > > > > > > + u32 stype = sync->flags & DRM_NOUVEAU_SYNC_TYPE_MASK; > > > > > > > > > + > > > > > > > > > + if (stype != DRM_NOUVEAU_SYNC_SYNCOBJ && > > > > > > > > > + stype != DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) > > > > > > > > > + return -EOPNOTSUPP; > > > > > > > > > + > > > > > > > > > + out_sync = drm_syncobj_find(job->file_priv, sync->handle); > > > > > > > > > + if (!out_sync) { > > > > > > > > > + NV_PRINTK(warn, job->cli, > > > > > > > > > + "Failed to find syncobj (-> out): handle=%d\n", > > > > > > > > > + sync->handle); > > > > > > > > > + return -ENOENT; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + if (stype == DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ) { > > > > > > > > > + struct dma_fence_chain *chain; > > > > > > > > > + > > > > > > > > > + chain = dma_fence_chain_alloc(); > > > > > > > > > + if (!chain) { > > > > > > > > > + drm_syncobj_put(out_sync); > > > > > > > > > + return -ENOMEM; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + drm_syncobj_add_point(out_sync, chain, fence, > > > > > > > > > + sync->timeline_value); > > > > > > > > > + } else { > > > > > > > > > + drm_syncobj_replace_fence(out_sync, fence); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + drm_syncobj_put(out_sync); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static struct dma_fence * > > > > > > > > > +nouveau_job_run(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + return job->ops->run(job); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static int > > > > > > > > > +nouveau_job_run_sync(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct dma_fence *fence; > > > > > > > > > + int ret; > > > > > > > > > + > > > > > > > > > + fence = nouveau_job_run(job); > > > > > > > > > + if (IS_ERR(fence)) { > > > > > > > > > + return PTR_ERR(fence); > > > > > > > > > + } else if (fence) { > > > > > > > > > + ret = dma_fence_wait(fence, true); > > > > > > > > > + if (ret) > > > > > > > > > + return ret; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + dma_fence_signal(job->done_fence); > > > > > > > > > + > > > > > > > > > + return 0; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +int > > > > > > > > > +nouveau_job_submit(struct nouveau_job *job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_sched_entity *entity = > > > > > > > > > to_nouveau_sched_entity(job->base.entity); > > > > > > > > > + int ret; > > > > > > > > > + > > > > > > > > > + drm_exec_init(&job->exec, true); > > > > > > > > > + > > > > > > > > > + ret = nouveau_job_add_deps(job); > > > > > > > > > + if (ret) > > > > > > > > > + goto out; > > > > > > > > > + > > > > > > > > > + drm_sched_job_arm(&job->base); > > > > > > > > > + job->done_fence = dma_fence_get(&job->base.s_fence->finished); > > > > > > > > > + > > > > > > > > > + ret = nouveau_job_fence_attach(job, job->done_fence); > > > > > > > > > + if (ret) > > > > > > > > > + goto out; > > > > > > > > > + > > > > > > > > > + if (job->ops->submit) { > > > > > > > > > + ret = job->ops->submit(job); > > > > > > > > > + if (ret) > > > > > > > > > + goto out; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + if (job->sync) { > > > > > > > > > + drm_exec_fini(&job->exec); > > > > > > > > > + > > > > > > > > > + /* We're requested to run a synchronous job, hence don't push > > > > > > > > > + * the job, bypassing the job scheduler, and execute the jobs > > > > > > > > > + * run() function right away. > > > > > > > > > + * > > > > > > > > > + * As a consequence of bypassing the job scheduler we need to > > > > > > > > > + * handle fencing and job cleanup ourselfes. > > > > > > > > > + */ > > > > > > > > > + ret = nouveau_job_run_sync(job); > > > > > > > > > + > > > > > > > > > + /* If the job fails, the caller will do the cleanup for us. */ > > > > > > > > > + if (!ret) > > > > > > > > > + nouveau_job_fini(job); > > > > > > > > > + > > > > > > > > > + return ret; > > > > > > > > > + } else { > > > > > > > > > + mutex_lock(&entity->job.mutex); > > > > > > > > > + drm_sched_entity_push_job(&job->base); > > > > > > > > > + list_add_tail(&job->head, &entity->job.list); > > > > > > > > > + mutex_unlock(&entity->job.mutex); > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > +out: > > > > > > > > > + drm_exec_fini(&job->exec); > > > > > > > > > + return ret; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static struct dma_fence * > > > > > > > > > +nouveau_sched_run_job(struct drm_sched_job *sched_job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_job *job = to_nouveau_job(sched_job); > > > > > > > > > + > > > > > > > > > + return nouveau_job_run(job); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static enum drm_gpu_sched_stat > > > > > > > > > +nouveau_sched_timedout_job(struct drm_sched_job *sched_job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_job *job = to_nouveau_job(sched_job); > > > > > > > > > + struct nouveau_channel *chan = job->chan; > > > > > > > > > + > > > > > > > > > + if (unlikely(!atomic_read(&chan->killed))) > > > > > > > > > + nouveau_channel_kill(chan); > > > > > > > > > + > > > > > > > > > + NV_PRINTK(warn, job->cli, "job timeout, channel %d killed!\n", > > > > > > > > > + chan->chid); > > > > > > > > > + > > > > > > > > > + nouveau_sched_entity_fini(job->entity); > > > > > > > > > + > > > > > > > > > + return DRM_GPU_SCHED_STAT_ENODEV; > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static void > > > > > > > > > +nouveau_sched_free_job(struct drm_sched_job *sched_job) > > > > > > > > > +{ > > > > > > > > > + struct nouveau_job *job = to_nouveau_job(sched_job); > > > > > > > > > + struct nouveau_sched_entity *entity = job->entity; > > > > > > > > > + > > > > > > > > > + mutex_lock(&entity->job.mutex); > > > > > > > > > + list_del(&job->head); > > > > > > > > > + mutex_unlock(&entity->job.mutex); > > > > > > > > > + > > > > > > > > > + nouveau_job_fini(job); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity, > > > > > > > > > + struct drm_gpu_scheduler *sched) > > > > > > > > > +{ > > > > > > > > > + > > > > > > > > > + INIT_LIST_HEAD(&entity->job.list); > > > > > > > > > + mutex_init(&entity->job.mutex); > > > > > > > > > + > > > > > > > > > + return drm_sched_entity_init(&entity->base, > > > > > > > > > + DRM_SCHED_PRIORITY_NORMAL, > > > > > > > > > + &sched, 1, NULL); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +void > > > > > > > > > +nouveau_sched_entity_fini(struct nouveau_sched_entity *entity) > > > > > > > > > +{ > > > > > > > > > + drm_sched_entity_destroy(&entity->base); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +static const struct drm_sched_backend_ops nouveau_sched_ops = { > > > > > > > > > + .run_job = nouveau_sched_run_job, > > > > > > > > > + .timedout_job = nouveau_sched_timedout_job, > > > > > > > > > + .free_job = nouveau_sched_free_job, > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +int nouveau_sched_init(struct drm_gpu_scheduler *sched, > > > > > > > > > + struct nouveau_drm *drm) > > > > > > > > > +{ > > > > > > > > > + long job_hang_limit = > > > > > > > > > msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS); > > > > > > > > > + > > > > > > > > > + return drm_sched_init(sched, &nouveau_sched_ops, > > > > > > > > > + NOUVEAU_SCHED_HW_SUBMISSIONS, 0, job_hang_limit, > > > > > > > > > + NULL, NULL, "nouveau", drm->dev->dev); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > +void nouveau_sched_fini(struct drm_gpu_scheduler *sched) > > > > > > > > > +{ > > > > > > > > > + drm_sched_fini(sched); > > > > > > > > > +} > > > > > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h > > > > > > > > > b/drivers/gpu/drm/nouveau/nouveau_sched.h > > > > > > > > > new file mode 100644 > > > > > > > > > index 000000000000..7fc5b7eea810 > > > > > > > > > --- /dev/null > > > > > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_sched.h > > > > > > > > > @@ -0,0 +1,98 @@ > > > > > > > > > +// SPDX-License-Identifier: MIT > > > > > > > > > + > > > > > > > > > +#ifndef NOUVEAU_SCHED_H > > > > > > > > > +#define NOUVEAU_SCHED_H > > > > > > > > > + > > > > > > > > > +#include <linux/types.h> > > > > > > > > > + > > > > > > > > > +#include <drm/drm_exec.h> > > > > > > > > > +#include <drm/gpu_scheduler.h> > > > > > > > > > + > > > > > > > > > +#include "nouveau_drv.h" > > > > > > > > > +#include "nouveau_exec.h" > > > > > > > > > + > > > > > > > > > +#define to_nouveau_job(sched_job) \ > > > > > > > > > + container_of((sched_job), struct nouveau_job, base) > > > > > > > > > + > > > > > > > > > +#define to_nouveau_exec_job(job) \ > > > > > > > > > + container_of((job), struct nouveau_exec_job, base) > > > > > > > > > + > > > > > > > > > +#define to_nouveau_bind_job(job) \ > > > > > > > > > + container_of((job), struct nouveau_bind_job, base) > > > > > > > > > + > > > > > > > > > +struct nouveau_job { > > > > > > > > > + struct drm_sched_job base; > > > > > > > > > + struct list_head head; > > > > > > > > > + > > > > > > > > > + struct nouveau_sched_entity *entity; > > > > > > > > > + > > > > > > > > > + struct drm_file *file_priv; > > > > > > > > > + struct nouveau_cli *cli; > > > > > > > > > + struct nouveau_channel *chan; > > > > > > > > > + > > > > > > > > > + struct drm_exec exec; > > > > > > > > > + struct dma_fence *done_fence; > > > > > > > > > + > > > > > > > > > + bool sync; > > > > > > > > > + > > > > > > > > > + struct { > > > > > > > > > + struct drm_nouveau_sync *s; > > > > > > > > > + u32 count; > > > > > > > > > + } in_sync; > > > > > > > > > + > > > > > > > > > + struct { > > > > > > > > > + struct drm_nouveau_sync *s; > > > > > > > > > + u32 count; > > > > > > > > > + } out_sync; > > > > > > > > > + > > > > > > > > > + struct nouveau_job_ops { > > > > > > > > > + int (*submit)(struct nouveau_job *); > > > > > > > > > + struct dma_fence *(*run)(struct nouveau_job *); > > > > > > > > > + void (*free)(struct nouveau_job *); > > > > > > > > > + } *ops; > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +struct nouveau_exec_job { > > > > > > > > > + struct nouveau_job base; > > > > > > > > > + > > > > > > > > > + struct { > > > > > > > > > + struct drm_nouveau_exec_push *s; > > > > > > > > > + u32 count; > > > > > > > > > + } push; > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +struct nouveau_bind_job { > > > > > > > > > + struct nouveau_job base; > > > > > > > > > + > > > > > > > > > + /* struct bind_job_op */ > > > > > > > > > + struct list_head ops; > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +int nouveau_bind_job_init(struct nouveau_bind_job **job, > > > > > > > > > + struct nouveau_exec_bind *bind); > > > > > > > > > +int nouveau_exec_job_init(struct nouveau_exec_job **job, > > > > > > > > > + struct nouveau_exec *exec); > > > > > > > > > + > > > > > > > > > +int nouveau_job_submit(struct nouveau_job *job); > > > > > > > > > +void nouveau_job_fini(struct nouveau_job *job); > > > > > > > > > + > > > > > > > > > +#define to_nouveau_sched_entity(entity) \ > > > > > > > > > + container_of((entity), struct nouveau_sched_entity, base) > > > > > > > > > + > > > > > > > > > +struct nouveau_sched_entity { > > > > > > > > > + struct drm_sched_entity base; > > > > > > > > > + struct { > > > > > > > > > + struct list_head list; > > > > > > > > > + struct mutex mutex; > > > > > > > > > + } job; > > > > > > > > > +}; > > > > > > > > > + > > > > > > > > > +int nouveau_sched_entity_init(struct nouveau_sched_entity *entity, > > > > > > > > > + struct drm_gpu_scheduler *sched); > > > > > > > > > +void nouveau_sched_entity_fini(struct nouveau_sched_entity *entity); > > > > > > > > > + > > > > > > > > > +int nouveau_sched_init(struct drm_gpu_scheduler *sched, > > > > > > > > > + struct nouveau_drm *drm); > > > > > > > > > +void nouveau_sched_fini(struct drm_gpu_scheduler *sched); > > > > > > > > > + > > > > > > > > > +#endif > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >