On Thu, Mar 23, 2023 at 7:03 AM Christian König <christian.koenig@xxxxxxx> wrote: > > Am 23.03.23 um 14:54 schrieb Rob Clark: > > On Thu, Mar 23, 2023 at 12:35 AM Christian König > > <christian.koenig@xxxxxxx> wrote: > >> Am 22.03.23 um 23:44 schrieb Rob Clark: > >>> From: Rob Clark <robdclark@xxxxxxxxxxxx> > >>> > >>> Container fences have burner contexts, which makes the trick to store at > >>> most one fence per context somewhat useless if we don't unwrap array or > >>> chain fences. > >> Mhm, we intentionally kept them not unwrapped since this way they only > >> occupy one fence slot. > >> > >> But it might be better to unwrap them if you add many of those dependencies. > >> > >>> Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx> > >>> --- > >>> tbh, I'm not sure why we weren't doing this already, unless there is > >>> something I'm overlooking > >>> > >>> drivers/gpu/drm/scheduler/sched_main.c | 43 +++++++++++++++++--------- > >>> 1 file changed, 28 insertions(+), 15 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > >>> index c2ee44d6224b..f59e5335afbb 100644 > >>> --- a/drivers/gpu/drm/scheduler/sched_main.c > >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c > >>> @@ -41,20 +41,21 @@ > >>> * 4. Entities themselves maintain a queue of jobs that will be scheduled on > >>> * the hardware. > >>> * > >>> * The jobs in a entity are always scheduled in the order that they were pushed. > >>> */ > >>> > >>> #include <linux/kthread.h> > >>> #include <linux/wait.h> > >>> #include <linux/sched.h> > >>> #include <linux/completion.h> > >>> +#include <linux/dma-fence-unwrap.h> > >>> #include <linux/dma-resv.h> > >>> #include <uapi/linux/sched/types.h> > >>> > >>> #include <drm/drm_print.h> > >>> #include <drm/drm_gem.h> > >>> #include <drm/gpu_scheduler.h> > >>> #include <drm/spsc_queue.h> > >>> > >>> #define CREATE_TRACE_POINTS > >>> #include "gpu_scheduler_trace.h" > >>> @@ -665,41 +666,27 @@ void drm_sched_job_arm(struct drm_sched_job *job) > >>> sched = entity->rq->sched; > >>> > >>> job->sched = sched; > >>> job->s_priority = entity->rq - sched->sched_rq; > >>> job->id = atomic64_inc_return(&sched->job_id_count); > >>> > >>> drm_sched_fence_init(job->s_fence, job->entity); > >>> } > >>> EXPORT_SYMBOL(drm_sched_job_arm); > >>> > >>> -/** > >>> - * drm_sched_job_add_dependency - adds the fence as a job dependency > >>> - * @job: scheduler job to add the dependencies to > >>> - * @fence: the dma_fence to add to the list of dependencies. > >>> - * > >>> - * Note that @fence is consumed in both the success and error cases. > >>> - * > >>> - * Returns: > >>> - * 0 on success, or an error on failing to expand the array. > >>> - */ > >>> -int drm_sched_job_add_dependency(struct drm_sched_job *job, > >>> - struct dma_fence *fence) > >>> +static int _add_dependency(struct drm_sched_job *job, struct dma_fence *fence) > >> Please keep the drm_sched_job_ prefix here even for static functions. > >> The symbol _add_dependency just sucks in a backtrace, especially when > >> it's tail optimized. > >> > >>> { > >>> struct dma_fence *entry; > >>> unsigned long index; > >>> u32 id = 0; > >>> int ret; > >>> > >>> - if (!fence) > >>> - return 0; > >>> - > >>> /* Deduplicate if we already depend on a fence from the same context. > >>> * This lets the size of the array of deps scale with the number of > >>> * engines involved, rather than the number of BOs. > >>> */ > >>> xa_for_each(&job->dependencies, index, entry) { > >>> if (entry->context != fence->context) > >>> continue; > >>> > >>> if (dma_fence_is_later(fence, entry)) { > >>> dma_fence_put(entry); > >>> @@ -709,20 +696,46 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job, > >>> } > >>> return 0; > >>> } > >>> > >>> ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL); > >>> if (ret != 0) > >>> dma_fence_put(fence); > >>> > >>> return ret; > >>> } > >>> + > >>> +/** > >>> + * drm_sched_job_add_dependency - adds the fence as a job dependency > >>> + * @job: scheduler job to add the dependencies to > >>> + * @fence: the dma_fence to add to the list of dependencies. > >>> + * > >>> + * Note that @fence is consumed in both the success and error cases. > >>> + * > >>> + * Returns: > >>> + * 0 on success, or an error on failing to expand the array. > >>> + */ > >>> +int drm_sched_job_add_dependency(struct drm_sched_job *job, > >>> + struct dma_fence *fence) > >> Maybe name the new function drm_sched_job_unwrap_add_dependency or > >> something like this. > >> > >> I need to double check, but I think for some cases we don't need or > >> don't even want this in the driver. > > I'd be curious to know the cases where you don't want this.. one thing > > I was thinking about, what if you have a container fence with two > > contained fences. One is on the same ctx as the job, one is not but > > signals sooner. You end up artificially waiting on both, which seems > > sub-optimal. > > Well resv objects don't contain other containers for example. I suppose I have the explicit sync case more in mind, where the dependent fence ends up being a chain or array (if userspace is merging fence fd's). > Then we also have an use case in amdgpu where fence need to be > explicitly waited for even when they are from the same ctx as the job > because otherwise we wouldn't see everything cache coherent. This was the kinda weird case I wanted to make sure I wasn't breaking. I remember seeing something fly by for this, but can't find it now or remember what amdgpu's solution was.. > On the other hand we currently handle that amdgpu use case differently > and the extra overhead of unwrapping fences even if they can't be > containers is probably negligible. > > > Anyways, I can make this a new entrypoint which unwraps, and/or rename > > the internal static function, if we think this is a good idea. > > If you think that's unnecessary keeping your original approach is fine > with me as well. I'm going to assume unnecessary until someone speaks up with their weird specific case ;-) BR, -R > Regards, > Christian. > > > > > BR, > > -R > > > >> Christian. > >> > >>> +{ > >>> + struct dma_fence_unwrap iter; > >>> + struct dma_fence *f; > >>> + int ret = 0; > >>> + > >>> + dma_fence_unwrap_for_each (f, &iter, fence) { > >>> + ret = _add_dependency(job, f); > >>> + if (ret) > >>> + break; > >>> + } > >>> + > >>> + return ret; > >>> +} > >>> EXPORT_SYMBOL(drm_sched_job_add_dependency); > >>> > >>> /** > >>> * drm_sched_job_add_resv_dependencies - add all fences from the resv to the job > >>> * @job: scheduler job to add the dependencies to > >>> * @resv: the dma_resv object to get the fences from > >>> * @usage: the dma_resv_usage to use to filter the fences > >>> * > >>> * This adds all fences matching the given usage from @resv to @job. > >>> * Must be called with the @resv lock held. >