On Thu, Jun 24, 2021 at 7:38 PM Christian König <christian.koenig@xxxxxxx> wrote:
>
> Am 24.06.21 um 19:29 schrieb Daniel Vetter:
> > On Thu, Jun 24, 2021 at 07:03:10PM +0200, Christian König wrote:
> >> Am 24.06.21 um 16:00 schrieb Daniel Vetter:
> >>> This is essentially part of drm_sched_dependency_optimized(), which
> >>> only amdgpu seems to make use of. Use it a bit more.
> >>>
> >>> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> >>> Cc: "Christian König" <christian.koenig@xxxxxxx>
> >>> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> >>> Cc: Luben Tuikov <luben.tuikov@xxxxxxx>
> >>> Cc: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
> >>> Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> >>> Cc: Jack Zhang <Jack.Zhang1@xxxxxxx>
> >>> ---
> >>>  drivers/gpu/drm/scheduler/sched_main.c | 7 +++++++
> >>>  1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index 370c336d383f..c31d7cf7df74 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -649,6 +649,13 @@ int drm_sched_job_await_fence(struct drm_sched_job *job,
> >>>  	if (!fence)
> >>>  		return 0;
> >>>
> >>> +	/* if it's a fence from us it's guaranteed to be earlier */
> >>> +	if (fence->context == job->entity->fence_context ||
> >>> +	    fence->context == job->entity->fence_context + 1) {
> >>> +		dma_fence_put(fence);
> >>> +		return 0;
> >>> +	}
> >>> +
> >> Well NAK. That would break Vulkan.

I'm assuming your reply means the NAK is retracted and was just the usual
"this doesn't perfectly fit for amdgpu" reflex?

> >> The problem is that Vulkan can insert dependencies between jobs which run on
> >> the same queue.
> >>
> >> So we need to track those as well, and if the previous job for the same
> >> queue/scheduler is not yet finished a pipeline synchronization needs to be
> >> inserted.
> >>
> >> That's one of the reasons we weren't able to unify the dependency handling
> >> yet.
> >
> > That sounds like an extremely amdgpu-specific constraint?
>
> Yeah, that's totally hardware specific.
>
> It's just that I don't know how else we could track that without having
> the same separation as in amdgpu between implicit and explicit fences.
> And as far as I understand it that's exactly what you want to avoid.
>
> As I said this turned out to be really awkward.
>
> > You're also the
> > only one who keeps track of whether the previous job we've scheduled has
> > finished already (I guess they can get pipelined and you don't flush by
> > default), so you insert fences.
>
> Yes, exactly that.
>
> > I guess we can add an await_fence_no_dedup or so for amdgpu, but I'm not
> > sure why we have to inflict this design constraint on all other drivers?
> > At least I'm not seeing anything in lima, panfrost, v3d or etnaviv that
> > would break with this, and i915 will also be perfectly fine.
> >
> > Also note: I'm not using this for amdgpu, exactly because there are a few
> > funny things going on.
>
> Yeah, exactly the reason why we never unified this.

Yeah, there are clear limits to this, because you also can't use the
await_implicit helper: you have to keep filtering for the owner, or the
current amdgpu uapi goes horribly slow. I think the benefit would just be
that we could share the datastructure and the book-keeping, but aside from
that you'd need your own integration in amdgpu.

One idea I just had was whether we could use the tag bits xarray has for
the amdgpu purposes. Like we could do a drm_sched_job_await_fence_tagged,
where you supply additional information (like "this might be relevant for
the vm_flush" and things like that). Afaiui xarray tags are very fast to
enumerate on if you're looking for specific tags, but I might be wrong.
Ideally this would avoid the need for the duplicated amdgpu_job->sched.
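Very rough sketch of what I mean, typed straight into the mail and
completely untested. The _tagged entry point, the DRM_SCHED_TAG_VM_FLUSH
mark and the amdgpu helper at the end are all made up for illustration; it
assumes the job->dependencies xarray from this series (allocated with
XA_FLAGS_ALLOC, which is why XA_MARK_0 is off limits) and leaves out the
dedup/fast-path handling for brevity:

/* made-up tag, XA_MARK_0 is the free mark of an allocating xarray */
#define DRM_SCHED_TAG_VM_FLUSH	XA_MARK_1

int drm_sched_job_await_fence_tagged(struct drm_sched_job *job,
				     struct dma_fence *fence,
				     xa_mark_t tag)
{
	u32 id = 0;
	int ret;

	if (!fence)
		return 0;

	/* takes over the fence reference, like drm_sched_job_await_fence() */
	ret = xa_alloc(&job->dependencies, &id, fence,
		       xa_limit_32b, GFP_KERNEL);
	if (ret) {
		dma_fence_put(fence);
		return ret;
	}

	/* remember what this dependency is relevant for */
	xa_set_mark(&job->dependencies, id, tag);
	return 0;
}

amdgpu could then cheaply walk just the fences it cares about when it
decides whether a pipeline sync/vm flush is needed, instead of filtering
everything by owner again:

	unsigned long index;
	struct dma_fence *f;

	xa_for_each_marked(&job->dependencies, index, f, DRM_SCHED_TAG_VM_FLUSH)
		amdgpu_job_need_pipeline_sync(job, f); /* made-up helper */
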
Cheers, Daniel

> Regards,
> Christian.
>
> > Finally: You _really_ need explicit dependency handling for vulkan in your
> > uapi, instead of the kernel second-guessing what userspace might be doing.
> > That's really not how vulkan is designed to work :-)
> >
> > Cheers, Daniel
> >
> >
> >> Christian.
> >>
> >>>  	/* Deduplicate if we already depend on a fence from the same context.
> >>>  	 * This lets the size of the array of deps scale with the number of
> >>>  	 * engines involved, rather than the number of BOs.
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch