On 11/07/2016 08:55 AM, Christian König wrote: > Am 07.11.2016 um 04:29 schrieb Michel Dänzer: >> On 07/11/16 11:47 AM, Mario Kleiner wrote: >>> External clients which import our bo's wait only >>> for exclusive dmabuf-fences, not on shared ones, >>> so attach fences on exported buffers as exclusive >>> ones, if the buffers get imported by an external >>> client. >>> >>> See discussion in thread: >>> https://lists.freedesktop.org/archives/dri-devel/2016-October/122370.html >>> >>> >>> Tested on Intel iGPU + AMD Tonga dGPU as DRI3/Present >>> Prime render offload, and with the Tonga standalone as >>> primary gpu. >>> >>> v2: Add a wait for all shared fences before prime export, >>> as suggested by Christian Koenig. >>> >>> v3: - Mark buffer prime_exported in amdgpu_gem_prime_pin, >>> so we only use the exclusive fence when exporting a >>> bo to external clients like a separate iGPU, but not >>> when exporting/importing from/to ourselves as part of >>> regular DRI3 fd passing. >>> >>> - Propagate failure of reservation_object_wait_rcu back >>> to caller. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95472 >>> (v1) Tested-by: Mike Lothian <mike at fireburn.co.uk> >> FWIW, I think the (v1) is usually added at the end of the line, not at >> the beginning. Ok. >> >> [...] >> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c >>> index 7700dc2..c4494d0 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c >>> @@ -81,6 +81,20 @@ int amdgpu_gem_prime_pin(struct drm_gem_object *obj) >>> { >>> struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); >>> int ret = 0; >>> + long lret; >>> + >>> + /* >>> + * Wait for all shared fences to complete before we switch to >>> future >>> + * use of exclusive fence on this prime_exported bo. >>> + */ >>> + lret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, >>> false, >>> + MAX_SCHEDULE_TIMEOUT); >>> + if (unlikely(lret < 0)) { >>> + DRM_DEBUG_PRIME("Fence wait failed: %li\n", lret); >>> + return lret; >>> + } >>> + >>> + bo->prime_exported = true; >> We should probably clear bo->prime_exported in amdgpu_gem_prime_unpin. >> >> Also, I think we should set bo->prime_exported (prime_shared?) in >> amdgpu_gem_prime_import_sg_table as well, so that we'll also set >> exclusive fences on BOs imported from other GPUs. > Yes, really good idea. > Ok. I guess we don't need to wait for fences there before setting the flag/counter, as we can't have done anything yet with the bo just created from the imported sg_table? > Additional to that are we sure that amdgpu_gem_prime_pin() is only > called once for each GEM object? Could in theory be that somebody > exports a BO to another GFX device as well as a V4L device for example. > > In this case we would need a counter, but I need to double check that as > well. > If we want to clear the prime_exported flag in amdgpu_gem_prime_unpin() as an optimization if all clients detach from the buffer, then we need a prime_shared_counter instead of a prime_exported flag afaics. The dmabuf api allows multiple clients to import and attach to an exported dmabuf, each one calling dma_buf_attach which will call amdgpu_gem_prime_pin. > In general I would protected the flag/counter by the BO being reserved > to avoid any races. In other word move those lines a bit down after the > amdgpu_bo_reserve(). Ok. -mario > > Regards, > Christian.