> Please just completely drop the begin_cpu_access() and end_cpu_access()
> callbacks.
> Checking other drivers, the only one actually implementing something special
> is the i915 driver, which needs to remove the BO from the GTT domain for
> cache coherency, I think.

Well, as the first step, we can drop the begin/end() callbacks for now, and
revisit later.

Sam

> -----Original Message-----
> From: Koenig, Christian
> Sent: Thursday, September 21, 2017 3:39 AM
> To: Li, Samuel <Samuel.Li at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH v2 1/1] drm/amdgpu: Add gem_prime_mmap support
>
> Am 20.09.2017 um 22:31 schrieb Li, Samuel:
> >> The waiting done here is only for the shared fence to switch from
> >> explicit to implicit synchronization for correct interaction with the
> >> Intel driver.
> > Actually here it is going to wait for all fences,
> >   94        ret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true...
>
> That is actually a bug, we only need to wait here when prime_shared_count
> == 0.
>
> >> ttm_bo_validate() is an asynchronous operation to enable GPU access
> >> to the BO, it isn't related at all to possible CPU access and can
> >> actually prevent it in a number of cases.
> >> amdgpu_ttm_bind() in turn binds the BO to the GART space and isn't
> >> related to CPU access either.
> > ttm_bo_validate() can move the buffer if necessary,
>
> No, that is a very common misconception.
>
> ttm_bo_validate() schedules the buffer to be moved at some point in the
> future.
>
> The only thing which saves us here is the fact that TTM waits for the move
> to be completed in its page fault handler.
>
> > and amdgpu_ttm_bind() will flush hdp, which we have discussed before.
>
> And as concluded before that is unnecessary.
>
> >> What you most likely need to do here is to use
> >> reservation_object_wait_timeout_rcu() to wait for all GPU operations
> >> to end.
> > Right, that is called by amdgpu_gem_prime_pin().
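The conditional wait Christian describes above — wait for the shared fences only on the first export, i.e. while `prime_shared_count` is still zero — can be sketched with a simplified model. The types and helpers below are illustrative stand-ins (a toy `toy_bo` and a fake fence counter), not the real kernel structures or the actual amdgpu fix:

```c
#include <assert.h>

/* Simplified stand-in for the BO state involved; NOT the real kernel
 * structures (amdgpu_bo / reservation_object). */
struct toy_bo {
	unsigned int prime_shared_count; /* how often the BO was exported */
	int pending_fences;              /* stand-in for shared fences */
};

/* Stand-in for reservation_object_wait_timeout_rcu(): "waits" by
 * retiring all pending fences and returns how many it waited on. */
static int toy_wait_all_fences(struct toy_bo *bo)
{
	int waited = bo->pending_fences;

	bo->pending_fences = 0;
	return waited;
}

/* Sketch of the pin path with the suggested fix: the explicit wait is
 * only needed on the first export, when the BO switches from explicit
 * to implicit synchronization. */
static int toy_prime_pin(struct toy_bo *bo)
{
	int waited = 0;

	if (bo->prime_shared_count == 0)
		waited = toy_wait_all_fences(bo);
	bo->prime_shared_count++;
	return waited;
}
```

The point of the guard is that repeated exports of an already-shared BO skip the blocking wait entirely, which is exactly the behavior the unconditional wait in the quoted code got wrong.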
>
> Please just completely drop the begin_cpu_access() and end_cpu_access()
> callbacks.
>
> Checking other drivers, the only one actually implementing something
> special is the i915 driver, which needs to remove the BO from the GTT
> domain for cache coherency, I think.
>
> The omap driver implements those callbacks to grab an extra reference to
> the pages while they are accessed; we don't need to do this since TTM
> grabs an extra reference during the mmap anyway.
>
> All other drivers implementing mmap (exynos, rockchip, tilcdc, arcpgu,
> msm, meson, etnaviv, sti, qxl, vc4, rcar-du, atmel-hlcdc, imx, mediatek,
> mali, vgem, fsl-dcu, zte, hisilicon, sun4i, mxsfb) don't have a begin/end
> callback.
>
> Pinning the BO can actually cause problems with the display code when the
> BO needs to be scanned out, so we should avoid that as much as possible.
>
> Regards,
> Christian.
>
> >
> > Sam
> >
> >> -----Original Message-----
> >> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
> >> Sent: Wednesday, September 20, 2017 2:15 PM
> >> To: Li, Samuel <Samuel.Li at amd.com>; Koenig, Christian
> >> <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
> >> Subject: Re: [PATCH v2 1/1] drm/amdgpu: Add gem_prime_mmap support
> >>
> >> Am 20.09.2017 um 19:47 schrieb Li, Samuel:
> >>>> No that isn't correct. Pinning just notes that the BO shouldn't be
> >>>> moved any more. It doesn't wait for anything.
> >>> It does. The following is from amdgpu_gem_prime_pin(),
> >>>   91        * Wait for all shared fences to complete before we switch to future
> >>>   92        * use of exclusive fence on this prime shared bo.
> >>>   93        */
> >>>   94        ret = reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
> >>>   95                                                  MAX_SCHEDULE_TIMEOUT);
> >>>   96        if (unlikely(ret < 0)) {
> >>>   97                DRM_DEBUG_PRIME("Fence wait failed: %li\n", ret);
> >>>   98                amdgpu_bo_unreserve(bo);
> >>>   99                return ret;
> >>>  100        }
> >>>
> >>> Besides, the pinning process prepares all the stuff before and after
> >>> moving the buffer (ttm_bo_validate, amdgpu_ttm_bind),
> >> No, you misunderstood what this is good for.
> >>
> >> The waiting done here is only for the shared fence to switch from
> >> explicit to implicit synchronization for correct interaction with the
> >> Intel driver.
> >>
> >> As soon as the BO is exported, that shouldn't wait for anything any more.
> >>
> >>> I think if a buffer can be moved, it is probably also in a good
> >>> condition to be accessed.
> >>
> >> That is complete nonsense.
> >>
> >> ttm_bo_validate() is an asynchronous operation to enable GPU access
> >> to the BO, it isn't related at all to possible CPU access and can
> >> actually prevent it in a number of cases.
> >>
> >> amdgpu_ttm_bind() in turn binds the BO to the GART space and isn't
> >> related to CPU access either.
> >>
> >> What you most likely need to do here is to use
> >> reservation_object_wait_timeout_rcu() to wait for all GPU operations
> >> to end.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> Sam
> >>>
> >>>> -----Original Message-----
> >>>> From: Koenig, Christian
> >>>> Sent: Wednesday, September 20, 2017 1:38 PM
> >>>> To: Li, Samuel <Samuel.Li at amd.com>; amd-gfx at lists.freedesktop.org
> >>>> Subject: Re: [PATCH v2 1/1] drm/amdgpu: Add gem_prime_mmap support
> >>>>
> >>>> Am 20.09.2017 um 19:34 schrieb Li, Samuel:
> >>>>>> If that is what this callback should do then this implementation
> >>>>>> would be incorrect. Pinning doesn't wait for any GPU operation to
> >>>>>> finish.
> >>>>> During pinning, it waits for all the fences to finish. That shall be OK.
> >>>> No that isn't correct.
> >>>> Pinning just notes that the BO shouldn't be moved any more.
> >>>>
> >>>> It doesn't wait for anything.
> >>>>
> >>>> Christian.
> >>>>
> >>>>> Sam
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Koenig, Christian
> >>>>>> Sent: Wednesday, September 20, 2017 12:21 PM
> >>>>>> To: Li, Samuel <Samuel.Li at amd.com>; amd-gfx at lists.freedesktop.org
> >>>>>> Subject: Re: [PATCH v2 1/1] drm/amdgpu: Add gem_prime_mmap support
> >>>>>>
> >>>>>> Am 20.09.2017 um 17:44 schrieb Li, Samuel:
> >>>>>>>> What happens when another thread is using amdgpu_dmabuf_ops to
> >>>>>>>> call begin_cpu_access/end_cpu_access when you are fixing it up
> >>>>>>>> again?
> >>>>>>> Right, that is an issue.
> >>>>>> A simple "if (!amdgpu_dmabuf_ops.begin_cpu_access)" should be able
> >>>>>> to deal with that.
> >>>>>>
> >>>>>>>> I would just completely drop the two callbacks, pinning is not
> >>>>>>>> necessary for CPU access and, thinking more about it, it actually
> >>>>>>>> has some unwanted side effects.
> >>>>>>> CPU access needs synchronization anyway, so the two callbacks
> >>>>>>> cannot be dropped (other drivers implemented them too), so I would
> >>>>>>> like to keep them there for now.
> >>>>>>
> >>>>>> Wait a second, what do you mean with "CPU access needs
> >>>>>> synchronization"?
> >>>>>>
> >>>>>> At least i915 makes the memory GPU inaccessible when you start to
> >>>>>> use it with the CPU.
> >>>>>>
> >>>>>> If that is what this callback should do then this implementation
> >>>>>> would be incorrect. Pinning doesn't wait for any GPU operation to
> >>>>>> finish.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>> Sam
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Koenig, Christian
> >>>>>>>> Sent: Wednesday, September 20, 2017 2:58 AM
> >>>>>>>> To: Li, Samuel <Samuel.Li at amd.com>; amd-gfx at lists.freedesktop.org
> >>>>>>>> Subject: Re: [PATCH v2 1/1] drm/amdgpu: Add gem_prime_mmap support
> >>>>>>>>
> >>>>>>>>> What do you mean "This isn't race free"?
> >>>>>>>> Take a look at the code again:
> >>>>>>>>> +    dma_buf = drm_gem_prime_export(dev, gobj, flags);
> >>>>>>>>> +    amdgpu_dmabuf_ops = *(dma_buf->ops);
> >>>>>>>>> +    amdgpu_dmabuf_ops.begin_cpu_access = amdgpu_gem_begin_cpu_access;
> >>>>>>>>> +    amdgpu_dmabuf_ops.end_cpu_access = amdgpu_gem_end_cpu_access;
> >>>>>>>>> +    dma_buf->ops = &amdgpu_dmabuf_ops;
> >>>>>>>> What happens when another thread is using amdgpu_dmabuf_ops to
> >>>>>>>> call begin_cpu_access/end_cpu_access when you are fixing it up
> >>>>>>>> again?
> >>>>>>>>
> >>>>>>>> I would just completely drop the two callbacks, pinning is not
> >>>>>>>> necessary for CPU access and, thinking more about it, it actually
> >>>>>>>> has some unwanted side effects.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>> Am 19.09.2017 um 23:22 schrieb Samuel Li:
> >>>>>>>>>>> +    vma->vm_pgoff = amdgpu_bo_mmap_offset(bo) >> PAGE_SHIFT;
> >>>>>>>>>> Maybe better use "vma->vm_pgoff += amdgpu_bo_mmap_offset(bo) >>
> >>>>>>>>>> PAGE_SHIFT;", but I'm not sure.
> >>>>>>>>>> How do other drivers handle this?
> >>>>>>>>> This is a good catch. Looks like pgoff is honored during prime
> >>>>>>>>> mmap, not a fake offset here.
> >>>>>>>>>>> +    dma_buf->ops = &amdgpu_dmabuf_ops;
> >>>>>>>>>> This isn't race free and needs to be fixed.
> >>>>>>>>>> Better add callbacks to drm_prime.c similar to
> >>>>>>>>>> drm_gem_dmabuf_mmap().
> >>>>>>>>> What do you mean "This isn't race free"?
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Sam
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2017-09-15 11:05 AM, Christian König wrote:
> >>>>>>>>>> Am 14.09.2017 um 00:39 schrieb Samuel Li:
> >>>>>>>>>>> v2: drop hdp invalidate/flush.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Samuel Li <Samuel.Li at amd.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h       |  3 ++
> >>>>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  3 +-
> >>>>>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 77 ++++++++++++++++++++++++++++++-
> >>>>>>>>>>>    3 files changed, 81 insertions(+), 2 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>>>>> index d2aaad7..188b705 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>>>>>>>>>> @@ -395,11 +395,14 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
> >>>>>>>>>>>    struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
> >>>>>>>>>>>                                            struct drm_gem_object *gobj,
> >>>>>>>>>>>                                            int flags);
> >>>>>>>>>>> +struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
> >>>>>>>>>>> +                                               struct dma_buf *dma_buf);
> >>>>>>>>>>>    int amdgpu_gem_prime_pin(struct drm_gem_object *obj);
> >>>>>>>>>>>    void amdgpu_gem_prime_unpin(struct drm_gem_object *obj);
> >>>>>>>>>>>    struct reservation_object *amdgpu_gem_prime_res_obj(struct drm_gem_object *);
> >>>>>>>>>>>    void *amdgpu_gem_prime_vmap(struct drm_gem_object *obj);
> >>>>>>>>>>>    void amdgpu_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
> >>>>>>>>>>> +int amdgpu_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
> >>>>>>>>>>>    int amdgpu_gem_debugfs_init(struct amdgpu_device *adev);
> >>>>>>>>>>>
> >>>>>>>>>>>    /*
> >>>>>>>>>>>     * sub-allocation manager, it has to be protected by another lock.
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>>>>>>> index 2cdf844..9b63ac5 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>>>>>>> @@ -835,7 +835,7 @@ static struct drm_driver kms_driver = {
> >>>>>>>>>>>        .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> >>>>>>>>>>>        .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> >>>>>>>>>>>        .gem_prime_export = amdgpu_gem_prime_export,
> >>>>>>>>>>> -    .gem_prime_import = drm_gem_prime_import,
> >>>>>>>>>>> +    .gem_prime_import = amdgpu_gem_prime_import,
> >>>>>>>>>>>        .gem_prime_pin = amdgpu_gem_prime_pin,
> >>>>>>>>>>>        .gem_prime_unpin = amdgpu_gem_prime_unpin,
> >>>>>>>>>>>        .gem_prime_res_obj = amdgpu_gem_prime_res_obj,
> >>>>>>>>>>> @@ -843,6 +843,7 @@ static struct drm_driver kms_driver = {
> >>>>>>>>>>>        .gem_prime_import_sg_table = amdgpu_gem_prime_import_sg_table,
> >>>>>>>>>>>        .gem_prime_vmap = amdgpu_gem_prime_vmap,
> >>>>>>>>>>>        .gem_prime_vunmap = amdgpu_gem_prime_vunmap,
> >>>>>>>>>>> +    .gem_prime_mmap = amdgpu_gem_prime_mmap,
> >>>>>>>>>>>        .name = DRIVER_NAME,
> >>>>>>>>>>>        .desc = DRIVER_DESC,
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>>>>>>>>>> index 5b3f928..13c977a 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
> >>>>>>>>>>> @@ -57,6 +57,40 @@ void amdgpu_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
> >>>>>>>>>>>        ttm_bo_kunmap(&bo->dma_buf_vmap);
> >>>>>>>>>>>    }
> >>>>>>>>>>>
> >>>>>>>>>>> +int amdgpu_gem_prime_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +    struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> >>>>>>>>>>> +    struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> >>>>>>>>>>> +    unsigned asize = amdgpu_bo_size(bo);
> >>>>>>>>>>> +    int ret;
> >>>>>>>>>>> +
> >>>>>>>>>>> +    if (!vma->vm_file)
> >>>>>>>>>>> +        return -ENODEV;
> >>>>>>>>>>> +
> >>>>>>>>>>> +    if (adev == NULL)
> >>>>>>>>>>> +        return -ENODEV;
> >>>>>>>>>>> +
> >>>>>>>>>>> +    /* Check for valid size. */
> >>>>>>>>>>> +    if (asize < vma->vm_end - vma->vm_start)
> >>>>>>>>>>> +        return -EINVAL;
> >>>>>>>>>>> +
> >>>>>>>>>>> +    if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) ||
> >>>>>>>>>>> +        (bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS)) {
> >>>>>>>>>>> +        return -EPERM;
> >>>>>>>>>>> +    }
> >>>>>>>>>>> +    vma->vm_pgoff = amdgpu_bo_mmap_offset(bo) >> PAGE_SHIFT;
> >>>>>>>>>> Maybe better use "vma->vm_pgoff += amdgpu_bo_mmap_offset(bo) >>
> >>>>>>>>>> PAGE_SHIFT;", but I'm not sure.
> >>>>>>>>>> How do other drivers handle this?
> >>>>>>>>>>
> >>>>>>>>>>> +
> >>>>>>>>>>> +    /* prime mmap does not need to check access, so allow here */
> >>>>>>>>>>> +    ret = drm_vma_node_allow(&obj->vma_node, vma->vm_file->private_data);
> >>>>>>>>>>> +    if (ret)
> >>>>>>>>>>> +        return ret;
> >>>>>>>>>>> +
> >>>>>>>>>>> +    ret = ttm_bo_mmap(vma->vm_file, vma, &adev->mman.bdev);
> >>>>>>>>>>> +    drm_vma_node_revoke(&obj->vma_node, vma->vm_file->private_data);
> >>>>>>>>>>> +
> >>>>>>>>>>> +    return ret;
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>    struct drm_gem_object *
> >>>>>>>>>>>    amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
> >>>>>>>>>>>                                     struct dma_buf_attachment *attach,
> >>>>>>>>>>> @@ -130,14 +164,55 @@ struct reservation_object *amdgpu_gem_prime_res_obj(struct drm_gem_object *obj)
> >>>>>>>>>>>        return bo->tbo.resv;
> >>>>>>>>>>>    }
> >>>>>>>>>>> +
> >>>>>>>>>>> +static int amdgpu_gem_begin_cpu_access(struct dma_buf *dma_buf,
> >>>>>>>>>>> +                                       enum dma_data_direction direction)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +    return amdgpu_gem_prime_pin(dma_buf->priv);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +static int amdgpu_gem_end_cpu_access(struct dma_buf *dma_buf,
> >>>>>>>>>>> +                                     enum dma_data_direction direction)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +    amdgpu_gem_prime_unpin(dma_buf->priv);
> >>>>>>>>>>> +
> >>>>>>>>>>> +    return 0;
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +static struct dma_buf_ops amdgpu_dmabuf_ops;
> >>>>>>>>>>> +
> >>>>>>>>>>>    struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
> >>>>>>>>>>>                                            struct drm_gem_object *gobj,
> >>>>>>>>>>>                                            int flags)
> >>>>>>>>>>>    {
> >>>>>>>>>>>        struct amdgpu_bo *bo = gem_to_amdgpu_bo(gobj);
> >>>>>>>>>>> +    struct dma_buf *dma_buf;
> >>>>>>>>>>>
> >>>>>>>>>>>        if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
> >>>>>>>>>>>            return ERR_PTR(-EPERM);
> >>>>>>>>>>> -    return drm_gem_prime_export(dev, gobj, flags);
> >>>>>>>>>>> +    dma_buf = drm_gem_prime_export(dev, gobj, flags);
> >>>>>>>>>>> +    amdgpu_dmabuf_ops = *(dma_buf->ops);
> >>>>>>>>>>> +    amdgpu_dmabuf_ops.begin_cpu_access = amdgpu_gem_begin_cpu_access;
> >>>>>>>>>>> +    amdgpu_dmabuf_ops.end_cpu_access = amdgpu_gem_end_cpu_access;
> >>>>>>>>>>> +    dma_buf->ops = &amdgpu_dmabuf_ops;
> >>>>>>>>>> This isn't race free and needs to be fixed.
> >>>>>>>>>>
> >>>>>>>>>> Better add callbacks to drm_prime.c similar to
> >>>>>>>>>> drm_gem_dmabuf_mmap().
> >>>>>>>>>>
> >>>>>>>>>> Alternatively you could just completely drop
> >>>>>>>>>> amdgpu_gem_begin_cpu_access() and amdgpu_gem_end_cpu_access() as
> >>>>>>>>>> well.
> >>>>>>>>>>
> >>>>>>>>>> When the buffer is shared between devices it is pinned anyway, and
> >>>>>>>>>> when it isn't shared ttm_bo_mmap() is able to handle VRAM mappings
> >>>>>>>>>> as well.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Christian.
> >>>>>>>>>>
> >>>>>>>>>>> +
> >>>>>>>>>>> +    return dma_buf;
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
> >>>>>>>>>>> +                                               struct dma_buf *dma_buf)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +    struct drm_gem_object *obj;
> >>>>>>>>>>> +
> >>>>>>>>>>> +    if (dma_buf->ops == &amdgpu_dmabuf_ops) {
> >>>>>>>>>>> +        obj = dma_buf->priv;
> >>>>>>>>>>> +        if (obj->dev == dev) {
> >>>>>>>>>>> +            /*
> >>>>>>>>>>> +             * Importing a dmabuf exported from our own gem increases
> >>>>>>>>>>> +             * the refcount on the gem itself instead of the f_count
> >>>>>>>>>>> +             * of the dmabuf.
> >>>>>>>>>>> +             */
> >>>>>>>>>>> +            drm_gem_object_get(obj);
> >>>>>>>>>>> +            return obj;
> >>>>>>>>>>> +        }
> >>>>>>>>>>> +    }
> >>>>>>>>>>> +
> >>>>>>>>>>> +    return drm_gem_prime_import(dev, dma_buf);
> >>>>>>>>>>>    }
> >>> _______________________________________________
> >>> amd-gfx mailing list
> >>> amd-gfx at lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
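The race discussed throughout the thread comes from rewriting the shared static amdgpu_dmabuf_ops table on every export while another thread may already be calling through it. Christian's suggested guard, "if (!amdgpu_dmabuf_ops.begin_cpu_access)", initializes the copy only once. A minimal sketch of that idea follows, using stand-in types rather than the real struct dma_buf_ops, and deliberately ignoring the locking or one-time-init primitive a real fix would still need:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct dma_buf_ops; NOT the real kernel type. */
struct toy_dmabuf_ops {
	int (*begin_cpu_access)(void);
	int (*end_cpu_access)(void);
};

static int toy_begin(void) { return 0; }
static int toy_end(void)   { return 0; }

/* Ops the drm core would install by default (callbacks unset here). */
static const struct toy_dmabuf_ops default_ops = { NULL, NULL };

/* Driver-wide copy that the export path patches; in the patch under
 * review it was rewritten on EVERY export, which is the race. */
static struct toy_dmabuf_ops toy_ops;

static const struct toy_dmabuf_ops *toy_export(void)
{
	/* The suggested guard: fix up the copy only if it has not been
	 * initialized yet, so later exports never modify a table that
	 * another thread may already be calling through.  A real fix
	 * would still need proper synchronization around this check. */
	if (!toy_ops.begin_cpu_access) {
		toy_ops = default_ops;
		toy_ops.begin_cpu_access = toy_begin;
		toy_ops.end_cpu_access = toy_end;
	}
	return &toy_ops;
}
```

The alternatives raised in the thread — adding the callbacks to drm_prime.c, or dropping begin/end_cpu_access entirely, as was ultimately agreed at the top of this message — avoid the shared mutable table altogether.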