On 2018-08-08 14:48, Christian König wrote:
> On 08.08.2018 at 06:23, zhoucm1 wrote:
>>
>>
>> On 2018-08-08 12:08, Junwei Zhang wrote:
>>> Userspace needs to know if the user memory is from BO or malloc.
>>>
>>> v2: update mutex range and rebase
>>>
>>> Signed-off-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>> ---
>>>  amdgpu/amdgpu.h    | 23 +++++++++++++++++++++++
>>>  amdgpu/amdgpu_bo.c | 34 ++++++++++++++++++++++++++++++++++
>>>  2 files changed, 57 insertions(+)
>>>
>>> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
>>> index be83b45..a8c353c 100644
>>> --- a/amdgpu/amdgpu.h
>>> +++ b/amdgpu/amdgpu.h
>>> @@ -678,6 +678,29 @@ int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
>>>                      amdgpu_bo_handle *buf_handle);
>>>
>>>  /**
>>> + * Validate if the user memory comes from BO
>>> + *
>>> + * \param dev - [in] Device handle. See #amdgpu_device_initialize()
>>> + * \param cpu - [in] CPU address of user allocated memory which we
>>> + * want to map to GPU address space (make GPU accessible)
>>> + * (This address must be correctly aligned).
>>> + * \param size - [in] Size of allocation (must be correctly aligned)
>>> + * \param buf_handle - [out] Buffer handle for the userptr memory
>>> + * if the user memory is not from BO, the buf_handle will be NULL.
>>> + * \param offset_in_bo - [out] offset in this BO for this user memory
>>> + *
>>> + *
>>> + * \return  0 on success\n
>>> + *         <0 - Negative POSIX Error code
>>> + *
>>> +*/
>>> +int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
>>> +                 void *cpu,
>>> +                 uint64_t size,
>>> +                 amdgpu_bo_handle *buf_handle,
>>> +                 uint64_t *offset_in_bo);
>>> +
>>> +/**
>>>   * Free previosuly allocated memory
>>>   *
>>>   * \param  dev          - \c [in] Device handle. See
>>> #amdgpu_device_initialize()
>>> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
>>> index b24e698..a7f0662 100644
>>> --- a/amdgpu/amdgpu_bo.c
>>> +++ b/amdgpu/amdgpu_bo.c
>>> @@ -529,6 +529,40 @@ int amdgpu_bo_wait_for_idle(amdgpu_bo_handle bo,
>>>      }
>>>  }
>>>
>>> +int amdgpu_find_bo_by_cpu_mapping(amdgpu_device_handle dev,
>>> +                 void *cpu,
>>> +                 uint64_t size,
>>> +                 amdgpu_bo_handle *buf_handle,
>>> +                 uint64_t *offset_in_bo)
>>> +{
>>> +   int i;
>>> +   struct amdgpu_bo *bo;
>>> +
>>> +   if (cpu == NULL || size == 0)
>>> +       return -EINVAL;
>>> +
>>> +   pthread_mutex_lock(&dev->bo_table_mutex);
>>> +   for (i = 0; i < dev->bo_handles.max_key; i++) {
>>
>> Hi Jerry,
>>
>> As Christian caught before, iterating over all BOs of the device will
>> introduce a lot of CPU overhead, so this isn't a good direction.
>> Since the CPU virtual address is per-process, you should go to the
>> kernel and find them in the VM tree, which obviously takes less time.
>
> Yeah, but it is also much more overhead to maintain.
>
> Since this is only to fix the behavior of a single buggy application,
> at least I'm fine with keeping the workaround as simple as this.

I like the 'workaround' expression; if Jerry adds a 'workaround' comment
here, I'm OK as well.

Regards,
David Zhou

> If we find a wider use we can still start to use the kernel
> implementation again.
>
> Regards,
> Christian.

>>
>> Regards,
>> David Zhou
>>
>>> +       bo = handle_table_lookup(&dev->bo_handles, i);
>>> +       if (!bo || !bo->cpu_ptr || size > bo->alloc_size)
>>> +           continue;
>>> +       if (cpu >= bo->cpu_ptr && cpu < (bo->cpu_ptr + bo->alloc_size))
>>> +           break;
>>> +   }
>>> +
>>> +   if (i < dev->bo_handles.max_key) {
>>> +       atomic_inc(&bo->refcount);
>>> +       *buf_handle = bo;
>>> +       *offset_in_bo = cpu - bo->cpu_ptr;
>>> +   } else {
>>> +       *buf_handle = NULL;
>>> +       *offset_in_bo = 0;
>>> +   }
>>> +   pthread_mutex_unlock(&dev->bo_table_mutex);
>>> +
>>> +   return 0;
>>> +}
>>> +
>>>  int amdgpu_create_bo_from_user_mem(amdgpu_device_handle dev,
>>>                      void *cpu,
>>>                      uint64_t size,