On 2018-02-09 07:34 AM, Christian König wrote: > I really wonder if sharing the GPUVM instance from a render node file > descriptor wouldn't be easier. > > You could just use the existing IOCTL for allocating and mapping > memory on a render node and then give the prepared fd to kfd to use. In amd-kfd-staging we have an ioctl to import a DMABuf into the KFD VM for interoperability between graphics and compute. I'm planning to upstream that later. It still depends on the KFD calls for managing the GPU mappings. The KFD GPU mapping calls need to interact with the eviction logic. TLB flushing involves the HIQ/KIQ for figuring out the VMID/PASID mapping, which is managed asynchronously by the HWS, not amdgpu_ids.c. User pointers need a different MMU notifier. AQL queue buffers on some GPUs need a double-mapping workaround for a HW wraparound bug. I could go on. So this is not easy to retrofit into amdgpu_gem and amdgpu_cs. Also, KFD VMs are created differently (AMDGPU_VM_CONTEXT_COMPUTE, amdkfd_vm wrapper structure). What's more, buffers allocated through amdgpu_gem calls create GEM objects that we don't need. And exporting and importing DMABufs adds more overhead and a potential to run into the process file-descriptor limit (maybe the FD could be discarded after importing). I honestly thought about whether this would be feasible when we implemented the CPU mapping through the DRM FD. But it would be nothing short of a complete redesign of the KFD memory management code. It would be months of work, more instability, for questionable benefits. I don't think it would be in the interest of end users and customers. I just thought of a slightly different approach I would consider more realistic, without having thought through all the details: Adding KFD-specific memory management ioctls to the amdgpu device. Basically call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead of KFD ioctl functions. But we'd still have KFD ioctls for other things, and the new amdgpu ioctls would be KFD-specific and not useful for graphics applications. It's probably still several weeks of work, but shouldn't have major teething issues because the internal logic and functionality would be basically unchanged. It would just move the ioctls from one device to another. Regards,  Felix > > Regards, > Christian. > > Am 07.02.2018 um 02:32 schrieb Felix Kuehling: >> From: Oak Zeng <Oak.Zeng at amd.com> >> >> Populate DRM render device minor in kfd topology >> >> Signed-off-by: Oak Zeng <Oak.Zeng at amd.com> >> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com> >> --- >>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++ >>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 + >>  2 files changed, 5 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >> index 2506155..ac28abc 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >> @@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj, >> struct attribute *attr, >>              dev->node_props.device_id); >>      sysfs_show_32bit_prop(buffer, "location_id", >>              dev->node_props.location_id); >> +   sysfs_show_32bit_prop(buffer, "drm_render_minor", >> +           dev->node_props.drm_render_minor); >>       if (dev->gpu) { >>          log_max_watch_addr = >> @@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu) >>          dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd); >>      dev->node_props.max_engine_clk_ccompute = >>          cpufreq_quick_get_max(0) / 1000; >> +   dev->node_props.drm_render_minor = >> +       gpu->shared_resources.drm_render_minor; >>       kfd_fill_mem_clk_max_info(dev); >>      kfd_fill_iolink_non_crat_info(dev); >> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >> index c0be2be..eb54cfc 100644 >> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >> @@ -71,6 +71,7 @@ struct kfd_node_properties { >>      uint32_t location_id; >>      uint32_t max_engine_clk_fcompute; >>      uint32_t max_engine_clk_ccompute; >> +   int32_t drm_render_minor; >>      uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE]; >>  }; >>  >