On 2018-02-11 04:55 AM, Christian König wrote:
> On 09.02.2018 at 21:31, Felix Kuehling wrote:
>> On 2018-02-09 07:34 AM, Christian König wrote:
>>> I really wonder if sharing the GPUVM instance from a render node file
>>> descriptor wouldn't be easier.
>>>
>>> You could just use the existing IOCTL for allocating and mapping
>>> memory on a render node and then give the prepared fd to kfd to use.
>> In amd-kfd-staging we have an ioctl to import a DMABuf into the KFD VM
>> for interoperability between graphics and compute. I'm planning to
>> upstream that later.
>
> Could you move that around and upstream it first?

Sure. I assume by "first" you mean before userptr and Vega10 support. I
was going to leave it for later because graphics interoperability is not
an essential feature. Userptr is more important for getting existing
ROCm user mode to work properly.

>
>> It still depends on the KFD calls for managing the
>> GPU mappings. The KFD GPU mapping calls need to interact with the
>> eviction logic. TLB flushing involves the HIQ/KIQ for figuring out the
>> VMID/PASID mapping, which is managed asynchronously by the HWS, not
>> amdgpu_ids.c. User pointers need a different MMU notifier. AQL queue
>> buffers on some GPUs need a double-mapping workaround for a HW
>> wraparound bug. I could go on. So this is not easy to retrofit into
>> amdgpu_gem and amdgpu_cs. Also, KFD VMs are created differently
>> (AMDGPU_VM_CONTEXT_COMPUTE, amdkfd_vm wrapper structure).
>
> Well, that is actually the reason why I'm asking about it.
>
> First of all, kernel development is not use-case driven, e.g. we should
> not implement something because userspace needs it in a specific way,
> but rather because the hardware supports it in a specific way.
>
> This means that in theory we should have the fixes for the HW problems
> in both interfaces. That doesn't make sense at the moment because we
> don't support user space queues through the render node file
> descriptor, but people are starting to ask for that as well.
>> What's more, buffers allocated through amdgpu_gem calls create GEM
>> objects that we don't need.
>
> I see that as an advantage rather than a problem, cause it fixes a
> couple of problems with the KFD where the address space of the inode
> is not managed correctly as far as I can see.

I don't think GEM is involved in the management of the address space.
That's handled inside TTM and drm_vma_manager, both of which we are
using. We have tested CPU mappings with evictions, and this is working
correctly now. We had problems with this before, when we were
CPU-mapping our buffers through the KFD device FD.

>
> Those implementation issues haven't caused problems so far because you
> never tried to unmap doorbells. But with the new eviction code that is
> about to change, isn't it?

I don't understand this comment. What does this have to do with
doorbells? Doorbells are never unmapped. When queues are evicted,
doorbells remain mapped to the process. User mode can continue writing
to doorbells, though the writes won't have any immediate effect while
the corresponding queues are unmapped from the HQDs.

>
>> And exporting and importing DMABufs adds
>> more overhead and a potential to run into the per-process
>> file-descriptor limit (maybe the FD could be discarded after importing).
>
> Closing the file descriptor is a must-have after importing, so that
> isn't an issue.
>
> But I agree that exporting and reimporting the file descriptor adds
> some additional overhead.
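
To put that overhead in concrete terms, this is roughly what the
export/import round trip looks like from user mode. The render-node
export uses the stock libdrm PRIME helper; the KFD import step is the
amd-kfd-staging interface I mentioned, so the ioctl name and argument
struct shown in the comment are placeholders, not upstream API:

/*
 * Sketch of the export/import round trip under discussion. Only the
 * libdrm PRIME export is real API; the KFD import in the comment is a
 * placeholder for the staging interface.
 */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>

static int share_bo_with_kfd(int render_fd, int kfd_fd, uint32_t gem_handle)
{
	int dmabuf_fd = -1;

	/* Export the GEM BO from the render node as a DMABuf FD. */
	if (drmPrimeHandleToFD(render_fd, gem_handle,
			       DRM_CLOEXEC | DRM_RDWR, &dmabuf_fd))
		return -1;

	/*
	 * Import into the KFD VM. Placeholder for the staging ioctl, e.g.:
	 *   struct kfd_ioctl_import_dmabuf_args args = { .dmabuf_fd = dmabuf_fd, ... };
	 *   ioctl(kfd_fd, AMDKFD_IOC_IMPORT_DMABUF, &args);
	 */

	/*
	 * The DMABuf FD is only a transport handle; close it right away so
	 * long-running processes don't accumulate file descriptors.
	 */
	close(dmabuf_fd);
	return 0;
}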
>
>> I honestly thought about whether this would be feasible when we
>> implemented the CPU mapping through the DRM FD. But it would be nothing
>> short of a complete redesign of the KFD memory management code. It would
>> be months of work, more instability, for questionable benefits. I don't
>> think it would be in the interest of end users and customers.
>
> Yeah, agree on that.
>
>> I just thought of a slightly different approach I would consider more
>> realistic, without having thought through all the details: Adding
>> KFD-specific memory management ioctls to the amdgpu device. Basically
>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead
>> of KFD ioctl functions. But we'd still have KFD ioctls for other things,
>> and the new amdgpu ioctls would be KFD-specific and not useful for
>> graphics applications. It's probably still several weeks of work, but
>> shouldn't have major teething issues because the internal logic and
>> functionality would be basically unchanged. It would just move the
>> ioctls from one device to another.
>
> My thinking went in a similar direction. But instead of exposing the
> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>
> And then use the DRM FD in the KFD for things like the buffer provider
> of a device, e.g. no separate IDR for BOs in the KFD but rather a
> reference to the DRM FD.

I'm not sure I understand this. By "DRM FD" I think you mean the device
file descriptor. Would KFD then call file operations on that FD to
invoke the AMDGPU ioctls? Is there any precedent for this that I can
look at as an example?

Or do you mean a DMABuf FD exported by GEM? After importing the buffer
and closing the FD, we still need a way to refer to the buffer for
managing the GPU mappings and for releasing the reference. So we still
need the handles that are currently provided by our own IDR. I think
we'll still need that.

Finally, the kfd_ioctl_alloc_memory_of_gpu ioctl will later also be used
for creating userptr BOs. Those can't be exported as DMABufs, so we
still need our own ioctl for creating that type of BO. That was going to
be my next patch series.

Regards,
  Felix

>
> We can still manage the VM through KFD IOCTL, but the BOs and the VM
> are actually provided by the DRM FD.
>
> Regards,
> Christian.
>
>> Regards,
>>   Felix
>>
>>> Regards,
>>> Christian.
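
To make the handle question concrete, this is roughly the shape of the
IDR-based handle table I'm referring to; a minimal sketch with
illustrative names (kfd_bo_entry, kfd_process_bos), not the actual
amdkfd structures:

/*
 * Per-process BO handle table: a kernel IDR mapping the integer handles
 * returned to user mode onto the driver's BO bookkeeping structure.
 * bos->handles is assumed to have been set up with idr_init() when the
 * process was created.
 */
#include <linux/errno.h>
#include <linux/idr.h>
#include <linux/slab.h>

struct kfd_bo_entry {
	void *mem;		/* opaque amdgpu_amdkfd_gpuvm allocation */
};

struct kfd_process_bos {
	struct idr handles;	/* handle -> struct kfd_bo_entry */
};

static int kfd_bo_handle_create(struct kfd_process_bos *bos, void *mem)
{
	struct kfd_bo_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);

	if (!entry)
		return -ENOMEM;
	entry->mem = mem;

	/* Allocate the smallest free non-negative handle (no upper bound). */
	return idr_alloc(&bos->handles, entry, 0, 0, GFP_KERNEL);
}

static void *kfd_bo_handle_lookup(struct kfd_process_bos *bos, int handle)
{
	struct kfd_bo_entry *entry = idr_find(&bos->handles, handle);

	return entry ? entry->mem : NULL;
}

static void kfd_bo_handle_destroy(struct kfd_process_bos *bos, int handle)
{
	struct kfd_bo_entry *entry = idr_remove(&bos->handles, handle);

	kfree(entry);
}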
>>>
>>> On 07.02.2018 at 02:32, Felix Kuehling wrote:
>>>> From: Oak Zeng <Oak.Zeng at amd.com>
>>>>
>>>> Populate DRM render device minor in kfd topology
>>>>
>>>> Signed-off-by: Oak Zeng <Oak.Zeng at amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
>>>>   2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index 2506155..ac28abc 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj,
>>>> struct attribute *attr,
>>>>               dev->node_props.device_id);
>>>>       sysfs_show_32bit_prop(buffer, "location_id",
>>>>               dev->node_props.location_id);
>>>> +   sysfs_show_32bit_prop(buffer, "drm_render_minor",
>>>> +           dev->node_props.drm_render_minor);
>>>>        if (dev->gpu) {
>>>>           log_max_watch_addr =
>>>> @@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>>>>
>>>> dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
>>>>       dev->node_props.max_engine_clk_ccompute =
>>>>           cpufreq_quick_get_max(0) / 1000;
>>>> +   dev->node_props.drm_render_minor =
>>>> +       gpu->shared_resources.drm_render_minor;
>>>>        kfd_fill_mem_clk_max_info(dev);
>>>>       kfd_fill_iolink_non_crat_info(dev);
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> index c0be2be..eb54cfc 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> @@ -71,6 +71,7 @@ struct kfd_node_properties {
>>>>       uint32_t location_id;
>>>>       uint32_t max_engine_clk_fcompute;
>>>>       uint32_t max_engine_clk_ccompute;
>>>> +   int32_t drm_render_minor;
>>>>       uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>>>>   };
>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
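
For reference, here is roughly how user mode would consume the new
drm_render_minor property, assuming the usual KFD topology layout under
/sys/class/kfd/kfd/topology/nodes; a sketch with trimmed error handling,
not the actual Thunk code:

/*
 * Read drm_render_minor from a KFD topology node and open the matching
 * DRM render node. Returns an open FD on success, -1 on failure.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

static int open_render_node_for_kfd_node(int node)
{
	char path[256], key[64];
	long long value, minor = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/kfd/kfd/topology/nodes/%d/properties", node);
	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Each line is "<name> <value>"; pick out drm_render_minor. */
	while (fscanf(f, "%63s %lld", key, &value) == 2) {
		if (!strcmp(key, "drm_render_minor")) {
			minor = value;
			break;
		}
	}
	fclose(f);

	if (minor < 0)	/* property missing or node has no GPU */
		return -1;

	snprintf(path, sizeof(path), "/dev/dri/renderD%lld", minor);
	return open(path, O_RDWR | O_CLOEXEC);
}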