On Fri 22-06-18 16:09:06, Felix Kuehling wrote:
> On 2018-06-22 11:24 AM, Michal Hocko wrote:
> > On Fri 22-06-18 17:13:02, Christian König wrote:
> >> Hi Michal,
> >>
> >> [Adding Felix as well]
> >>
> >> Well first of all you have a misconception why at least the AMD graphics
> >> driver need to be able to sleep in an MMU notifier: We need to sleep because
> >> we need to wait for hardware operations to finish and *NOT* because we need
> >> to wait for locks.
> >>
> >> I'm not sure if your flag now means that you generally can't sleep in MMU
> >> notifiers any more, but if that's the case at least AMD hardware will break
> >> badly. In our case the approach of waiting for a short time for the process
> >> to be reaped and then select another victim actually sounds like the right
> >> thing to do.
> > Well, I do not need to make the notifier code non blocking all the time.
> > All I need is to ensure that it won't sleep if the flag says so and
> > return -EAGAIN instead.
> >
> > So here is what I do for amdgpu:
>
> In the case of KFD we also need to take the DQM lock:
>
> amdgpu_mn_invalidate_range_start_hsa -> amdgpu_amdkfd_evict_userptr ->
> kgd2kfd_quiesce_mm -> kfd_process_evict_queues -> evict_process_queues_cpsch
>
> So we'd need to pass the blockable parameter all the way through that
> call chain.

Thanks, I have missed that part. So I guess I will start with something
similar to intel-gfx and back off when the current range needs some
treatment. So this on top. Does it look correct?

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index d138a526feff..e2d422b3eb0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -266,6 +266,11 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
 		struct amdgpu_mn_node *node;
 		struct amdgpu_bo *bo;
 
+		if (!blockable) {
+			amdgpu_mn_read_unlock();
+			return -EAGAIN;
+		}
+
 		node = container_of(it, struct amdgpu_mn_node, it);
 		it = interval_tree_iter_next(it, start, end);
 
-- 
Michal Hocko
SUSE Labs
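
For readers following the thread, the back-off in the diff above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the amdgpu code: every demo_* name is hypothetical, a flat
array stands in for the driver's interval tree, and a counter stands in for
the eviction work that may sleep. The idea it models is that a non-blockable
caller gets -EAGAIN as soon as any tracked range overlaps the invalidated
one, and succeeds only when there is nothing to do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tracked userptr range, inclusive on both ends. */
struct demo_range {
	unsigned long start;
	unsigned long last;
};

/* True when [start, last] intersects the tracked range. */
static bool demo_overlaps(const struct demo_range *r,
			  unsigned long start, unsigned long last)
{
	return r->start <= last && start <= r->last;
}

/*
 * Hypothetical notifier entry point. For every tracked range that overlaps
 * [start, last] it either backs off with -EAGAIN (caller must not sleep)
 * or performs the work that may sleep, modelled here by bumping *evicted.
 */
static int demo_invalidate_range_start(const struct demo_range *ranges,
				       size_t nr_ranges,
				       unsigned long start, unsigned long last,
				       bool blockable, int *evicted)
{
	for (size_t i = 0; i < nr_ranges; i++) {
		if (!demo_overlaps(&ranges[i], start, last))
			continue;

		/* This range needs treatment, which could sleep. */
		if (!blockable)
			return -EAGAIN;

		(*evicted)++;	/* placeholder for the blocking eviction */
	}
	return 0;
}
```

A caller that may not sleep would treat -EAGAIN as "retry later or pick
another victim", while a blockable caller never sees it in this sketch.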