On Fri, Jun 22, 2018 at 05:57:16PM +0200, Michal Hocko wrote:
> On Fri 22-06-18 16:36:49, Chris Wilson wrote:
> > Quoting Michal Hocko (2018-06-22 16:02:42)
> > > Hi,
> > > this is an RFC and not tested at all. I am not very familiar with the
> > > mmu notifiers semantics very much so this is a crude attempt to achieve
> > > what I need basically. It might be completely wrong but I would like
> > > to discuss what would be a better way if that is the case.
> > >
> > > get_maintainers gave me quite large list of people to CC so I had to trim
> > > it down. If you think I have forgot somebody, please let me know
> >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > index 854bd51b9478..5285df9331fa 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > @@ -112,10 +112,11 @@ static void del_object(struct i915_mmu_object *mo)
> > >         mo->attached = false;
> > >  }
> > >
> > > -static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > +static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > >                                                        struct mm_struct *mm,
> > >                                                        unsigned long start,
> > > -                                                      unsigned long end)
> > > +                                                      unsigned long end,
> > > +                                                      bool blockable)
> > >  {
> > >         struct i915_mmu_notifier *mn =
> > >                 container_of(_mn, struct i915_mmu_notifier, mn);
> > > @@ -124,7 +125,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > >         LIST_HEAD(cancelled);
> > >
> > >         if (RB_EMPTY_ROOT(&mn->objects.rb_root))
> > > -               return;
> > > +               return 0;
> >
> > The principle wait here is for the HW (even after fixing all the locks
> > to be not so coarse, we still have to wait for the HW to finish its
> > access).
>
> Is this wait bound or it can take basically arbitrary amount of time?

An arbitrary amount of time, but in the desktop use case you can assume
that it should never go above 16ms for 60 frames per second rendering of
your desktop (in the GPU compute case this kind of assumption does not
hold). Is the process exit_state already updated by the time these mmu
notifier callbacks happen?

>
> > The first pass would be then to not do anything here if
> > !blockable.
>
> something like this? (incremental diff)

What I wanted to do with HMM and mmu notifiers is split the invalidation
into 2 passes. The first pass tells the drivers to stop/cancel pending
jobs that depend on the range and to invalidate internal driver state
(like clearing the buffer object pages array in the GPU case, but not
the GPU page table). The second callback would then do the actual wait
for the GPU to be done and update the GPU page table.

Now in this scheme, in case the task is already in some exit state and
all CPU threads are frozen/killed, we can probably find a way to do
the first pass mostly lockless. AFAICR neither AMD nor Intel allows
sharing a userptr bo, hence a userptr bo should only ever be accessed
through ioctls submitted by the process.

The second call can then be delayed and pinged from time to time to see
if the GPU jobs are done.

Note that what you propose might still be useful, since in case there is
no buffer object for a range, OOM can make progress in freeing that
range of memory. It is very likely that a significant part of the
virtual address range of a process and its backing memory can be
reclaimed that way. This assumes OOM reclaims vma by vma or with some
form of granularity, like reclaiming 1GB by 1GB. Or we could also update
the blocking callback to return the ranges that are blocking, so that
OOM can reclaim around them.

Cheers,
Jérôme
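
PS: To make the 2 pass idea above a bit more concrete, here is a rough
userspace sketch in plain C. It is only a model and not kernel code:
struct sketch_bo, its fields and both sketch_invalidate_pass*() functions
are hypothetical names made up for illustration, they are not the i915,
amdgpu or HMM API. The first pass never waits, it cancels queued jobs and
drops the driver pages array. The second pass only polls, and touches the
(modelled) GPU page table once no job is running on the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a userptr buffer object (not a real driver struct). */
struct sketch_bo {
	unsigned long start, end;	/* virtual range [start, end) */
	bool pages_valid;		/* driver pages array still populated */
	bool gpu_mapped;		/* GPU page table still maps the range */
	int jobs_queued;		/* not yet on the HW, can be cancelled */
	int jobs_running;		/* already on the HW, must be waited for */
};

static bool sketch_overlaps(const struct sketch_bo *bo,
			    unsigned long start, unsigned long end)
{
	return bo->start < end && start < bo->end;
}

/*
 * Pass 1: never waits. Cancel queued jobs and invalidate driver state,
 * but leave the GPU page table alone. Returns how many overlapping
 * objects still have jobs running on the HW.
 */
static size_t sketch_invalidate_pass1(struct sketch_bo *bos, size_t n,
				      unsigned long start, unsigned long end)
{
	size_t busy = 0;

	for (size_t i = 0; i < n; i++) {
		if (!sketch_overlaps(&bos[i], start, end))
			continue;
		bos[i].jobs_queued = 0;
		bos[i].pages_valid = false;
		if (bos[i].jobs_running > 0)
			busy++;
	}
	return busy;
}

/*
 * Pass 2: polls instead of blocking. Objects with no running job get
 * their GPU mapping torn down. Returns true once every overlapping
 * object is unmapped, false if the caller has to come back later.
 */
static bool sketch_invalidate_pass2(struct sketch_bo *bos, size_t n,
				    unsigned long start, unsigned long end)
{
	bool done = true;

	for (size_t i = 0; i < n; i++) {
		if (!sketch_overlaps(&bos[i], start, end))
			continue;
		if (bos[i].jobs_running > 0) {
			done = false;
			continue;
		}
		assert(!bos[i].pages_valid);	/* pass 1 must have run first */
		bos[i].gpu_mapped = false;
	}
	return done;
}
```

A caller in this model would run pass 1 once, then call pass 2 from time
to time until it returns true. A range that overlaps no object at all
makes pass 1 return 0, which is the case where OOM could go ahead right
away.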