[Resnding with the CC list fixed] On Fri 22-06-18 18:40:26, Michal Hocko wrote: > On Fri 22-06-18 12:18:46, Jerome Glisse wrote: > > On Fri, Jun 22, 2018 at 05:57:16PM +0200, Michal Hocko wrote: > > > On Fri 22-06-18 16:36:49, Chris Wilson wrote: > > > > Quoting Michal Hocko (2018-06-22 16:02:42) > > > > > Hi, > > > > > this is an RFC and not tested at all. I am not very familiar with the > > > > > mmu notifiers semantics very much so this is a crude attempt to achieve > > > > > what I need basically. It might be completely wrong but I would like > > > > > to discuss what would be a better way if that is the case. > > > > > > > > > > get_maintainers gave me quite large list of people to CC so I had to trim > > > > > it down. If you think I have forgot somebody, please let me know > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c > > > > > index 854bd51b9478..5285df9331fa 100644 > > > > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c > > > > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c > > > > > @@ -112,10 +112,11 @@ static void del_object(struct i915_mmu_object *mo) > > > > > mo->attached = false; > > > > > } > > > > > > > > > > -static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, > > > > > +static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, > > > > > struct mm_struct *mm, > > > > > unsigned long start, > > > > > - unsigned long end) > > > > > + unsigned long end, > > > > > + bool blockable) > > > > > { > > > > > struct i915_mmu_notifier *mn = > > > > > container_of(_mn, struct i915_mmu_notifier, mn); > > > > > @@ -124,7 +125,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, > > > > > LIST_HEAD(cancelled); > > > > > > > > > > if (RB_EMPTY_ROOT(&mn->objects.rb_root)) > > > > > - return; > > > > > + return 0; > > > > > > > > The principle wait here is for the HW (even after fixing all the locks > > > > to be not so coarse, we still have to wait for the HW to finish its > > > > access). > > > > > > Is this wait bound or it can take basically arbitrary amount of time? > > > > Arbitrary amount of time but in desktop use case you can assume that > > it should never go above 16ms for a 60frame per second rendering of > > your desktop (in GPU compute case this kind of assumption does not > > hold). Is the process exit_state already updated by the time this mmu > > notifier callbacks happen ? > > What do you mean? The process is killed (by SIGKILL) at the time but we > do not know much more than that. The task might be stuck anywhere in the > kernel before handling that signal. > > > > > The first pass would be then to not do anything here if > > > > !blockable. > > > > > > something like this? (incremental diff) > > > > What i wanted to do with HMM and mmu notifier is split the invalidation > > in 2 pass. First pass tell the drivers to stop/cancel pending jobs that > > depends on the range and invalidate internal driver states (like clear > > buffer object pages array in case of GPU but not GPU page table). While > > the second callback would do the actual wait on the GPU to be done and > > update the GPU page table. > > What can you do after the first phase? Can I unmap the range? > > > Now in this scheme in case the task is already in some exit state and > > that all CPU threads are frozen/kill then we can probably find a way to > > do the first path mostly lock less. AFAICR nor AMD nor Intel allow to > > share userptr bo hence a uptr bo should only ever be access through > > ioctl submited by the process. > > > > The second call can then be delayed and ping from time to time to see > > if GPU jobs are done. > > > > > > Note that what you propose might still be useful as in case there is > > no buffer object for a range then OOM can make progress in freeing a > > range of memory. It is very likely that significant virtual address > > range of a process and backing memory can be reclaim that way. This > > assume OOM reclaim vma by vma or in some form of granularity like > > reclaiming 1GB by 1GB. Or we could also update blocking callback to > > return range that are blocking that way OOM can reclaim around. > > Exactly my point. What we have right now is all or nothing which is > obviously too coarse to be useful. -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html