On Thu, Oct 08, 2015 at 05:26:44PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/10/15 17:03, Ville Syrjälä wrote:
> > On Thu, Oct 08, 2015 at 03:03:11PM +0100, Tvrtko Ursulin wrote:
> >>
> >> Hi,
> >>
> >> On 08/10/15 14:45, Ville Syrjälä wrote:
> >>> On Mon, Oct 05, 2015 at 01:26:36PM +0100, Tvrtko Ursulin wrote:
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>>>
> >>>> Prevent leaking VMAs and PPGTT VMs when objects are imported
> >>>> via flink.
> >>>>
> >>>> The scenario is that any VMAs created by the importer will be left
> >>>> dangling after the importer exits, or destroys the PPGTT context
> >>>> with which they are associated.
> >>>>
> >>>> This is caused by object destruction not running when the
> >>>> importer closes the buffer object handle, due to the reference held
> >>>> by the exporter. This also leaks the VM since the VMA holds a
> >>>> reference on it.
> >>>>
> >>>> In practice these leaks can be observed by stopping and starting
> >>>> the X server on a kernel with fbcon compiled in. Every time the
> >>>> X server exits another VMA will be leaked against the fbcon's
> >>>> frame buffer object.
> >>>>
> >>>> Also on systems where flink buffer sharing is used extensively,
> >>>> like Android, this leak has even more serious consequences.
> >>>>
> >>>> This version takes a general approach from the earlier work
> >>>> by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
> >>>> destruction) and tries to incorporate the subsequent discussion
> >>>> between Chris Wilson and Daniel Vetter.
> >>>>
> >>>> v2:
> >>>>
> >>>> Removed immediate cleanup on object retire - it was causing a
> >>>> recursive VMA unbind via i915_gem_object_wait_rendering. And
> >>>> it is in fact not even needed, since by definition the context
> >>>> cleanup worker runs only after the last context reference has
> >>>> been dropped, hence all VMAs against the VM belonging to the
> >>>> context are already on the inactive list.
> >>>>
> >>>> v3:
> >>>>
> >>>> The previous version could deadlock since VMA unbind waits on any
> >>>> rendering on an object to complete. Objects can be busy in a
> >>>> different VM, which would mean that the cleanup loop would do
> >>>> the wait with the struct mutex held.
> >>>>
> >>>> This is an even simpler approach where we just unbind VMAs
> >>>> without waiting, since we know all VMAs belonging to this VM
> >>>> are idle, and there is nothing in flight, at the point the
> >>>> context destructor runs.
> >>>>
> >>>> v4:
> >>>>
> >>>> Double underscore prefix for __i915_vma_unbind_no_wait and a
> >>>> commit message typo fix. (Michel Thierry)
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>>> Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
> >>>> Reviewed-by: Michel Thierry <michel.thierry@xxxxxxxxx>
> >>>> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> >>>> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>> Cc: Rafael Barbalho <rafael.barbalho@xxxxxxxxx>
> >>>> Cc: Michel Thierry <michel.thierry@xxxxxxxxx>
> >>>
> >>>> ---
> >>>>  drivers/gpu/drm/i915/i915_drv.h         |  5 +++++
> >>>>  drivers/gpu/drm/i915/i915_gem.c         | 20 ++++++++++++++++----
> >>>>  drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++++++
> >>>>  3 files changed, 45 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>> index 824e7245044d..58afa1bb2957 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_drv.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>> @@ -2840,6 +2840,11 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
> >>>>  int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
> >>>>  		  u32 flags);
> >>>>  int __must_check i915_vma_unbind(struct i915_vma *vma);
> >>>> +/*
> >>>> + * BEWARE: Do not use the function below unless you can _absolutely_
> >>>> + * _guarantee_ VMA in question is _not in use_ anywhere.
> >>>> + */
> >>>> +int __must_check __i915_vma_unbind_no_wait(struct i915_vma *vma);
> >>>>  int i915_gem_object_put_pages(struct drm_i915_gem_object *obj);
> >>>>  void i915_gem_release_all_mmaps(struct drm_i915_private *dev_priv);
> >>>>  void i915_gem_release_mmap(struct drm_i915_gem_object *obj);
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>> index f0cfbb9ee12c..52642aff1dab 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>> @@ -3208,7 +3208,7 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
> >>>>  					    old_write_domain);
> >>>>  }
> >>>>
> >>>> -int i915_vma_unbind(struct i915_vma *vma)
> >>>> +static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
> >>>>  {
> >>>>  	struct drm_i915_gem_object *obj = vma->obj;
> >>>>  	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> >>>> @@ -3227,9 +3227,11 @@ int i915_vma_unbind(struct i915_vma *vma)
> >>>>
> >>>>  	BUG_ON(obj->pages == NULL);
> >>>>
> >>>> -	ret = i915_gem_object_wait_rendering(obj, false);
> >>>> -	if (ret)
> >>>> -		return ret;
> >>>> +	if (wait) {
> >>>> +		ret = i915_gem_object_wait_rendering(obj, false);
> >>>> +		if (ret)
> >>>> +			return ret;
> >>>> +	}
> >>>>
> >>>>  	if (i915_is_ggtt(vma->vm) &&
> >>>>  	    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL) {
> >>>> @@ -3274,6 +3276,16 @@ int i915_vma_unbind(struct i915_vma *vma)
> >>>>  	return 0;
> >>>>  }
> >>>>
> >>>> +int i915_vma_unbind(struct i915_vma *vma)
> >>>> +{
> >>>> +	return __i915_vma_unbind(vma, true);
> >>>> +}
> >>>> +
> >>>> +int __i915_vma_unbind_no_wait(struct i915_vma *vma)
> >>>> +{
> >>>> +	return __i915_vma_unbind(vma, false);
> >>>> +}
> >>>> +
> >>>>  int i915_gpu_idle(struct drm_device *dev)
> >>>>  {
> >>>>  	struct drm_i915_private *dev_priv = dev->dev_private;
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >>>> index 74aa0c9929ba..680b4c9f6b73 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >>>> @@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev)
> >>>>  	return ret;
> >>>>  }
> >>>>
> >>>> +static void i915_gem_context_clean(struct intel_context *ctx)
> >>>> +{
> >>>> +	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
> >>>> +	struct i915_vma *vma, *next;
> >>>> +
> >>>> +	if (WARN_ON_ONCE(!ppgtt))
> >>>> +		return;
> >>>
> >>> [ 80.242892] [drm:i915_gem_open]
> >>> [ 80.250485] ------------[ cut here ]------------
> >>> [ 80.258938] WARNING: CPU: 1 PID: 277 at ../drivers/gpu/drm/i915/i915_gem_context.c:141 i915_gem_context_free+0xef/0x27b [i915]()
> >>> [ 80.275344] WARN_ON_ONCE(!ppgtt)
> >>
> >> Ha, I did not see how ctx->ppgtt == NULL could exist but wanted to be
> >> extra safe since i915_ppgtt_put checks for that.
> >
> > Well, how can it exit?
> 
> What? :)

Lost one too many 's's there. Anyway, I guess enable_execlists=0 is the
magic ingredient here.

-- 
Ville Syrjälä
Intel OTC
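
For readers following along: the quoted hunk is cut off right after the !ppgtt check, so here is a sketch of how the rest of the cleanup helper plausibly looks, reconstructed from the commit message (walk the context's VM and unbind every VMA without waiting). The list and member names used here (ppgtt->base.inactive_list, mm_list) are assumptions based on the i915 code of that era, not a verbatim copy of the patch:

static void i915_gem_context_clean(struct intel_context *ctx)
{
	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
	struct i915_vma *vma, *next;

	if (WARN_ON_ONCE(!ppgtt))
		return;

	/*
	 * The context destructor only runs after the last reference to the
	 * context has been dropped, so every VMA in this VM is idle and
	 * already sits on the inactive list; unbinding without waiting is
	 * therefore safe, and it avoids the v3 problem of waiting for
	 * rendering (possibly in another VM) with struct_mutex held.
	 */
	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list, mm_list) {
		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
			break;
	}
}

This is also why the patch splits i915_vma_unbind() into __i915_vma_unbind(vma, wait): the context-destruction path must skip i915_gem_object_wait_rendering(), both because nothing can be in flight and because waiting there could deadlock.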
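
As for the WARN itself firing: with i915.enable_execlists=0 the contexts on that machine apparently have no full per-context PPGTT, so ctx->ppgtt == NULL is a legitimate state rather than a bug, presumably the "magic ingredient" mentioned above. For orientation, a sketch of the destructor call site the diffstat implies (the dmesg backtrace points at i915_gem_context_free); everything here other than the i915_gem_context_clean() call is an assumption about the surrounding code, not part of the quoted patch:

void i915_gem_context_free(struct kref *ctx_ref)
{
	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);

	trace_i915_context_free(ctx);

	if (i915.enable_execlists)
		intel_lr_context_free(ctx);

	/*
	 * Scrub VMAs left behind by flink-imported objects whose destructor
	 * never ran because the exporter still holds a reference; this must
	 * happen before the last PPGTT reference is dropped. With execlists
	 * disabled ctx->ppgtt may be NULL, which is the case the
	 * WARN_ON_ONCE in the cleanup helper trips over.
	 */
	i915_gem_context_clean(ctx);

	i915_ppgtt_put(ctx->ppgtt);

	if (ctx->legacy_hw_ctx.rcs_state)
		drm_gem_object_unreference(&ctx->legacy_hw_ctx.rcs_state->base);

	list_del(&ctx->link);
	kfree(ctx);
}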