On Tue, Oct 06, 2015 at 11:28:49AM +0200, Daniel Vetter wrote: > On Tue, Oct 06, 2015 at 10:04:31AM +0100, Chris Wilson wrote: > > On Tue, Oct 06, 2015 at 10:58:01AM +0200, Daniel Vetter wrote: > > > On Mon, Oct 05, 2015 at 01:26:36PM +0100, Tvrtko Ursulin wrote: > > > > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > > > > > > > Prevent leaking VMAs and PPGTT VMs when objects are imported > > > > via flink. > > > > > > > > Scenario is that any VMAs created by the importer will be left > > > > dangling after the importer exits, or destroys the PPGTT context > > > > with which they are associated. > > > > > > > > This is caused by object destruction not running when the > > > > importer closes the buffer object handle due the reference held > > > > by the exporter. This also leaks the VM since the VMA has a > > > > reference on it. > > > > > > > > In practice these leaks can be observed by stopping and starting > > > > the X server on a kernel with fbcon compiled in. Every time > > > > X server exits another VMA will be leaked against the fbcon's > > > > frame buffer object. > > > > > > > > Also on systems where flink buffer sharing is used extensively, > > > > like Android, this leak has even more serious consequences. > > > > > > > > This version is takes a general approach from the earlier work > > > > by Rafael Barbalho (drm/i915: Clean-up PPGTT on context > > > > destruction) and tries to incorporate the subsequent discussion > > > > between Chris Wilson and Daniel Vetter. > > > > > > > > v2: > > > > > > > > Removed immediate cleanup on object retire - it was causing a > > > > recursive VMA unbind via i915_gem_object_wait_rendering. And > > > > it is in fact not even needed since by definition context > > > > cleanup worker runs only after the last context reference has > > > > been dropped, hence all VMAs against the VM belonging to the > > > > context are already on the inactive list. > > > > > > > > v3: > > > > > > > > Previous version could deadlock since VMA unbind waits on any > > > > rendering on an object to complete. Objects can be busy in a > > > > different VM which would mean that the cleanup loop would do > > > > the wait with the struct mutex held. > > > > > > > > This is an even simpler approach where we just unbind VMAs > > > > without waiting since we know all VMAs belonging to this VM > > > > are idle, and there is nothing in flight, at the point > > > > context destructor runs. > > > > > > > > v4: > > > > > > > > Double underscore prefix for __915_vma_unbind_no_wait and a > > > > commit message typo fix. (Michel Thierry) > > > > > > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > > > Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak > > > > Reviewed-by: Michel Thierry <michel.thierry@xxxxxxxxx> > > > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > > Cc: Rafael Barbalho <rafael.barbalho@xxxxxxxxx> > > > > Cc: Michel Thierry <michel.thierry@xxxxxxxxx> > > > > > > Queued for -next, thanks for the patch. > > > > Please no, it's an awful patch and does not even fix the root cause of > > the leak (that the vma are not closed when the handle is). > > It's a lose-lose situation for me as maintainer (and this holds in general > really, not just for this patch): > - Either I wield my considerable maintainer powers and force proper > solution, which will piss of a lot of people. > - Or I just merge intermediate stuff and piss of another set of people > (including likely our all future selves because we're slowly digging a > tech debt grave). > > What I can't do with maintainer fu is force collaboration, I can only try > to piss off everyone equally. And today the die rolled a "merge". Pity it didn't roll "what's the impact and where is the bugzilla?" :) -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx