Quoting Daniel Vetter (2018-07-11 10:36:36) > On Wed, Jul 11, 2018 at 11:33:26AM +0200, Daniel Vetter wrote: > > On Wed, Jul 11, 2018 at 08:36:02AM +0100, Chris Wilson wrote: > > > Add a mutex into struct i915_address_space to be used while operating on > > > the vma and their lists for a particular vm. As this may be called from > > > the shrinker, we taint the mutex with fs_reclaim so that from the start > > > lockdep warns us if we are caught holding the mutex across an > > > allocation. (With such small steps we will eventually rid ourselves of > > > struct_mutex recursion!) > > > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > > Not sure it exists in a branch of yours already, but here's my thoughts on > > extending this to the address_space lrus and the shrinker callback (which > > I think would be the next step with good pay-off): > > > > 1. make sure pin_count is protected by reservation_obj. It's vma->pin_count, so I guarded it with vm->mutex (along with the drm_mm and vm->*list, with some vigorous pruning of lists to reduce the locking surface). There's also the obj->vma_list and trees that are moved to a obj->vma_lock. > > 2. grab the vm.mutex when walking LRUs everywhere. This is going to be > > tricky for ggtt because of runtime PM. Between lock-dropping, carefully > > avoiding rpm when cleaning up objects and just grabbing an rpm wakeref > > when walking the ggtt vm this should be possible to work around (since for > > the fences we clearly need to be able to nest the vm.mutex within rpm or > > we're busted). > > 3. In the shrinker trylock the reservation_obj and treat a failure to get > > the lock as if pin_count is elevated. If we can't shrink enough then grab > > a temporary reference to the bo using kref_get_unless_zero, drop the > > vm.mutex (since that's what gave us the weak ref) and do a blocking > > reservation_obj lock. > > Ok this doesn't work, because reservation_obj needs to allow allocations. > But compared to our current lock stealing trickery the above scheme > reduces possibilities to shrink, or at least rate-limit command submission > somewhat. Not sure how to best tackle that. Right, the only way is to avoid us using reservation_obj->lock inside the shrinker, which is quite easy (for the moment at least, I've haven't completed the i915_request eradication of struct_mutex for ordering which is likely to require more reservation_obj or similar). I do have a branch that gets us to no struct_mutex inside intel_display and to remove struct_mutex from the shrinker within 10 more ugly patches that pass CI, which is a start. Something that I want to drip feed slowly so that we can catch the high MTBF regressions that are inevitable. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx