On Wed, Jul 11, 2018 at 10:49:51AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2018-07-11 10:36:36)
> > On Wed, Jul 11, 2018 at 11:33:26AM +0200, Daniel Vetter wrote:
> > > On Wed, Jul 11, 2018 at 08:36:02AM +0100, Chris Wilson wrote:
> > > > Add a mutex into struct i915_address_space to be used while operating on
> > > > the vma and their lists for a particular vm. As this may be called from
> > > > the shrinker, we taint the mutex with fs_reclaim so that from the start
> > > > lockdep warns us if we are caught holding the mutex across an
> > > > allocation. (With such small steps we will eventually rid ourselves of
> > > > struct_mutex recursion!)
> > > >
> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > >
> > > Not sure it exists in a branch of yours already, but here's my thoughts on
> > > extending this to the address_space lrus and the shrinker callback (which
> > > I think would be the next step with good pay-off):
> > >
> > > 1. make sure pin_count is protected by reservation_obj.
>
> It's vma->pin_count, so I guarded it with vm->mutex (along with the
> drm_mm and vm->*list, with some vigorous pruning of lists to reduce the
> locking surface). There's also the obj->vma_list and trees that are
> moved to an obj->vma_lock.

Hm, Christian König's series to wrap dma_buf_map/unmap in the
reservation_obj is why I thought we'd need that. But just for the unmap
of foreign objects I guess we can always punt that to some worker, or
just not bother if it's contended.

> > > 2. grab the vm.mutex when walking LRUs everywhere. This is going to be
> > > tricky for ggtt because of runtime PM. Between lock-dropping, carefully
> > > avoiding rpm when cleaning up objects and just grabbing an rpm wakeref
> > > when walking the ggtt vm this should be possible to work around (since for
> > > the fences we clearly need to be able to nest the vm.mutex within rpm or
> > > we're busted).
> > > 3. In the shrinker trylock the reservation_obj and treat a failure to get
> > > the lock as if pin_count is elevated. If we can't shrink enough then grab
> > > a temporary reference to the bo using kref_get_unless_zero, drop the
> > > vm.mutex (since that's what gave us the weak ref) and do a blocking
> > > reservation_obj lock.
> >
> > Ok this doesn't work, because reservation_obj needs to allow allocations.
> > But compared to our current lock stealing trickery the above scheme
> > reduces possibilities to shrink, or at least rate-limit command submission
> > somewhat. Not sure how to best tackle that.
>
> Right, the only way is to avoid us using reservation_obj->lock inside
> the shrinker, which is quite easy (for the moment at least; I haven't
> completed the i915_request eradication of struct_mutex for ordering,
> which is likely to require more reservation_obj or similar).

Yeah, I think as long as i915 doesn't need it we can get away with
trylocks for foreign objects (which I think will need it, sooner or
later).

> I do have a branch that gets us to no struct_mutex inside intel_display
> and to remove struct_mutex from the shrinker within 10 more ugly patches
> that pass CI, which is a start. Something that I want to drip feed
> slowly so that we can catch the high MTBF regressions that are inevitable.

Very much agreed on drip feeding :-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
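
The fs_reclaim taint mentioned in the quoted commit message boils down to
priming lockdep once at init: pretend the new mutex is taken under the
reclaim context, so that any later allocation while holding it is flagged
as an inversion immediately rather than only when the shrinker actually
recurses. A minimal sketch of that pattern follows; the helper name and the
init-path placement are assumptions, not the code from the patch itself:

#include <linux/gfp.h>
#include <linux/mutex.h>
#include <linux/sched/mm.h>

/*
 * Hypothetical helper: record the fs_reclaim -> mutex ordering with
 * lockdep. Afterwards, taking the mutex and then allocating (which
 * acquires fs_reclaim) is reported as a lock inversion right away,
 * without waiting for the shrinker to actually run under the mutex.
 */
static void taint_mutex_with_fs_reclaim(struct mutex *mutex)
{
	if (!IS_ENABLED(CONFIG_LOCKDEP))
		return;

	fs_reclaim_acquire(GFP_KERNEL);
	mutex_lock(mutex);
	mutex_unlock(mutex);
	fs_reclaim_release(GFP_KERNEL);
}

/*
 * Called once when the address space is set up, e.g.:
 *
 *	mutex_init(&vm->mutex);
 *	taint_mutex_with_fs_reclaim(&vm->mutex);
 */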