On Fri, Oct 4, 2019 at 3:40 PM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> Replace the struct_mutex requirement for pinning the i915_vma with the
> local vm->mutex instead. Note that the vm->mutex is tainted by the
> shrinker (we require unbinding from inside fs-reclaim) and so we cannot
> allocate while holding that mutex. Instead we have to preallocate
> workers to do the allocation and apply the PTE updates after we have
> reserved their slot in the drm_mm (using fences to order the PTE writes
> with the GPU work and with later unbind).
>
> In adding the asynchronous vma binding, one subtle requirement is to
> avoid coupling the binding fence into the backing object->resv. That is,
> the asynchronous binding applies only to the vma timeline itself and not
> to the pages, as that is a more global timeline (the binding of one vma
> does not need to be ordered with another vma, nor does the implicit GEM
> fencing depend on a vma, only on writes to the backing store). Keeping
> the vma binding distinct from the backing store timelines is verified by
> a number of async gem_exec_fence and gem_exec_schedule tests. The way we
> do this is quite simple: we keep the fence for the vma binding separate
> and only wait on it as required, and never add it to the obj->resv
> itself.
>
> Another consequence of reducing the locking around the vma is that
> destruction of the vma is no longer globally serialised by struct_mutex.
> A natural solution would be to add a kref to i915_vma, but that requires
> decoupling the reference cycles, possibly by introducing a new
> i915_mm_pages object that is owned by both obj->mm and vma->pages.
> However, we have not taken that route due to the overshadowing lmem/ttm
> discussions, and instead play a series of complicated games with
> trylocks to (hopefully) ensure that only one destruction path is called!
>
> v2: Add some commentary, and some helpers to reduce patch churn.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Hi Chris & Tvrtko,

I've cut most of the patch; just a detail question on this bit in the fault
handler:

>  static unsigned int
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index dd0c2840ba4d..c19431d609fc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -249,16 +249,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  	if (ret)
>  		goto err_rpm;
>
> -	ret = i915_mutex_lock_interruptible(dev);
> -	if (ret)
> -		goto err_reset;

Doesn't this open us up to possibly spurious failures on old crap where the
ggtt is super limited? We "leak" the pin count now outside of the mutex that
protects it, which means other threads won't be blocked by said mutex, and
hence could now see a lot of temporarily pinned vmas that previously were
always hidden by the big lock and therefore never visible outside of the
thread that did the temporary pin. Kinda like the execbuf issue.

Note I mean the temporary vma pin here, not the get_pages pin. That one is
imo ok since memory is assumed to be plentiful and management through
watermarks Good Enough (probably isn't, but we can't easily fix core mm).

Or am I totally missing something? Admittedly it's early and I definitely
haven't acquired the full new locking model into my brain ...

Cheers, Daniel

> -
> -	/* Access to snoopable pages through the GTT is incoherent. */
> -	if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
> -		ret = -EFAULT;
> -		goto err_unlock;
> -	}

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx