On Tue, Jan 26, 2016 at 04:23:28PM +0000, Tvrtko Ursulin wrote: > > On 26/01/16 15:10, Chris Wilson wrote: > >On Tue, Jan 26, 2016 at 02:53:31PM +0000, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> > >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>--- > >> drivers/gpu/drm/i915/i915_gem.c | 96 ++++++++++++++++++++++++++++++++++++++--- > >> include/uapi/drm/i915_drm.h | 3 ++ > >> 2 files changed, 93 insertions(+), 6 deletions(-) > >> > >>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >>index dacf6a0013c5..039d55a49fc6 100644 > >>--- a/drivers/gpu/drm/i915/i915_gem.c > >>+++ b/drivers/gpu/drm/i915/i915_gem.c > >>@@ -1954,6 +1954,60 @@ out: > >> return i915_gem_ret_to_vm_ret(dev_priv, ret); > >> } > >> > >>+static int > >>+i915_gem_cpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > >>+{ > >>+ struct drm_i915_gem_object *obj = to_intel_bo(vma->vm_private_data); > >>+ struct drm_device *dev = obj->base.dev; > >>+ struct drm_i915_private *dev_priv = dev->dev_private; > >>+ bool write = !!(vmf->flags & FAULT_FLAG_WRITE); > >>+ pgoff_t page_offset; > >>+ struct page *page; > >>+ int ret; > >>+ > >>+ /* We don't use vmf->pgoff since that has the fake offset */ > >>+ page_offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >> > >>+ PAGE_SHIFT; > >>+ > >>+ trace_i915_gem_object_fault(obj, page_offset, true, write); > >>+ > >>+ intel_runtime_pm_get(dev_priv); > >>+ > >>+ ret = i915_mutex_lock_interruptible(dev); > >>+ if (ret) > >>+ goto out; > >>+ > >>+ ret = i915_gem_object_set_to_cpu_domain(obj, write); > >>+ if (ret) > >>+ goto out_unlock; > > > >That was a mistake in the GTT gem_fault(). If you do this, we also want > >the nonblocking wait for obvious reasons. > > You suggest leaving it for userspace? It is userspace's responsibility. Page faults are random and do not occur around every pointer acceess - userspace has to mark the domain changes on its boundaries (and coordinate amongst its peers). > And how would a non-blocking wait work? Before set-to-cpu-domain, we do a wait_rendering_nonblocking which drops and then reacquires the mutex. (That allows for multiple waiters which tends to be the lowest hanging fruit with struct_mutex contention.) Then set-to-cpu domain does a blocking wait to ensure nothing snuck in. But I don't think we want this. And we can then reduce the i915_mutex_lock_interruptible() to a plain mutex_lock_interruptible() as we are not touching the GPU. > >>+ ret = i915_gem_object_get_pages(obj); > >>+ if (ret) > >>+ goto out_unlock; > >>+ > >>+ page = i915_gem_object_get_page(obj, page_offset); > >>+ if (!page) { > >>+ ret = -ERANGE; ret = -EFAULT; though it would definitely be a stack bug. > >>+ goto out_unlock; > >>+ } > >>+ > >>+ mutex_unlock(&dev->struct_mutex); > >>+ > >>+ ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, > >>+ page_to_pfn(page)); > > > >We don't have a page ref at this point, so this obj+page could be > >freed (via the shrinker at least) before we insert it. > > Oh yeah, need to pin the pages.. But only whilst inserting. Once inserted they need to be evicted, and I was wondering whether we should do the zap on put_pages(). If we don't, it means that the shrinker is neutered. > >I would also be more interested in having a version that faulted the > >entire object at once - though maybe we will see more random access in > >future. > > Yeah I did not want to concern myself with more code since this was > a proof of concept anyway. No worries, the transformation is simple with a certain remap function. > >It looks fairly sane. I wanted this just a short while ago, but figured > >out a way of using regular mmap() to give me the inheritance instead. > > So would it be useful to cleanup and finish this work or not? I agree that it closes a big hole in the API - the ability to CPU mmap non-shmemfs object (i.e. userptr, dmabuf). With a bit of polish we should be able to offer something to take advantage of the existing GEM infrastructure better than a regular CPU mmapping - though off the top of my head, I don't have anything that is ratelimited by CPU pagefaults. Another thing I realised was that this severely limits the mmap space on 32-bit systems, as the vma manager is unsigned long. The CPU mmaping was a way around some of the restrictions. That would seem fairly easy to lift (and I hope without consequence). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx