Re: [PATCH 11/42] drm/i915: Introduce an internal allocator for disposable private objects


On 07/10/2016 18:08, Chris Wilson wrote:
On Fri, Oct 07, 2016 at 05:52:47PM +0100, Tvrtko Ursulin wrote:
On 07/10/2016 10:46, Chris Wilson wrote:
Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not need a shmemfs backing storage, and since they are internal
and never directly exposed to the user, we do not need to worry about
providing a filp. For these we can use a drm_i915_gem_object wrapper
around an sg_table of plain struct pages. They are not swap backed and
not automatically pinned. If they are reaped by the shrinker, the pages
are released and the contents discarded. For the internal use case this
is fine as, for example, ringbuffers are pinned from the point they are
written by a request until they have been read by the hardware. Once
they are idle, they can be discarded entirely. As such they are a good
match for execlist ringbuffers and a small variety of other internal
objects.

In the first iteration, this is limited to the scratch batch buffers we
use (for command parsing and state initialisation).

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/Makefile                |   1 +
  drivers/gpu/drm/i915/i915_drv.h              |   5 +
  drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  28 ++---
  drivers/gpu/drm/i915/i915_gem_internal.c     | 161 +++++++++++++++++++++++++++
  drivers/gpu/drm/i915/i915_gem_render_state.c |   2 +-
  drivers/gpu/drm/i915/intel_engine_cs.c       |   2 +-
  drivers/gpu/drm/i915/intel_ringbuffer.c      |  14 ++-
  7 files changed, 189 insertions(+), 24 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/i915_gem_internal.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a998c2bce70a..b94a90f34d2d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -35,6 +35,7 @@ i915-y += i915_cmd_parser.o \
  	  i915_gem_execbuffer.o \
  	  i915_gem_fence.o \
  	  i915_gem_gtt.o \
+	  i915_gem_internal.o \
  	  i915_gem.o \
  	  i915_gem_render_state.o \
  	  i915_gem_request.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fee5cc92e2f2..bad97f1e5265 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3538,6 +3538,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
  					       u32 gtt_offset,
  					       u32 size);
+/* i915_gem_internal.c */
+struct drm_i915_gem_object *
+i915_gem_object_create_internal(struct drm_device *dev,
+				unsigned int size);
+
Wasn't size_t our convention for GEM objects?
Not our convention, no. Using size_t has caused so many bugs that we
have started to eradicate it (in our usual piecemeal approach).

Oh wow, I missed that decision. Last thing I remember was that someone was trying to convert it all to size_t. Fine by me.
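
For reference, a minimal sketch of how a caller might use the proposed
allocator for a throwaway object. This is illustrative only, not code
from the patch: the only thing taken from the patch is the
i915_gem_object_create_internal() prototype quoted above; the ERR_PTR
return convention and i915_gem_object_put() are assumed to match the
existing i915 create/put helpers.

static int use_internal_scratch(struct drm_device *dev, unsigned int size)
{
	struct drm_i915_gem_object *obj;

	/* Plain pages, no shmemfs filp, contents never zeroed for us. */
	obj = i915_gem_object_create_internal(dev, size);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/*
	 * Pin, fill with commands and submit as usual here. Because the
	 * backing pages are not swappable, once the object is unpinned
	 * and idle the shrinker may simply free them and the contents
	 * are lost - exactly what we want for ringbuffers and scratch
	 * batches that are never read back by software.
	 */

	i915_gem_object_put(obj);
	return 0;
}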

+		/* 965gm cannot relocate objects above 4GiB. */
+		gfp &= ~__GFP_HIGHMEM;
+		gfp |= __GFP_DMA32;
+	}
+
+	for (i = 0; i < npages; i++) {
+		struct page *page;
+
+		page = alloc_page(gfp);
+		if (!page)
+			goto err;
+
+#ifdef CONFIG_SWIOTLB
+		if (swiotlb_nr_tbl()) {
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+			sg = sg_next(sg);
+			continue;
+		}
+#endif
+		if (!i || page_to_pfn(page) != last_pfn + 1) {
+			if (i)
+				sg = sg_next(sg);
+			st->nents++;
+			sg_set_page(sg, page, PAGE_SIZE, 0);
+		} else {
+			sg->length += PAGE_SIZE;
+		}
+		last_pfn = page_to_pfn(page);
+	}
+#ifdef CONFIG_SWIOTLB
+	if (!swiotlb_nr_tbl())
+#endif
+		sg_mark_end(sg);
Looks like the loop above could be moved into a helper and shared
with i915_gem_object_get_pages_gtt. Maybe just page-alloc and
page-alloc-error callbacks would be required.
So just the entire thing as a callback... I would have thought you might
suggest trying high-order allocations and falling back to low-order ones.
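
To illustrate the high-order idea being floated here (a rough sketch
only, not code from the patch; the function name is made up), the
allocation side could try large contiguous chunks first and step the
order down on failure, which would also shrink the resulting sg_table:

static struct page *internal_alloc_fallback(gfp_t gfp, unsigned int *order)
{
	unsigned int max_order = *order;

	do {
		struct page *page;
		gfp_t try_gfp = gfp;

		/* Don't retry or warn for the speculative high orders. */
		if (max_order)
			try_gfp |= __GFP_NORETRY | __GFP_NOWARN;

		page = alloc_pages(try_gfp, max_order);
		if (page) {
			/* Caller records PAGE_SIZE << *order in the sg entry. */
			*order = max_order;
			return page;
		}
	} while (max_order--);

	return NULL;
}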

I am not sure higher-order attempts would be worth it. What I was thinking was to implement the loop above generically in one place, and then the two (or even three with userptr) callers would call it with their own alloc-page and alloc-page-on-error callbacks, so that there is a single implementation of the coalescing logic and so on.
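
Something along these lines, as a rough sketch of that idea (illustrative
only, not code from the patch; the ops structure and names are made up,
and the CONFIG_SWIOTLB special case from the hunk above is left out for
brevity):

struct i915_page_alloc_ops {
	/* Allocate backing page i: shmemfs, plain pages or userptr. */
	struct page *(*alloc)(void *data, unsigned long i);
	/* Undo any allocations already recorded in @st on error. */
	void (*release)(void *data, struct sg_table *st);
};

static int i915_sg_alloc_pages(struct sg_table *st, unsigned long npages,
			       const struct i915_page_alloc_ops *ops,
			       void *data)
{
	unsigned long last_pfn = 0, i;
	struct scatterlist *sg;

	if (sg_alloc_table(st, npages, GFP_KERNEL))
		return -ENOMEM;

	sg = st->sgl;
	st->nents = 0;
	for (i = 0; i < npages; i++) {
		struct page *page = ops->alloc(data, i);

		if (!page)
			goto err;

		/* Coalesce runs of physically contiguous pages. */
		if (!i || page_to_pfn(page) != last_pfn + 1) {
			if (i)
				sg = sg_next(sg);
			st->nents++;
			sg_set_page(sg, page, PAGE_SIZE, 0);
		} else {
			sg->length += PAGE_SIZE;
		}
		last_pfn = page_to_pfn(page);
	}
	sg_mark_end(sg);

	return 0;

err:
	ops->release(data, st);
	sg_free_table(st);
	return -ENOMEM;
}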

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



