On 1/15/2019 12:17, Chris Wilson wrote:
Quoting John Harrison (2019-01-15 19:57:19)
On 1/7/2019 03:55, Chris Wilson wrote:
Previously we only accommodated having a vma pinned by a small number of
users, with the maximum being pinned for use by the display engine. As
such, we used a small bitfield only large enough to allow the vma to
be pinned twice (for back/front buffers) in each scanout plane. Keeping
the maximum permissible pin_count small allows us to quickly catch a
potential leak. However, as we want to split a 4096B page into 64
different cachelines and pin each cacheline for use by a different
timeline, we will exceed the current maximum permissible vma->pin_count
and so time has come to enlarge it.
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/i915/i915_gem_gtt.h | 26 +++++++++++++-------------
drivers/gpu/drm/i915/i915_vma.h | 28 +++++++++-------------------
2 files changed, 22 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index bd679c8c56dd..03ade71b8d9a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
/* Flags used by pin/bind&friends. */
#define PIN_NONBLOCK BIT_ULL(0)
-#define PIN_MAPPABLE BIT_ULL(1)
-#define PIN_ZONE_4G BIT_ULL(2)
-#define PIN_NONFAULT BIT_ULL(3)
-#define PIN_NOEVICT BIT_ULL(4)
-
-#define PIN_MBZ BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */
-#define PIN_GLOBAL BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */
-#define PIN_USER BIT_ULL(7) /* I915_VMA_LOCAL_BIND */
-#define PIN_UPDATE BIT_ULL(8)
-
-#define PIN_HIGH BIT_ULL(9)
-#define PIN_OFFSET_BIAS BIT_ULL(10)
-#define PIN_OFFSET_FIXED BIT_ULL(11)
+#define PIN_NONFAULT BIT_ULL(1)
+#define PIN_NOEVICT BIT_ULL(2)
+#define PIN_MAPPABLE BIT_ULL(3)
+#define PIN_ZONE_4G BIT_ULL(4)
+#define PIN_HIGH BIT_ULL(5)
+#define PIN_OFFSET_BIAS BIT_ULL(6)
+#define PIN_OFFSET_FIXED BIT_ULL(7)
+
+#define PIN_MBZ BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */
+#define PIN_GLOBAL BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */
+#define PIN_USER BIT_ULL(10) /* I915_VMA_LOCAL_BIND */
+#define PIN_UPDATE BIT_ULL(11)
+
The upper bits need moving to accommodate the larger count. And the
HIGH/OFFSET_* fields are not shared with vma-flags so can be moved down
with the other pin only flags. But I don't see a reason to shuffle the
lower bits around? MAPPABLE to NOEVICT were 1,2,3,4 but are now 3,4,1,2.
Is there some semantic meaning to the new order?
Just that:
- bias, mappable, zone_4g: address limit specifiers
+ high: not strictly an address limit, but an address direction to search
+ fixed: address override, limits still apply though
- nonblock, nonfault, noevict: search specifiers
I just hadn't had an excuse to reorder them for a while.
Fair enough. I just wanted to check it was deliberate and not some
accidental remnant of some other change that was dropped along the way.
#define PIN_OFFSET_MASK (-I915_GTT_PAGE_SIZE)
#endif
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 7252abc73d3e..266b226ebef2 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -70,30 +70,20 @@ struct i915_vma {
*/
unsigned int open_count;
unsigned long flags;
- /**
- * How many users have pinned this object in GTT space. The following
- * users can each hold at most one reference: pwrite/pread, execbuffer
- * (objects are not allowed multiple times for the same batchbuffer),
- * and the framebuffer code. When switching/pageflipping, the
- * framebuffer code has at most two buffers pinned per crtc.
- *
- * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
- * bits with absolutely no headroom. So use 4 bits.
- */
Is it not worth keeping some comment about the maximum pin count being
bounded so 8-bits is guaranteed to be sufficient? Also, is the old
comment actually valid? Surely modern hardware has more than two CRTCs
so the limit of 7 was wrong anyway? Maybe even have a compile time
assert that the mask size is greater than max(1 + 1 + 1 + 1 +
2*MAX_CRTC, PAGE_SIZE/CACHELINE_SIZE)?
Is a comment accurate? rotfl
One can but dream...
Also I think we are up to 3*NUM_PLANES*NUM_CRTCS, but can't be quite sure
with the atomic state tracking, so it might just be still 2 (but just
wait until we have an actual flip queue).
That sounds like we should already be overflowing the current limit of 16?!
I was also wondering if we used 7b + negative byte for the overflow
would generate better code.
Meaning limit the count to 127 and use a signed char cast? Thus test for
-ve rather than BIT(x)? Maybe. Although the above comment makes me
nervous that even 127 might not be sufficient for very much longer. New
hardware always brings more planes and heads!
Still a comment towards that this should be bounded to a small number
hence the "validity" in checking for overflow as part of the flags might
be in order.
-Chris
I think some kind of comment along those lines would be worth having.
With that comment added:
Reviewed-by: John Harrison <John.C.Harrison@xxxxxxxxx>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx