Re: [RFC 3/4] drm/i915/gen9: Expose top-most universal plane as cursor

Matt Roper <matthew.d.roper@xxxxxxxxx> · Thu, 27 Oct 2016 14:11:36 -0700

On Thu, Oct 27, 2016 at 06:03:54PM -0200, Paulo Zanoni wrote:
> Em Qua, 2016-10-26 às 15:51 -0700, Matt Roper escreveu:
> > Gen9 has a traditional cursor plane that is mutually exclusive with
> > the
> > system's top-most "universal" plane; it seems likely that two planes
> > are
> > really a single shared hardware unit with two different register
> > interfaces.  Thus far i915 has exposed a cursor plane to userspace
> > that's hooked up to the old-style cursor registers; we just pretended
> > that the top-most universal plane didn't exist and reported one fewer
> > "sprite/overlay" planes for each pipe than the platform technically
> > has.
> > Let's switch this around so that the cursor exposed to userspace is
> > actually wired up to top-most universal plane registers.  We'll
> > continue
> > to present the same cursor ABI to userspace that we always have, but
> > internally we'll just be programming a different set of registers and
> > doing so in a way that's more consistent with how all the other gen9
> > planes work --- less cursor-specific special cases throughout the
> > code.
> > 
> > Aside from making the code a bit simpler (fewer cursor-specific
> > special
> > cases), this will also pave the way to eventually exposing the top-
> > most
> > plane in a more full-featured manner to userspace clients that don't
> > need a traditional cursor and wish to opt into having access to an
> > additional sprite/overlay-style plane instead.
> > 
> > It's worth noting that a lot of the special-cased cursor-specific
> > code
> > was in the gen9 watermark programming.  It's good to simplify that
> > code,
> > but we should keep an eye out for any unwanted side effects of this
> > patch; since sprites/overlays tend to get used less than cursors,
> > it's
> > possible that this could help us uncover additional underruns that
> > nobody had run across yet.  Or it could have the opposite effect and
> > eliminate some of the underruns that we haven't been able to get rid
> > of
> > yet.
> > 
> > Cc: Bob Paauwe <bob.j.paauwe@xxxxxxxxx>
> > Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> > Cc: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
> > Cc: Lyude Paul <lyude@xxxxxxxxxx>
> > Signed-off-by: Matt Roper <matthew.d.roper@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c      |  4 --
> >  drivers/gpu/drm/i915/i915_drv.h          | 11 +++-
> >  drivers/gpu/drm/i915/intel_device_info.c | 38 +++++++++----
> >  drivers/gpu/drm/i915/intel_display.c     | 97 ++++++++++++--------
> > ------------
> >  drivers/gpu/drm/i915/intel_drv.h         |  7 +++
> >  drivers/gpu/drm/i915/intel_pm.c          | 85 ++++----------------
> > --------
> >  drivers/gpu/drm/i915/intel_sprite.c      |  6 +-
> >  7 files changed, 93 insertions(+), 155 deletions(-)
> > 
...
> >  
> > -	cursor_blocks = skl_cursor_allocation(num_active);
> > -	ddb->plane[pipe][PLANE_CURSOR].start = alloc->end -
> > cursor_blocks;
> > -	ddb->plane[pipe][PLANE_CURSOR].end = alloc->end;
> > -
> > -	alloc_size -= cursor_blocks;
> 
> So this means that now in step 2 we're going to try to increase the
> cursor DDBs beyond the current fixed sizes of 32 or 8? Aren't we going
> to leave the cursor DDB way-too-big and taking space of the other DDBs
> when it's just being used as a cursor?

You're right that we're not using the fixed size allocation anymore.
However the extra blocks that we give the pseudo-cursor should still be
proportional to the size/data rate of the plane.  So if you're using a
64x64 "cursor," that's going to be very small compared to your other
planes (the primary plane is often fullscreen and other universal planes
are a good portion of full size).  In practice it seems to work out that
the extra blocks calculation winds up rounding down to zero or near zero
since the "cursor" is so dwarfed by the primary plane --- on my
1920x1200 display it works out as:

        Primary data rate:  8294400
        Cursor data rate:     16384
        Total data rate:    8310784
        Remaining DDB blocks:   492

        Primary extra blocks = 8294400 * 492 / 8310784
                             = (int)491.03006
                             = 491
        Cursor extra blocks = 16384 * 492 / 8310784
                             = (int)0.969935
                             = 0

Due to rounding we don't actually add any cursor blocks at all.  In fact
we're using _fewer_ blocks for cursor than the 32 fixed allocation we
were previously using for single display setups.

> Also, we're not going to leave the cursor DDB at the end of the buffer,
> which may cause some extra DDB reallocations that we may not want.
> Also, whenever someone disables the cursor we'll do a DDB reallocation,
> and as far as I understand we really don't want this specific case
> since it's so common.

True, but the reallocation in this case is an intra-pipe reallocation
rather than the horrific "lock the whole world" inter-pipe reallocation.
It means we crunch a few more numbers during atomic check and write a
few registers we might have skipped during atomic commit, but there's no
extra locking or major new work happening, so it doesn't seem like it
really adds too much overhead, even if userspace is trying to turn the
cursor on and off nonstop like some DDX's do.

> And there's a sill-not-implemented workaround for watermarks
> programming related to the memory bandwidth and it also special-cases
> the cursor. Is there a way to know when the cursor is being used as a
> cursor or when it's being used as a normal plane? Maybe just check it
> by a size threshold?

Well in this case we're effectively not using the cursor anymore from
the hardware's point of view; it's disabled.  You're just using the
"primary" plane and at least one additional universal (aka "sprite")
plane, which is already a valid display configuration that we encounter
frequently even without this patch.  So the workarounds should already
cover that case properly.

Basically we don't need to use the hardware's idea of what a cursor is
in order to implement userspace's idea of what the cursor is.

> My anxiety levels always go up whenever I see someone changing the code
> in a way that makes it not match with what's written in the spec
> anymore. Maybe we'll be fine with the changes, but we need to carefully
> analyze the impact here, especially since watermarks are still so
> fragile. Maybe we should just ping the spec authors and make them
> rewrite the specs in a way that considers the cases where the cursor
> plane is being used as a generic plane?

I think I may not have done a very good job of explaining what's
happening here --- the code in this patch should still fully match the
spec.  The spec covers how to use all of the planes on the system and
you're free to turn on or turn off any combination of them you want.
The exception is that the topmost universal plane and the cursor can't
be used at the same time.  Previously that meant we just never let the
topmost plane get turned on at all (to ensure no conflict), but the spec
still describes how the top plane should be programmed and accounted for
in workarounds (i.e., the exact same way as the other universal planes);
the changes here should all adhere to that.  Now we're flipping around
how we ensure mututal exclusion to be "we'll never turn on the cursor
plane" instead, which is perfectly legal.

The hardware level details here are independent of the plane list we
expose to userspace.  For ABI purposes we always need *something* that
is identified as a "cursor" and is hooked up to the legacy cursor
ioctls.  There's nothing that says we actually have to implement that
userspace interface using our hardware's dedicated cursor plane ---
we're free to re-use any other plane our hardware happens to have as
well.  That's what I'm doing here...the topmost universal/sprite plane
that was previously never used at all is now reserved for servicing
requests against the plane identified as DRM_PLANE_TYPE_CURSOR.
Userspace never notices a difference as long as the universal plane gets
the cursor content onto the screen the same way as the cursor plane.

I think this is actually pretty similar to what happens a lot in the ARM
world.  My understanding is that most of the ARM guys don't even have a
true "cursor" plane, but rather just a bunch of general purpose planes
that are more similar to what we call universal planes.  They'll
typically reserve one of those planes for cursor ABI compatibility.

Hopefully that's more clear...if not let me know.

Matt

> 
> > -
> >  	/* 1. Allocate the mininum required blocks for each active
> > plane */
> >  	for_each_plane_in_state(state, plane, pstate, i) {
> >  		intel_plane = to_intel_plane(plane);
> > @@ -3431,17 +3408,12 @@ skl_allocate_pipe_ddb(struct intel_crtc_state
> > *cstate,
> >  			y_minimum[id] = 0;
> >  			continue;
> >  		}
> > -		if (plane->type == DRM_PLANE_TYPE_CURSOR) {
> > -			minimum[id] = 0;
> > -			y_minimum[id] = 0;
> > -			continue;
> > -		}
> >  
> >  		minimum[id] = skl_ddb_min_alloc(pstate, 0);
> >  		y_minimum[id] = skl_ddb_min_alloc(pstate, 1);
> >  	}
> >  
> > -	for (i = 0; i < PLANE_CURSOR; i++) {
> > +	for (i = 0; i < I915_MAX_PLANES; i++) {
> >  		alloc_size -= minimum[i];
> >  		alloc_size -= y_minimum[i];
> >  	}
> > @@ -3866,26 +3838,6 @@ void skl_write_plane_wm(struct intel_crtc
> > *intel_crtc,
> >  			    &ddb->y_plane[pipe][plane]);
> >  }
> >  
> > -void skl_write_cursor_wm(struct intel_crtc *intel_crtc,
> > -			 const struct skl_plane_wm *wm,
> > -			 const struct skl_ddb_allocation *ddb)
> > -{
> > -	struct drm_crtc *crtc = &intel_crtc->base;
> > -	struct drm_device *dev = crtc->dev;
> > -	struct drm_i915_private *dev_priv = to_i915(dev);
> > -	int level, max_level = ilk_wm_max_level(dev_priv);
> > -	enum pipe pipe = intel_crtc->pipe;
> > -
> > -	for (level = 0; level <= max_level; level++) {
> > -		skl_write_wm_level(dev_priv, CUR_WM(pipe, level),
> > -				   &wm->wm[level]);
> > -	}
> > -	skl_write_wm_level(dev_priv, CUR_WM_TRANS(pipe), &wm-
> > >trans_wm);
> > -
> > -	skl_ddb_entry_write(dev_priv, CUR_BUF_CFG(pipe),
> > -			    &ddb->plane[pipe][PLANE_CURSOR]);
> > -}
> > -
> >  bool skl_wm_level_equals(const struct skl_wm_level *l1,
> >  			 const struct skl_wm_level *l2)
> >  {
> > @@ -4122,19 +4074,11 @@ skl_print_wm_changes(const struct
> > drm_atomic_state *state)
> >  			if (skl_ddb_entry_equal(old, new))
> >  				continue;
> >  
> > -			if (id != PLANE_CURSOR) {
> > -				DRM_DEBUG_ATOMIC("[PLANE:%d:plane
> > %d%c] ddb (%d - %d) -> (%d - %d)\n",
> > -						 plane->base.id, id
> > + 1,
> > -						 pipe_name(pipe),
> > -						 old->start, old-
> > >end,
> > -						 new->start, new-
> > >end);
> > -			} else {
> > -				DRM_DEBUG_ATOMIC("[PLANE:%d:cursor
> > %c] ddb (%d - %d) -> (%d - %d)\n",
> > -						 plane->base.id,
> > -						 pipe_name(pipe),
> > -						 old->start, old-
> > >end,
> > -						 new->start, new-
> > >end);
> > -			}
> > +			DRM_DEBUG_ATOMIC("[PLANE:%d:plane %d%c] ddb
> > (%d - %d) -> (%d - %d)\n",
> > +					 plane->base.id, id + 1,
> > +					 pipe_name(pipe),
> > +					 old->start, old->end,
> > +					 new->start, new->end);
> >  		}
> >  	}
> >  }
> > @@ -4235,9 +4179,6 @@ static void skl_update_wm(struct drm_crtc
> > *crtc)
> >  		for_each_universal_plane(dev_priv, pipe, plane)
> >  			skl_write_plane_wm(intel_crtc, &pipe_wm-
> > >planes[plane],
> >  					   &results->ddb, plane);
> > -
> > -		skl_write_cursor_wm(intel_crtc, &pipe_wm-
> > >planes[PLANE_CURSOR],
> > -				    &results->ddb);
> >  	}
> >  
> >  	skl_copy_wm_for_pipe(hw_vals, results, pipe);
> > @@ -4350,18 +4291,12 @@ void skl_pipe_wm_get_hw_state(struct drm_crtc
> > *crtc,
> >  		wm = &out->planes[id];
> >  
> >  		for (level = 0; level <= max_level; level++) {
> > -			if (id != PLANE_CURSOR)
> > -				val = I915_READ(PLANE_WM(pipe, id,
> > level));
> > -			else
> > -				val = I915_READ(CUR_WM(pipe,
> > level));
> > +			val = I915_READ(PLANE_WM(pipe, id, level));
> >  
> >  			skl_wm_level_from_reg_val(val, &wm-
> > >wm[level]);
> >  		}
> >  
> > -		if (id != PLANE_CURSOR)
> > -			val = I915_READ(PLANE_WM_TRANS(pipe, id));
> > -		else
> > -			val = I915_READ(CUR_WM_TRANS(pipe));
> > +		val = I915_READ(PLANE_WM_TRANS(pipe, id));
> >  
> >  		skl_wm_level_from_reg_val(val, &wm->trans_wm);
> >  	}
> > diff --git a/drivers/gpu/drm/i915/intel_sprite.c
> > b/drivers/gpu/drm/i915/intel_sprite.c
> > index 43d0350..9e6406a 100644
> > --- a/drivers/gpu/drm/i915/intel_sprite.c
> > +++ b/drivers/gpu/drm/i915/intel_sprite.c
> > @@ -194,7 +194,7 @@ void intel_pipe_update_end(struct intel_crtc
> > *crtc, struct intel_flip_work *work
> >  	}
> >  }
> >  
> > -static void
> > +void
> >  skl_update_plane(struct drm_plane *drm_plane,
> >  		 const struct intel_crtc_state *crtc_state,
> >  		 const struct intel_plane_state *plane_state)
> > @@ -285,7 +285,7 @@ skl_update_plane(struct drm_plane *drm_plane,
> >  	POSTING_READ(PLANE_SURF(pipe, plane));
> >  }
> >  
> > -static void
> > +void
> >  skl_disable_plane(struct drm_plane *dplane, struct drm_crtc *crtc)
> >  {
> >  	struct drm_device *dev = dplane->dev;
> > @@ -752,7 +752,7 @@ ilk_disable_plane(struct drm_plane *plane, struct
> > drm_crtc *crtc)
> >  	POSTING_READ(DVSSURF(pipe));
> >  }
> >  
> > -static int
> > +int
> >  intel_check_sprite_plane(struct drm_plane *plane,
> >  			 struct intel_crtc_state *crtc_state,
> >  			 struct intel_plane_state *state)

-- 
Matt Roper
Graphics Software Engineer
IoTG Platform Enabling & Development
Intel Corporation
(916) 356-2795
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx