Re: [PATCH 0/2] drm/amdgpu/display: Make multi-plane configurations more flexible

Leo Li <sunpeng.li@xxxxxxx> · Wed, 17 Apr 2024 14:51:59 -0400

On 2024-04-16 10:10, Harry Wentland wrote:

On 2024-04-16 04:01, Pekka Paalanen wrote:
On Mon, 15 Apr 2024 18:33:39 -0400
Leo Li <sunpeng.li@xxxxxxx> wrote:

On 2024-04-15 04:19, Pekka Paalanen wrote:
On Fri, 12 Apr 2024 16:14:28 -0400
Leo Li <sunpeng.li@xxxxxxx> wrote:

On 2024-04-12 11:31, Alex Deucher wrote:
On Fri, Apr 12, 2024 at 11:08 AM Pekka Paalanen
<pekka.paalanen@xxxxxxxxxxxxx> wrote:

On Fri, 12 Apr 2024 10:28:52 -0400
Leo Li <sunpeng.li@xxxxxxx> wrote:

On 2024-04-12 04:03, Pekka Paalanen wrote:
On Thu, 11 Apr 2024 16:33:57 -0400
Leo Li <sunpeng.li@xxxxxxx> wrote:

...

That begs the question of what can be nailed down and what can be left to
independent implementation. I guess things like which plane should be enabled
first (PRIMARY), and how zpos should be interpreted (overlay, underlay, mixed)
can be defined. How to handle atomic test failures could be as well.

What room is there for the interpretation of zpos values?

I thought they are unambiguous already: only the relative numerical
order matters, and that uniquely defines the KMS plane ordering.

The zpos value of the PRIMARY plane relative to OVERLAYS, for example, as a way
for vendors to communicate overlay, underlay, or mixed-arrangement support. I
don't think allowing OVERLAYs to be placed under the PRIMARY is currently
documented as a way to support underlay.

I always thought it's obvious that the zpos numbers dictate the plane
order without any other rules. After all, we have the universal planes
concept, where the plane type is only informational to aid heuristics
rather than defining anything.
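
A minimal sketch of that reading of zpos, assuming every plane exposes the property with distinct values. The `Plane` class and `stacking_order` helper are hypothetical names for illustration, not part of any KMS API. The plane type is carried along but never consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    name: str
    plane_type: str  # "PRIMARY", "OVERLAY" or "CURSOR"; informational only
    zpos: int


def stacking_order(planes):
    """Return plane names bottom to top, using only the relative zpos order."""
    return [p.name for p in sorted(planes, key=lambda p: p.zpos)]


# An OVERLAY below the PRIMARY (an "underlay") is just another ordering.
planes = [
    Plane("primary", "PRIMARY", zpos=2),
    Plane("video", "OVERLAY", zpos=0),
    Plane("cursor", "CURSOR", zpos=255),
]
order = stacking_order(planes)
```

Here `order` lists the OVERLAY plane first, i.e. at the bottom, without any special case for plane types.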

Only if the zpos property does not exist would the plane types come
into play.

Of course, if there actually exists userspace that fails if zpos allows
an overlay type plane to be placed below primary, or fails if primary
zpos is not zero, then DRM needs a new client cap.

Right, it wasn't immediately clear to me that the API allowed placement of
things beneath the PRIMARY. But reading the docs for drm_plane_create_zpos*,
there's nothing that forbids it.

libliftoff, for example, assumes that the PRIMARY has the lowest zpos. So
underlay arrangements will use an OVERLAY for the scanout plane, and the PRIMARY
for the underlay view.

That's totally ok. It works, right? Plane type does not matter if the
KMS driver accepts the configuration.

What is a "scanout plane"? Aren't all KMS planes by definition scanout
planes?

Pardon my terminology, I thought the scanout plane was where weston rendered
non-offloadable surfaces to. I guess it's more correct to call it the "render
plane". On weston, it seems to be always assigned to the PRIMARY.

The assignment restriction is just technical design debt. It is
limiting. The only good reason for it is that, when lighting up a CRTC
for the first time, Weston should do it with the renderer FB only, on
the plane that is most likely to succeed, i.e. PRIMARY. After the CRTC
is lit, there should be no built-in limitations on what can go where.

The reason for this is that if a CRTC can be activated, it must always
be able to show the renderer FB without incurring a modeset. This is
important for ensuring that the fallback compositing (renderer) is
always possible. So we start with that configuration, and everything
else is optional bonus.

Genuinely curious - What exactly is limiting with keeping the renderer FB on
PRIMARY? IOW, what is the additional benefit of placing the renderer FB on
something other than PRIMARY?

The limitation comes from a combination of hardware restrictions.
Perhaps zpos is not mutable, or maybe other planes cannot arbitrarily
move between above and below the primary. This reduces the number of
possible configurations, which might cause off-loading to fail.

I think older hardware has more of these arbitrary restrictions.

I see. I was thinking that drivers can do under-the-hood stuff to present a
mutable zpos to clients, even if their hardware planes cannot be arbitrarily
rearranged, by mapping the PRIMARY to a different hardware plane. But not all
planes have the same function, so this sounds more complicated than helpful.

For libliftoff, using OVERLAYs as the render plane and PRIMARY as the underlay
plane would work. But I think keeping the render plane on PRIMARY (a la weston)
makes underlay arrangements easier to allocate, and would be nice to incorporate
into a shared algorithm.

If zpos exists, I don't think such a limitation is a good idea. It will
just limit the possible configurations for no reason.

With zpos, the KMS plane types should be irrelevant for their
z-ordering. Underlay vs. overlay completely loses its meaning at the
KMS level.

Right, the plane types lose their meaning. But at least with the way
libliftoff builds the plane arrangement, where we first allocate the renderer FB
matters.

libliftoff incrementally builds the atomic state by adding a single plane to the
atomic state, then testing it. It essentially does a depth-first-search of all
possible arrangements, pruning the search on atomic test fail. The state that
offloads the most FBs will be the arrangement used.

Of course, it's unlikely that the entire DFS tree will be traversed in time for a
frame. So the key is to search the most probable and high-benefit branches
first, while minimizing the # of atomic tests needed, before a hard-coded
deadline is hit.
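
The search described above can be sketched roughly as follows. This is not libliftoff's code: `find_arrangement` and `toy_atomic_test` are hypothetical names, and the callback stands in for a real commit made with DRM_MODE_ATOMIC_TEST_ONLY. The sketch keeps the three ingredients from the description: add one plane at a time, prune a branch when the test fails, and stop at a deadline.

```python
import time


def find_arrangement(planes, fbs, atomic_test, deadline):
    """Depth-first search over {plane: fb} states, pruned by atomic tests."""
    best = {}

    def expired():
        return time.monotonic() >= deadline

    def visit(index, state):
        nonlocal best
        if index == len(planes):
            return
        plane = planes[index]
        for fb in fbs:
            if expired():
                return
            if fb in state.values():
                continue  # each FB goes on at most one plane
            state[plane] = fb
            if atomic_test(state):
                if len(state) > len(best):
                    best = dict(state)  # offloads more FBs than before
                visit(index + 1, state)
            # A failed test prunes everything below this assignment.
            del state[plane]
        if not expired():
            visit(index + 1, state)  # also try leaving this plane disabled

    visit(0, {})
    return best


def toy_atomic_test(state):
    # Toy stand-in for the driver: at most two planes may be enabled,
    # and the "video" FB is only accepted on "overlay0".
    if len(state) > 2:
        return False
    return all(fb != "video" or plane == "overlay0"
               for plane, fb in state.items())


result = find_arrangement(
    ["primary", "overlay0", "overlay1"],
    ["ui", "video", "popup"],
    toy_atomic_test,
    deadline=time.monotonic() + 1.0,  # generous, so the toy search completes
)
```

The order of `planes` and `fbs` decides which branches are visited first, which is where the heuristics about probable and high-benefit branches would go.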

Following this algorithm, the PRIMARY needs to be enabled first, followed by all
the secondary planes. After a plane is enabled, it's best not to change
its assigned FB, since that can cause the state to be rejected (in actuality,
not just the FB, but also any color and transformation state associated with
the surface). It is preferable to build on the state by enabling another
fb->plane. This is where changing a plane's zpos to be above/below the PRIMARY
is advantageous, rather than changing the FBs assigned, to accommodate
overlay/underlay arrangements.

This all sounds reasonable, but why limit this to only the renderer FB
on primary plane? The same idea should apply equally to any FB on any
plane. Then one needs more heuristics on when to stop the search short,
and when to reconsider each FB-plane assignment in case new candidates
have appeared but the old ones have not disappeared.

libliftoff starts the search by assigning the renderer FB, if one is provided by
the compositor, to PRIMARY. I think the reason is to always have the renderer
option available for FBs that need it. Eventually, if the search tree is
traversed enough, an arrangement that does not need the renderer fb may be
found, if all the FBs can be assigned, and there are enough planes for them. But
we may not get there before the deadline.

Perhaps having more time to search is the solution here.

(p.s. if a candidate FB is added or removed, libliftoff starts the search anew)

I imagine that any algorithm which incrementally builds up the plane arrangement
will have a similar preference. Of course, it's entirely possible that such an
algorithm isn't the best; I admittedly have not thought much about other
possibilities, yet...

It's a complicated problem, indeed. Maybe there needs to be a background
task that is not limited by the page flip deadline and can do an
exhaustive search over many refresh periods.

That would be nice. Kick this off when there is a configuration change,
e.g., user starts video playback, opens a new video, etc.

One would need to avoid doing too much of that, though, as one could
envision scenarios where this happens frequently and could have its
own impact on power by keeping the CPU busy.

Harry

I recall emersion had a similar suggestion for libliftoff: cache the
incomplete plane arrangement once the deadline is reached, and continue
processing it on future frames. It avoids the need for a separate task.
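
One way to picture that caching idea, as a rough sketch only: keep the search as a paused generator and spend a fixed budget of atomic tests per frame. `IncrementalSearch`, `run` and `toy_atomic_test` are hypothetical names, not libliftoff API, and a test budget stands in for the real time-based deadline.

```python
class IncrementalSearch:
    """Plane search that can be paused and resumed across frames."""

    def __init__(self, planes, fbs, atomic_test):
        self.planes = planes
        self.fbs = fbs
        self.atomic_test = atomic_test  # stand-in for a TEST_ONLY commit
        self.best = {}
        self.done = False
        self._steps = self._visit(0, {})

    def _visit(self, index, state):
        if index == len(self.planes):
            return
        plane = self.planes[index]
        for fb in self.fbs:
            if fb in state.values():
                continue
            state[plane] = fb
            accepted = self.atomic_test(state)
            if accepted and len(state) > len(self.best):
                self.best = dict(state)
            yield  # pause point: exactly one atomic test was spent
            if accepted:
                yield from self._visit(index + 1, state)
            del state[plane]
        yield from self._visit(index + 1, state)

    def run(self, budget):
        """Spend at most `budget` atomic tests; return the best state so far."""
        for _ in range(budget):
            try:
                next(self._steps)
            except StopIteration:
                self.done = True
                break
        return self.best


def toy_atomic_test(state):
    # Toy driver: at most two planes, "video" only on "overlay0".
    if len(state) > 2:
        return False
    return all(fb != "video" or plane == "overlay0"
               for plane, fb in state.items())


search = IncrementalSearch(["primary", "overlay0", "overlay1"],
                           ["ui", "video", "popup"], toy_atomic_test)
first_frame = dict(search.run(budget=1))  # usable result after one test
frames = 1
while not search.done:
    search.run(budget=4)  # later frames continue where the search paused
    frames += 1
```

If a candidate FB is added or removed, the cached search would simply be dropped and a new `IncrementalSearch` created, matching the restart behaviour mentioned earlier in the thread.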

Having more time to do a more exhaustive search would make zpos meaningless
outside of determining the correct z-ordering, as pq previously mentioned. It
would support hardware that has zpos limitations. It is more complex, but maybe
that's fine, as long as the complexity doesn't bleed into other parts of the
compositor.

There are still ways to limit the # of atomic tests needed for the search, which
will help speed things up (already considered by libliftoff today):

* IN_FORMATS property for what FB formats a plane supports
* zpos property for correct z-ordering
* Occlusion rules. An FB occluded by a rendered FB or underlay-ed FB cannot be
  overlay-ed, for example
* And potentially more
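
To illustrate how such checks save atomic tests, here is a small sketch of a pre-filter for one overlay plane. All names (`Layer`, `overlay_candidates`, `below_overlay`) are hypothetical, and the format check is simplified: the real IN_FORMATS blob lists format and modifier pairs, whereas this uses a plain set of format names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    z: int        # compositor stacking position, higher is on top
    rect: tuple   # (x, y, width, height)
    fmt: str      # DRM fourcc-style format name


def intersects(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def overlay_candidates(layers, below_overlay, plane_formats):
    """Names of layers still worth an atomic test on an overlay plane.

    below_overlay: names of layers already given to the renderer or to
                   an underlay plane
    plane_formats: formats the plane advertises (simplified IN_FORMATS)
    """
    result = []
    for layer in layers:
        if layer.name in below_overlay:
            continue
        if layer.fmt not in plane_formats:
            continue  # the driver would reject it; skip the test
        occluded = any(
            other.name in below_overlay
            and other.z > layer.z
            and intersects(other.rect, layer.rect)
            for other in layers
        )
        if not occluded:
            result.append(layer.name)
    return result


layers = [
    Layer("video", 0, (0, 0, 1920, 1080), "NV12"),
    Layer("subtitles", 1, (400, 900, 1120, 100), "ARGB8888"),
    Layer("popup", 2, (2000, 0, 100, 100), "ARGB8888"),
    Layer("camera", 3, (3000, 0, 640, 480), "XRGB2101010"),
]
candidates = overlay_candidates(
    layers, below_overlay={"subtitles"}, plane_formats={"NV12", "ARGB8888"}
)
```

In this toy setup the video is covered by rendered subtitles and the camera feed uses an unsupported format, so only the popup remains a candidate and only one atomic test would be spent on this plane.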

Thanks,
Leo

Thanks,
pq