Re: Questions on page flips and atomic modeset

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Mon, 5 Feb 2018 18:47:36 +0200

On Mon, Feb 05, 2018 at 06:03:25PM +0200, Oleksandr Andrushchenko wrote:
> Hello,
> 
> 
> I have a DRM driver which implements display protocol for Xen [1]
> and this protocol has a dedicated XENDISPL_OP_PG_FLIP request which
> tells the other end that some display buffer needs to be displayed,
> e.g. it is issued by the frontend DRM driver to the corresponding
> backend when the former wants to display a buffer.
> Two notes:
> 1. Communication between two remote parties can obviously fail.
> 2. At the moment only primary plane is supported, so we can think of
> the display buffer as of CRTC's primary fb.
> 
> There are two APIs available for user-space to initiate a page-flip
> (please correct me if I am wrong here): legacy via DRM_IOCTL_MODE_PAGE_FLIP
> and more recent via DRM_IOCTL_MODE_ATOMIC. My current implementation
> (which is 4.9 based) uses drm_crtc_funcs.page_flip callback to send
> XENDISPL_OP_PG_FLIP request to the backend, so if communication fails
> I can return an error code, so the DRM core knows that page flip
> cannot be done.
> 
> But now I am about to enable page flipping via DRM_IOCTL_MODE_ATOMIC for
> which this happens without DRM_IOCTL_MODE_PAGE_FLIP, but via .atomic_check +
> .atomic_flush callbacks.
> 
> The solution I have in mind is that I move the XENDISPL_OP_PG_FLIP request
> to the .atomic_flush callback which seems to be the right place to serve
> both DRM_IOCTL_MODE_PAGE_FLIP and DRM_IOCTL_MODE_ATOMIC (is this?).
> 
> The questions I have with this are:
> 
> 1. What is the framebuffer I can send to the backend?
> I assume I can use crtc->primary->fb from 
> drm_crtc_helper_funcs.atomic_flush,
> is this assumption correct?

Planes are explicit in the atomic world, so you should just deal with
the plane if and when it's part of the drm_atomic_state. Look at eg.
drm_atomic_helper_commit_planes() how to figure out what's in the
state. And I guess you're already planning on using that helper since
you mention .atomic_flush().

One should really think of the drm_atomic_state as more of a transaction
rather than as a state (since not all objects need be part of it).

> 
> 2. Is the check (either to send or not) for crtc->primary->fb != NULL 
> enough?
> I assume there are other cases when .atomic_flush gets called,
> so is there a way I can filter out unneeded cases? E.g. those which
> are not for page flipping?

Each object taking part in the transaction will have an associated
new state and old state. You can compare the two to figure out what
changed, and in addition there may already some flags in the base
class for the state that indicate what may have changed (eg.
drm_crtc_state::mode_changed etc.). Such flags may be set by the
core to help figure out what needs doing.

> 
> 3. If communication with the backend fails there is no way for me
> to tell the DRM core about this as .atomic_flush is called after
> "the point of no return". Do you think I can remember the error
> code and report it next time .atomic_check is called? So, other page
> flips will not run on broken connection, for example.

Do you know when it might fail? If so you should implement the appropriate
checks in .atomic_check() etc. to try and make sure it never happens
(barring a total hardware failure for example).

Generally what we (i915) do is try to check everything up front, but if
an unexpected error does happen later we just muddle through and log an
error.

For us I think the most common late failure is DP link training failure.
That we can't fully check up front since it depends on external factors:
sink, dongles, cabling, etc. For those we added a new connector property
to signal to userspace that the link is having issues, allowing
userspace to reconfigure things such that we lower the link bandwidth
requirements.

The link_status mechanism could perhaps be used to to work around other
"late errors". But in general if you want to somehow fix that error you
have to know what caused it, no? So if you just get a random error for
seemingly no reason there's very little you can do apart from blindly
retrying and logging an error. For the DP case the fallback mechanism is
pretty clear: reduce link rate and/or number of lanes.

As for signalling the error in the next ioctl call, that could be done
I suppose. But again the question is how would userspace know what
(if anything) it can do to fix the problem?

> 
> 4. As per my understanding I cannot put XENDISPL_OP_PG_FLIP request
> into .atomic_check as the check doesn't guarantee that .atomic_flush
> will follow. Is this correct or there is a more neat solution exists?

Thou shalt not touch the hardware until .atomic_commit(). Note that
.atomic_commit() itself is still allowed to fail, but only if you
can fail atomically.

The .atomic_flush() & co. helper stuff is designed in a way that
expects no failures at that late stage of the commit. If that
doesn't suit your hardware design then you may opt to not use the
helper.

Also note that non-blocking operations can make this even more difficult
because in that case even .atomic_commit() doesn't generally touch
the hardware and instead that task is delegated to eg. a work queue.
So by the time the hardware indicates an error the ioctl has long since
returned to userspace.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel