Re: Questions on page flips and atomic modeset

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Fri, 9 Feb 2018 17:53:50 +0200

On Thu, Feb 08, 2018 at 11:53:30AM +0200, Oleksandr Andrushchenko wrote:
> Hello, Ville!
> 
> On 02/06/2018 06:04 PM, Ville Syrjälä wrote:
> > On Tue, Feb 06, 2018 at 11:59:37AM +0200, Oleksandr Andrushchenko wrote:
> >> Hello, Ville!
> >>
> >> Thank you very much for such a comprehensive answer.
> >>
> >> Please see inline
> >>
> >>
> >> On 02/05/2018 06:47 PM, Ville Syrjälä wrote:
> >>> On Mon, Feb 05, 2018 at 06:03:25PM +0200, Oleksandr Andrushchenko wrote:
> >>>> Hello,
> >>>>
> >>>>
> >>>> I have a DRM driver which implements display protocol for Xen [1]
> >>>> and this protocol has a dedicated XENDISPL_OP_PG_FLIP request which
> >>>> tells the other end that some display buffer needs to be displayed,
> >>>> e.g. it is issued by the frontend DRM driver to the corresponding
> >>>> backend when the former wants to display a buffer.
> >>>> Two notes:
> >>>> 1. Communication between two remote parties can obviously fail.
> >>>> 2. At the moment only primary plane is supported, so we can think of
> >>>> the display buffer as of CRTC's primary fb.
> >>>>
> >>>> There are two APIs available for user-space to initiate a page-flip
> >>>> (please correct me if I am wrong here): legacy via DRM_IOCTL_MODE_PAGE_FLIP
> >>>> and more recent via DRM_IOCTL_MODE_ATOMIC. My current implementation
> >>>> (which is 4.9 based) uses drm_crtc_funcs.page_flip callback to send
> >>>> XENDISPL_OP_PG_FLIP request to the backend, so if communication fails
> >>>> I can return an error code, so the DRM core knows that page flip
> >>>> cannot be done.
> >>>>
> >>>> But now I am about to enable page flipping via DRM_IOCTL_MODE_ATOMIC for
> >>>> which this happens without DRM_IOCTL_MODE_PAGE_FLIP, but via .atomic_check +
> >>>> .atomic_flush callbacks.
> >>>>
> >>>> The solution I have in mind is that I move the XENDISPL_OP_PG_FLIP request
> >>>> to the .atomic_flush callback which seems to be the right place to serve
> >>>> both DRM_IOCTL_MODE_PAGE_FLIP and DRM_IOCTL_MODE_ATOMIC (is this?).
> >>>>
> >>>> The questions I have with this are:
> >>>>
> >>>> 1. What is the framebuffer I can send to the backend?
> >>>> I assume I can use crtc->primary->fb from
> >>>> drm_crtc_helper_funcs.atomic_flush,
> >>>> is this assumption correct?
> >>> Planes are explicit in the atomic world, so you should just deal with
> >>> the plane if and when it's part of the drm_atomic_state. Look at eg.
> >>> drm_atomic_helper_commit_planes() how to figure out what's in the
> >>> state.
> >> Yes, this is clear. The question was about the frame buffer
> >> I get in drm_crtc_helper_funcs.atomic_flush which only has
> >> old_crtc_state, so I assumed I can use crtc->primary->fb there.
> >> Or you mean I have to dig into old_crtc_state->state to find
> >> out if there is a plane and if it has a frambuffer and if so
> >> use it instead of crtc->primary->fb?
> > You will get the plane explicitly as part of the drm_atomic_state.
> > Normally you shouldn't need to go find it via other means.
> >
> > Oh, and never use the plane->fb etc. pointers in an atomic driver.
> > Those are for legacy purposes only. Atomic drivers should only ever
> > deal with proper state objects.
> I am using fb from corresponding state now, thank you
> >
> >>>    And I guess you're already planning on using that helper since
> >>> you mention .atomic_flush().
> >> Not directly, but via drm_mode_config_funcs.atomic_commit
> > .atomic_commit() is a hook the driver has to provide. Most drivers use
> > the helper for it, which in turn requires the driver to provide other
> > hooks such as .atomic_flush(). That is what you appear to be doing.
> you are correct
> >>> One should really think of the drm_atomic_state as more of a transaction
> >>> rather than as a state (since not all objects need be part of it).
> >> Thank you
> >>>> 2. Is the check (either to send or not) for crtc->primary->fb != NULL
> >>>> enough?
> >>>> I assume there are other cases when .atomic_flush gets called,
> >>>> so is there a way I can filter out unneeded cases? E.g. those which
> >>>> are not for page flipping?
> >>> Each object taking part in the transaction will have an associated
> >>> new state and old state.
> >> As I replied above I only have old state in .atomic_flush
> > Oh right. That way of passing the old state only dates back to earlier
> > days of atomic. To find the new state what you should do these days
> > is something like:
> >
> > foo(struct drm_crtc *crtc, struct drm_crtc_state *old_state)
> > {
> > 	struct drm_atomic_state *state = old_state->state;
> > 	struct drm_crtc_state *new_state =
> > 		drm_atomic_get_new_crtc_state(state, crtc);
> > 	...
> >
> > The old way was to use the crtc->state pointer as either the old
> > or new state depending on where you are in the commit sequence.
> > But that wasn't very good so Maarten changed things so that we
> > now have explicit old and new states for each object tracked in
> > the drm_atomic_state.
> >
> > I think what we should really do is change most of the hooks to
> > pass crtc+drm_atomic_state instead, and let each function grab
> > the old and/or new crtc state from the drm_atomic_state as needed.
> > I believe that would avoid confusing people with just the old or
> > new state.
> That would be great
> >>>    You can compare the two to figure out what
> >>> changed, and in addition there may already some flags in the base
> >>> class for the state that indicate what may have changed (eg.
> >>> drm_crtc_state::mode_changed etc.). Such flags may be set by the
> >>> core to help figure out what needs doing.
> >> Yes, thank you
> >>>> 3. If communication with the backend fails there is no way for me
> >>>> to tell the DRM core about this as .atomic_flush is called after
> >>>> "the point of no return". Do you think I can remember the error
> >>>> code and report it next time .atomic_check is called? So, other page
> >>>> flips will not run on broken connection, for example.
> >>> Do you know when it might fail?
> >> Not really, this is a software protocol to talk from
> >> the frontend para-virtual DRM driver to its backend
> >> in other Xen domain
> > I see. Well, without evidence to the contrary I'd probably just assume
> > it'll never fail.
> I am not sure I can guarantee this
> >   That avoids complicating the code with potentially
> > useless logic.
> Unfortunately, I'll have to put something in
> >   Hammering it with something like igt for a while might
> > serve as a good way to test that assumption. Not sure how many tests in
> > igt currently run on non-i915 though.
> The first test suite I tried was IGT indeed, but unfortunately
> it is coupled with i915 too much, so only very basic tests
> could be run. Then I found [1] which can be easily extended
> without much efforts, so I stick to fork of it (hope to contribute
> there later, for example, by adding ION tests)
> >
> >>>    If so you should implement the appropriate
> >>> checks in .atomic_check() etc. to try and make sure it never happens
> >>> (barring a total hardware failure for example).
> >>>
> >>> Generally what we (i915) do is try to check everything up front, but if
> >>> an unexpected error does happen later we just muddle through and log an
> >>> error.
> >>>
> >>> For us I think the most common late failure is DP link training failure.
> >>> That we can't fully check up front since it depends on external factors:
> >>> sink, dongles, cabling, etc. For those we added a new connector property
> >>> to signal to userspace that the link is having issues, allowing
> >>> userspace to reconfigure things such that we lower the link bandwidth
> >>> requirements.
> >> I cannot do that this way, because the driver has to run
> >> seamlessly for user-space, no specific utilities  are
> >> expected to run.
> > There must be something running or you never get anything on the screen.
> >
> Yes, sorry for not being precise: I meant that nothing
> driver specific, e.g. which knows internals of the driver.
> >>> The link_status mechanism could perhaps be used to to work around other
> >>> "late errors". But in general if you want to somehow fix that error you
> >>> have to know what caused it, no?
> >> That is correct, so I will print an error message on
> >> page flip error so user has at list a clue on what's
> >> wrong
> >>>    So if you just get a random error for
> >>> seemingly no reason there's very little you can do apart from blindly
> >>> retrying and logging an error. For the DP case the fallback mechanism is
> >>> pretty clear: reduce link rate and/or number of lanes.
> >>>
> >>> As for signalling the error in the next ioctl call, that could be done
> >>> I suppose.
> >> I tried that approach and it seems to work.
> >>>    But again the question is how would userspace know what
> >>> (if anything) it can do to fix the problem?
> >> Well, this would be seen as just an error to user-space
> >> and unfortunately if it is not prepared to deal with then it will
> >> close. Not sure I can do anything smart about it
> >>>> 4. As per my understanding I cannot put XENDISPL_OP_PG_FLIP request
> >>>> into .atomic_check as the check doesn't guarantee that .atomic_flush
> >>>> will follow. Is this correct or there is a more neat solution exists?
> >>> Thou shalt not touch the hardware until .atomic_commit(). Note that
> >>> .atomic_commit() itself is still allowed to fail, but only if you
> >>> can fail atomically.
> >>>
> >>> The .atomic_flush() & co. helper stuff is designed in a way that
> >>> expects no failures at that late stage of the commit. If that
> >>> doesn't suit your hardware design then you may opt to not use the
> >>> helper.
> >>>
> >>> Also note that non-blocking operations can make this even more difficult
> >>> because in that case even .atomic_commit() doesn't generally touch
> >>> the hardware and instead that task is delegated to eg. a work queue.
> >>> So by the time the hardware indicates an error the ioctl has long since
> >>> returned to userspace.
> >>>
> >> This is my case as I send a page flip request to the backend
> >> and later in time receive the corresponding response.
> >>
> >> Do you mind looking at my current WIP implementation of
> >> atomic commit [1], [2]? This work is done as a preparation for
> >> upstreaming the driver and is still in progress though.
> > Usually one would handle page flips from the plane's .atomic_update()
> > hook instead of the crtc's .atomic_flush(). But if you only have the one
> > plane then this could work. But you should get at the plane fb via
> > drm_atomic_get_new_plane_state() etc.
> >
> > Generally looks like you need some more work on the plane .atomic_check()
> > function. Using drm_atomic_helper_check_plane_state() should get you
> > most of the way there I'd imagine. Not sure if disabling the plane
> > independently of the crtc makes any sense, but if not you should
> > look at drm_simple_kms_helper.c for an example. Hmm. Not sure if you
> > couldn't just use drm_simple_kms_helper.c yourself actually.
> Great! As I am moving away from legacy stuff this is a
> perfect fit now. So, I will base on drm_simple_kms_helper.
> 
> BTW, it has a small issue which can be easily solved in the
> DRM core I believe: if your driver needs to work with vblanks,
> then you have to provide drm_driver.enable_vblank/disable_vblank
> because drm_simple_kms_helper.drm_crtc_funcs do not provide any.
> The problem here is that drm_driver.enable_vblank/disable_vblank
> are most probably dummy functions and what is more they are marked
> as deprecated.

Hmm. Oh, I think someone just wanted people to move to using the
corresponding hooks in the crtc funcs. I'm not sure how people do things
when they don't have vblank interrupts, but I think in general if you
don't initialize vblank support those hooks shouldn't get called.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel