Re: Questions on page flips and atomic modeset

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Fri, 9 Feb 2018 18:24:46 +0200

On Fri, Feb 09, 2018 at 06:09:09PM +0200, Oleksandr Andrushchenko wrote:
> On 02/09/2018 05:53 PM, Ville Syrjälä wrote:
> > On Thu, Feb 08, 2018 at 11:53:30AM +0200, Oleksandr Andrushchenko wrote:
> >> Hello, Ville!
> >>
> >> On 02/06/2018 06:04 PM, Ville Syrjälä wrote:
> >>> On Tue, Feb 06, 2018 at 11:59:37AM +0200, Oleksandr Andrushchenko wrote:
> >>>> Hello, Ville!
> >>>>
> >>>> Thank you very much for such a comprehensive answer.
> >>>>
> >>>> Please see inline
> >>>>
> >>>>
> >>>> On 02/05/2018 06:47 PM, Ville Syrjälä wrote:
> >>>>> On Mon, Feb 05, 2018 at 06:03:25PM +0200, Oleksandr Andrushchenko wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>>
> >>>>>> I have a DRM driver which implements display protocol for Xen [1]
> >>>>>> and this protocol has a dedicated XENDISPL_OP_PG_FLIP request which
> >>>>>> tells the other end that some display buffer needs to be displayed,
> >>>>>> e.g. it is issued by the frontend DRM driver to the corresponding
> >>>>>> backend when the former wants to display a buffer.
> >>>>>> Two notes:
> >>>>>> 1. Communication between two remote parties can obviously fail.
> >>>>>> 2. At the moment only primary plane is supported, so we can think of
> >>>>>> the display buffer as of CRTC's primary fb.
> >>>>>>
> >>>>>> There are two APIs available for user-space to initiate a page-flip
> >>>>>> (please correct me if I am wrong here): legacy via DRM_IOCTL_MODE_PAGE_FLIP
> >>>>>> and more recent via DRM_IOCTL_MODE_ATOMIC. My current implementation
> >>>>>> (which is 4.9 based) uses drm_crtc_funcs.page_flip callback to send
> >>>>>> XENDISPL_OP_PG_FLIP request to the backend, so if communication fails
> >>>>>> I can return an error code, so the DRM core knows that page flip
> >>>>>> cannot be done.
> >>>>>>
> >>>>>> But now I am about to enable page flipping via DRM_IOCTL_MODE_ATOMIC for
> >>>>>> which this happens without DRM_IOCTL_MODE_PAGE_FLIP, but via .atomic_check +
> >>>>>> .atomic_flush callbacks.
> >>>>>>
> >>>>>> The solution I have in mind is that I move the XENDISPL_OP_PG_FLIP request
> >>>>>> to the .atomic_flush callback which seems to be the right place to serve
> >>>>>> both DRM_IOCTL_MODE_PAGE_FLIP and DRM_IOCTL_MODE_ATOMIC (is this?).
> >>>>>>
> >>>>>> The questions I have with this are:
> >>>>>>
> >>>>>> 1. What is the framebuffer I can send to the backend?
> >>>>>> I assume I can use crtc->primary->fb from
> >>>>>> drm_crtc_helper_funcs.atomic_flush,
> >>>>>> is this assumption correct?
> >>>>> Planes are explicit in the atomic world, so you should just deal with
> >>>>> the plane if and when it's part of the drm_atomic_state. Look at eg.
> >>>>> drm_atomic_helper_commit_planes() how to figure out what's in the
> >>>>> state.
> >>>> Yes, this is clear. The question was about the frame buffer
> >>>> I get in drm_crtc_helper_funcs.atomic_flush which only has
> >>>> old_crtc_state, so I assumed I can use crtc->primary->fb there.
> >>>> Or you mean I have to dig into old_crtc_state->state to find
> >>>> out if there is a plane and if it has a frambuffer and if so
> >>>> use it instead of crtc->primary->fb?
> >>> You will get the plane explicitly as part of the drm_atomic_state.
> >>> Normally you shouldn't need to go find it via other means.
> >>>
> >>> Oh, and never use the plane->fb etc. pointers in an atomic driver.
> >>> Those are for legacy purposes only. Atomic drivers should only ever
> >>> deal with proper state objects.
> >> I am using fb from corresponding state now, thank you
> >>>>>     And I guess you're already planning on using that helper since
> >>>>> you mention .atomic_flush().
> >>>> Not directly, but via drm_mode_config_funcs.atomic_commit
> >>> .atomic_commit() is a hook the driver has to provide. Most drivers use
> >>> the helper for it, which in turn requires the driver to provide other
> >>> hooks such as .atomic_flush(). That is what you appear to be doing.
> >> you are correct
> >>>>> One should really think of the drm_atomic_state as more of a transaction
> >>>>> rather than as a state (since not all objects need be part of it).
> >>>> Thank you
> >>>>>> 2. Is the check (either to send or not) for crtc->primary->fb != NULL
> >>>>>> enough?
> >>>>>> I assume there are other cases when .atomic_flush gets called,
> >>>>>> so is there a way I can filter out unneeded cases? E.g. those which
> >>>>>> are not for page flipping?
> >>>>> Each object taking part in the transaction will have an associated
> >>>>> new state and old state.
> >>>> As I replied above I only have old state in .atomic_flush
> >>> Oh right. That way of passing the old state only dates back to earlier
> >>> days of atomic. To find the new state what you should do these days
> >>> is something like:
> >>>
> >>> foo(struct drm_crtc *crtc, struct drm_crtc_state *old_state)
> >>> {
> >>> 	struct drm_atomic_state *state = old_state->state;
> >>> 	struct drm_crtc_state *new_state =
> >>> 		drm_atomic_get_new_crtc_state(state, crtc);
> >>> 	...
> >>>
> >>> The old way was to use the crtc->state pointer as either the old
> >>> or new state depending on where you are in the commit sequence.
> >>> But that wasn't very good so Maarten changed things so that we
> >>> now have explicit old and new states for each object tracked in
> >>> the drm_atomic_state.
> >>>
> >>> I think what we should really do is change most of the hooks to
> >>> pass crtc+drm_atomic_state instead, and let each function grab
> >>> the old and/or new crtc state from the drm_atomic_state as needed.
> >>> I believe that would avoid confusing people with just the old or
> >>> new state.
> >> That would be great
> >>>>>     You can compare the two to figure out what
> >>>>> changed, and in addition there may already some flags in the base
> >>>>> class for the state that indicate what may have changed (eg.
> >>>>> drm_crtc_state::mode_changed etc.). Such flags may be set by the
> >>>>> core to help figure out what needs doing.
> >>>> Yes, thank you
> >>>>>> 3. If communication with the backend fails there is no way for me
> >>>>>> to tell the DRM core about this as .atomic_flush is called after
> >>>>>> "the point of no return". Do you think I can remember the error
> >>>>>> code and report it next time .atomic_check is called? So, other page
> >>>>>> flips will not run on broken connection, for example.
> >>>>> Do you know when it might fail?
> >>>> Not really, this is a software protocol to talk from
> >>>> the frontend para-virtual DRM driver to its backend
> >>>> in other Xen domain
> >>> I see. Well, without evidence to the contrary I'd probably just assume
> >>> it'll never fail.
> >> I am not sure I can guarantee this
> >>>    That avoids complicating the code with potentially
> >>> useless logic.
> >> Unfortunately, I'll have to put something in
> >>>    Hammering it with something like igt for a while might
> >>> serve as a good way to test that assumption. Not sure how many tests in
> >>> igt currently run on non-i915 though.
> >> The first test suite I tried was IGT indeed, but unfortunately
> >> it is coupled with i915 too much, so only very basic tests
> >> could be run. Then I found [1] which can be easily extended
> >> without much efforts, so I stick to fork of it (hope to contribute
> >> there later, for example, by adding ION tests)
> >>>>>     If so you should implement the appropriate
> >>>>> checks in .atomic_check() etc. to try and make sure it never happens
> >>>>> (barring a total hardware failure for example).
> >>>>>
> >>>>> Generally what we (i915) do is try to check everything up front, but if
> >>>>> an unexpected error does happen later we just muddle through and log an
> >>>>> error.
> >>>>>
> >>>>> For us I think the most common late failure is DP link training failure.
> >>>>> That we can't fully check up front since it depends on external factors:
> >>>>> sink, dongles, cabling, etc. For those we added a new connector property
> >>>>> to signal to userspace that the link is having issues, allowing
> >>>>> userspace to reconfigure things such that we lower the link bandwidth
> >>>>> requirements.
> >>>> I cannot do that this way, because the driver has to run
> >>>> seamlessly for user-space, no specific utilities  are
> >>>> expected to run.
> >>> There must be something running or you never get anything on the screen.
> >>>
> >> Yes, sorry for not being precise: I meant that nothing
> >> driver specific, e.g. which knows internals of the driver.
> >>>>> The link_status mechanism could perhaps be used to to work around other
> >>>>> "late errors". But in general if you want to somehow fix that error you
> >>>>> have to know what caused it, no?
> >>>> That is correct, so I will print an error message on
> >>>> page flip error so user has at list a clue on what's
> >>>> wrong
> >>>>>     So if you just get a random error for
> >>>>> seemingly no reason there's very little you can do apart from blindly
> >>>>> retrying and logging an error. For the DP case the fallback mechanism is
> >>>>> pretty clear: reduce link rate and/or number of lanes.
> >>>>>
> >>>>> As for signalling the error in the next ioctl call, that could be done
> >>>>> I suppose.
> >>>> I tried that approach and it seems to work.
> >>>>>     But again the question is how would userspace know what
> >>>>> (if anything) it can do to fix the problem?
> >>>> Well, this would be seen as just an error to user-space
> >>>> and unfortunately if it is not prepared to deal with then it will
> >>>> close. Not sure I can do anything smart about it
> >>>>>> 4. As per my understanding I cannot put XENDISPL_OP_PG_FLIP request
> >>>>>> into .atomic_check as the check doesn't guarantee that .atomic_flush
> >>>>>> will follow. Is this correct or there is a more neat solution exists?
> >>>>> Thou shalt not touch the hardware until .atomic_commit(). Note that
> >>>>> .atomic_commit() itself is still allowed to fail, but only if you
> >>>>> can fail atomically.
> >>>>>
> >>>>> The .atomic_flush() & co. helper stuff is designed in a way that
> >>>>> expects no failures at that late stage of the commit. If that
> >>>>> doesn't suit your hardware design then you may opt to not use the
> >>>>> helper.
> >>>>>
> >>>>> Also note that non-blocking operations can make this even more difficult
> >>>>> because in that case even .atomic_commit() doesn't generally touch
> >>>>> the hardware and instead that task is delegated to eg. a work queue.
> >>>>> So by the time the hardware indicates an error the ioctl has long since
> >>>>> returned to userspace.
> >>>>>
> >>>> This is my case as I send a page flip request to the backend
> >>>> and later in time receive the corresponding response.
> >>>>
> >>>> Do you mind looking at my current WIP implementation of
> >>>> atomic commit [1], [2]? This work is done as a preparation for
> >>>> upstreaming the driver and is still in progress though.
> >>> Usually one would handle page flips from the plane's .atomic_update()
> >>> hook instead of the crtc's .atomic_flush(). But if you only have the one
> >>> plane then this could work. But you should get at the plane fb via
> >>> drm_atomic_get_new_plane_state() etc.
> >>>
> >>> Generally looks like you need some more work on the plane .atomic_check()
> >>> function. Using drm_atomic_helper_check_plane_state() should get you
> >>> most of the way there I'd imagine. Not sure if disabling the plane
> >>> independently of the crtc makes any sense, but if not you should
> >>> look at drm_simple_kms_helper.c for an example. Hmm. Not sure if you
> >>> couldn't just use drm_simple_kms_helper.c yourself actually.
> >> Great! As I am moving away from legacy stuff this is a
> >> perfect fit now. So, I will base on drm_simple_kms_helper.
> >>
> >> BTW, it has a small issue which can be easily solved in the
> >> DRM core I believe: if your driver needs to work with vblanks,
> >> then you have to provide drm_driver.enable_vblank/disable_vblank
> >> because drm_simple_kms_helper.drm_crtc_funcs do not provide any.
> >> The problem here is that drm_driver.enable_vblank/disable_vblank
> >> are most probably dummy functions and what is more they are marked
> >> as deprecated.
> > Hmm. Oh, I think someone just wanted people to move to using the
> > corresponding hooks in the crtc funcs.
> My understanding is that that was the intention
> > I'm not sure how people do things
> > when they don't have vblank interrupts, but I think in general if you
> > don't initialize vblank support those hooks shouldn't get called.
> That is correct, but if the driver is drm_simple_kms_helper
> based and wants vblanks, then you'll have to do something
> like [1], [2].
> At the same time [3], [4] say enable_vblank/disable_vblank
> are deprecated in struct drm_driver.
> So, I think that drm_simple_kms_helper and its drm_crtc_funcs
> should provide corresponding callbacks in order to avoid using
> deprecated functionality.

I see. Well, I'm sure people are willing to entertain patches ;)

> [1] 
> http://elixir.bootlin.com/linux/v4.15.2/source/drivers/gpu/drm/pl111/pl111_drv.c#L181
> [2] 
> http://elixir.bootlin.com/linux/v4.15.2/source/drivers/gpu/drm/mxsfb/mxsfb_drv.c#L336
> [3] 
> http://elixir.bootlin.com/linux/v4.15.2/source/include/drm/drm_drv.h#L206
> [4] 
> http://elixir.bootlin.com/linux/v4.15.2/source/include/drm/drm_drv.h#L222

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel