Re: [PATCH 2/3] drm: Add basic helper to allow precise pageflip timestamps in vrr.

Daniel Vetter <daniel@xxxxxxxx> · Wed, 13 Feb 2019 16:14:19 +0100

On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
<Nicholas.Kazlauskas@xxxxxxx> wrote:
>
> On 2/13/19 4:50 AM, Daniel Vetter wrote:
> > On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> >> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
> >>>
> >>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> >>> <Nicholas.Kazlauskas@xxxxxxx> wrote:
> >>>>
> >>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
> >>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> >>>>>> The pageflip completion timestamps transmitted to userspace
> >>>>>> via pageflip completion events are supposed to describe the
> >>>>>> time at which the first pixel of the new post-pageflip scanout
> >>>>>> buffer leaves the video output of the gpu. This time is
> >>>>>> identical to end of vblank, when active scanout starts.
> >>>>>>
> >>>>>> For a crtc in standard fixed refresh rate, the end of vblank
> >>>>>> is identical to the vblank timestamps calculated by
> >>>>>> drm_update_vblank_count() at each vblank interrupt, or each
> >>>>>> vblank dis-/enable. Therefore pageflip events just carry
> >>>>>> that vblank timestamp as their pageflip timestamp.
> >>>>>>
> >>>>>> For a crtc switched to variable refresh rate mode (vrr), the
> >>>>>> pageflip completion timestamps are identical to the vblank
> >>>>>> timestamps iff the pageflip was executed early in vblank,
> >>>>>> before the minimum vblank duration elapsed. In this case
> >>>>>> the time of display onset is identical to when the crtc
> >>>>>> is running in fixed refresh rate.
> >>>>>>
> >>>>>> However, if a pageflip completes later in the vblank, inside
> >>>>>> the "extended front porch" in vrr mode, then the vblank will
> >>>>>> terminate at a fixed (back porch) duration after flip, so
> >>>>>> the display onset time is delayed correspondingly. In this
> >>>>>> case the vblank timestamp computed at vblank irq time would
> >>>>>> be too early, and we need a way to calculate an estimated
> >>>>>> pageflip timestamp that will be later than the vblank timestamp.
> >>>>>>
> >>>>>> How a driver determines such a "late flip" timestamp is hw
> >>>>>> and driver specific, but this patch adds a new helper function
> >>>>>> that allows the driver to propose such an alternate "late flip"
> >>>>>> timestamp for use in pageflip events:
> >>>>>>
> >>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>>>>>
> >>>>>> When sending out pageflip events, we now compare that proposed
> >>>>>> flip_timestamp against the vblank timestamp of the current
> >>>>>> vblank of flip completion and choose to send out the greater/
> >>>>>> later timestamp as flip completion timestamp.
> >>>>>>
> >>>>>> The most simple way for a kms driver to supply a suitable
> >>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
> >>>>>> at start of the pageflip completion handler, e.g., pageflip
> >>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
> >>>>>> as proposed "late" alternative timestamp via ...
> >>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>>>>>
> >>>>>> More clever approaches could try to add some corrective offset
> >>>>>> for fixed back porch duration, or ideally use hardware features
> >>>>>> like hw timestamps to calculate the exact end time of vblank.
> >>>>>>
> >>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@xxxxxxxxx>
> >>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx>
> >>>>>> Cc: Harry Wentland <harry.wentland@xxxxxxx>
> >>>>>> Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> >>>>>
> >>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> >>>>> the right timestamp, once? With this I guess if you do a vblank query in
> >>>>> between the wrong and the right vblank you'll get the bogus value. Not
> >>>>> really great for userspace.
> >>>>> -Daniel
> >>>>
> >>>> I think we calculate the timestamp and send the vblank event both within
> >>>> the pageflip IRQ handler so calculating the right pageflip timestamp
> >>>> once could probably be done. I'm not sure if it's easier than proposing
> >>>> a later flip time with an API like this though.
> >>>>
> >>>> The actual scanout time should be known from the page-flip handler so
> >>>> the semantics for VRR on/off remain the same. This is because the
> >>>> page-flip triggers entering the back porch if we're in the extended
> >>>> front porch.
> >>>>
> >>>> But scanout time from vblank events for something like
> >>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> >>>> treated as estimates. If we're in the regular front porch then the
> >>>> timing to scanout is based on the fixed duration front porch for the
> >>>> current mode. If we're in the extended front porch then it's technically
> >>>> driver defined but the most reasonable guess is to assume that the front
> >>>> porch is going to end at any moment, so just return the length of the
> >>>> back porch for getting the scanout time.
> >>>>
> >>>> Proposing the late timestamp shouldn't affect vblank event in the
> >>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> >>>> event case. I'm not sure if that's what's guaranteed to happen with this
> >>>> patch though. There doesn't seem to be any locking on either
> >>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> >>>> the same vblank event back as the one just stored I don't think it's
> >>>> guaranteed.
> >>>
> >>> That's the inconsistency I mean to highlight - the timestamp for the
> >>> same frame as observed through flip complete and through the
> >>> wait_vblank ioctl can differ. Which they really shouldn't.
> >>>
> >>
> >> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> >> that vblank and pageflip timestamps should always match. But then the
> >> kernel docs for "Variable refresh properties" in drm_connector.c for
> >> vblank timestamps were changed for the VRR implementation in Linux
> >> 5.0-rc to redefine them when in VRR mode. They are defined, but
> >> probably rather useless for any practical purpose, like this:
> >>
> >> "The semantics for the vertical blank timestamp differ when
> >> variable refresh rate is active. The vertical blank timestamp
> >> is defined to be an estimate using the current mode's fixed
> >> refresh rate timings. The semantics for the page-flip event
> >> timestamp remain the same."
> >
> > Uh I missed that. That sounds like nonsense tbh.
> >
> >> So our docs contradict each other as of Linux 5.0-rc. Certainly having
> >> meaningful vblank timestamps would be useful.
> >
> > Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> > hilarity will ensue.
>
> I would imagine you would see more breakage by changing userspace
> expectations for when the event is actually received.
>
> The IOCTL is "wait for vblank", but if we start sending the event near
> the end of vblank it becomes more "wait for pageflip" (near scanout) or
> "wait 15ms while the VRR front porch times out and causes flickering".
>
> I don't speak for all userspace but it seems more useful to me to have
> the event sent at the start of vblank.

Maybe we need two things. The problem is that the timestamp is defined
to be correct for start-of-frame. And right now the amdgpu
implementation of VRR doesn't even give an accurate timestamp for the
flip, so most likely no one has thought about this properly yet.
Definitely no testcases (or it would have been caught), whereas we
have tons of testcases for fixed refresh.

> VRR in its current state is useful for applications that aren't
> completely timing sensitive (like most games). This is because the
> driver has free rein in restricting the min/max bounds of the range -
> allowing for enhancements like low framerate compensation / BTR.
>
> For a concrete example, imagine you're not doing frontbuffer rendering
> but you're still trying to target a specific scanout time. So even if we
> were to send the vblank event at vpos 0 you'd get an accurate scanout
> timestamp for the next scanout after the back porch ends. You're not
> going to be doing any rendering or preparation in this time because it's
> simply too short. So you'd target the scanout period after this one,
> which you'd need to calculate somehow in userspace. Taking the deltas
> between two vblank events is completely useless here because you'll get
> back the vrr min refresh delta. So you can take your source content rate
> or some fixed rate you'd like to target within the range but even then
> you're still not guaranteed to have that flip end up at that scanout time.
>
> The current solution is to just return the scanout pos at vrr max, or
> the current mode's fixed refresh rate. If you flip before the fixed
> duration front porch ends then that scanout time will likely be accurate
> since you can't flip faster than vrr max.
>
> Is there actually userspace that cares that these timestamps need to
> match? Like said below, it seems that wait for vblank is mostly about
> scheduling and for that I think having the event sent back at the start
> of vblank is more important here.

There's 4 cases:
- DRI3: Waits for vblank for frontbuffer blits. If you combine that
with VRR then imo you deserve to keep all the pieces.
- generic compositors. Use VRR for scheduling, expect that a page_flip
queued immediately after a page_flip will hit the next frame, and not
the current frame. They also expect that this will not give you the
slowest possible refresh rate, but I guess VRR plus frame scheduling
is currently out of scope. It's kinda just for as-fast-as-possible
games.
- flip timestamp. Currently the entire codebase assumes that vblank ts
= flip ts. I don't really see a point in breaking that.
- better scheduling of VRR: what compositors actually do isn't wait
for the vblank, that only latches the page_flip. They set up a
hrtimer, with a fixed/auto-tuned head-start, from which they start
their rendering. The same could be done for VRR, if we expose the
earliest/latest vblank end times (and how much they can shift). vblank
ioctl is only used for syncing that up and making sure we're hitting
the right frame. This is all for pageflipping compositors. This is
also what they'll probably keep doing when we give them a desired flip
time (instead of a target frame), they still need an hrtimer to fire a
little bit ahead to get the frame rendered.
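
To make the hrtimer idea above concrete, here is a minimal userspace
sketch. It assumes the kernel would somehow expose the shortest and
longest possible frame duration of the VRR range, which it does not do
today; struct vrr_window and all function names are made up for
illustration and are not an existing API.

```c
#include <stdint.h>

/* Hypothetical window in which the next frame's scanout can start. */
struct vrr_window {
	int64_t earliest_ns;	/* last flip + shortest frame (VRR max rate) */
	int64_t latest_ns;	/* last flip + longest frame (VRR min rate) */
};

/* Derive the window from the last flip timestamp and the VRR range. */
static struct vrr_window vrr_next_window(int64_t last_flip_ns,
					 int64_t min_frame_ns,
					 int64_t max_frame_ns)
{
	struct vrr_window w = {
		.earliest_ns = last_flip_ns + min_frame_ns,
		.latest_ns = last_flip_ns + max_frame_ns,
	};
	return w;
}

/* Clamp the time at which the compositor would like the frame shown. */
static int64_t vrr_clamp_target(struct vrr_window w, int64_t desired_ns)
{
	if (desired_ns < w.earliest_ns)
		return w.earliest_ns;
	if (desired_ns > w.latest_ns)
		return w.latest_ns;
	return desired_ns;
}

/* Absolute time at which the render timer should fire. */
static int64_t render_wakeup_ns(int64_t target_ns, int64_t head_start_ns)
{
	return target_ns - head_start_ns;
}
```

A compositor would feed the last flip-complete timestamp into
vrr_next_window(), clamp its desired presentation time, and arm its
timer at render_wakeup_ns() with whatever head-start it has tuned. All
of this only works if the flip timestamp reflects the real start of
scanout.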

So taking all together, we have 2 use-cases I think:
- frontbuffer rendering. Kinda wants vblank event to fire at the
beginning of vblank. Won't ever do VRR
- pageflipping compositors. Don't care about the vblank event, either
they have an hrtimer (which gets synced with the vblank ioctl),
or just flip as fast as possible. They don't care one bit when the
vblank event happens, as long as a pageflip scheduled immediately
after the event shows up only hits the next frame (which is the delay
code that currently stretches vblank unnecessarily with VRR if you're
already a bit late).

Making sure that vblank and page flip timestamp match the real
userspace seems like the cleanest semantics, and the least surprising
to userspace. E.g. what happens if you implement VRR in a compositor,
but then tune your hrtimer within the vblank, when the vblank
timestamp is totally not reflecting your real pageflip? I just don't
see what's the upside of having a vblank timestamp that's not actually
reflecting the real flip.

Also, if your app wants the start of vblank, that's very easy to
compute: Just add the scanout time to the last vblank timestamp.
Figuring out the start of frame otoh isn't doable if you can't query,
and constantly pageflipping just to have an accurate start-of-frame
(for tuning your hrtimer) seems rather wasteful - it outright defeats
some of the use-cases for VRR.
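
As a rough sketch of the start-of-vblank computation: if the vblank
timestamp marks the start of active scanout, the next vblank begins one
active-scanout duration later. The struct below is a made-up subset of
a display mode (field names chosen to resemble libdrm's drmModeModeInfo,
with the pixel clock in kHz), and the function names are illustrative
only.

```c
#include <stdint.h>

/* Hypothetical subset of a display mode, just enough for the math. */
struct mode_timing {
	uint32_t clock_khz;	/* pixel clock in kHz */
	uint16_t htotal;	/* pixels per line, including blanking */
	uint16_t vdisplay;	/* number of visible lines */
};

/* Time needed to scan out all visible lines, in nanoseconds. */
static int64_t active_scanout_ns(const struct mode_timing *m)
{
	return (int64_t)m->vdisplay * m->htotal * 1000000 / m->clock_khz;
}

/* Start of the next vblank, given a start-of-scanout timestamp. */
static int64_t next_vblank_start_ns(int64_t vblank_ts_ns,
				    const struct mode_timing *m)
{
	return vblank_ts_ns + active_scanout_ns(m);
}
```

For a 1920x1080 mode with htotal 2200 and a 148500 kHz pixel clock the
active scanout comes out at 16 ms, so the next vblank starts 16 ms after
the timestamp. The scanout duration is fixed by the mode, so this holds
in VRR too; only the length of the vblank that follows varies.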

So what benefit do you see in having vblank ts and pageflip ts not
agree? What's the use-case this solves - aside from "less typing" :-)
If the only use-case is VRR for frontbuffer rendering, then that
doesn't even work right now (aside from that I think it's something we
shouldn't support and definitely not encourage).
-Daniel

> >>> Now added complication is that amdgpu sends out vblank events really
> >>> early, which is used by userspace to do frontbuffer rendering in the
> >>> vblank time. But I don't think anyone wants to do both VRR and
> >>
> >> I think all kms drivers try to call drm_crtc_handle_vblank() at start
> >> of vblank to give Mesa the most time for frontbuffer rendering for
> >> classic X. But vblank events are also used for scheduling bufferswaps
> >> or other stuff for redirected windowed rendering, or via api's like
> >> OML_sync_controls glXWaitForMscOML, so there might be other things
> >> affected by a more delayed vblank handling.
> >
> > The frontbuffer rendering is very much X driver specific, and I think
> > -amdgpu/radeon is the only one that requires this. No i915 driver ever
> > used the vblank interrupt to schedule frontbuffer blits, we use some
> > CS-side stalls.
> >
> > Wrt scheduling pageflips: The rule is that right after you've received the
> > vblank for frame X, then an immediately scheduled pageflip should hit X+1,
> > but not X. amdgpu had this broken, but it's been fixed for a while.
> >
> >>> frontbuffer rendering, hence I think it should be possible to create
> >>> correct vblank timestamps for the VRR case, while leaving the current
> >>> logic in place for everything else. But that means moving the entire
> >>> vblank processing to where you know the next frame's start time for
> >>> VRR, both for page flips and for all other vblank events.
> >>> -Daniel
> >>>
> >>
> >> I think for amdgpu it would be doable by calling drm_handle_vblank
> >> from the pageflip irq handler in VRR mode, and then additionally from
> >> the vblank interrupt handler like now. Vblank irq's are triggered via
> >> vline interrupts with programmable vertical trigger line position, so
> >> we could program the trigger line to be start of back-porch instead of
> >> start of front-porch. In the pageflip case we do vblank handling +
> >> timestamping + pflip completion from pflip irq, and have the regular
> >> back-porch vblank handling for refresh cycles in which no
> >> pageflip/pflip irq happened.
> >
> > Hm, can't we simply program the vblank to occur in the back porch region
> > for VRR? The vblank timestamping should automatically correct the
> > timestamp in that case. Would require a lot less code changes.
> >
> > Plus update the documentation ofc.
> >
> >> Not sure if/how that would mess with below-vrr-refresh-rate-range
> >> support for stutter reduction at low fps though? Or with performance,
> >> given vblank event dispatch could be delayed a lot in VRR compared to
> >> normal mode? Or with all the vblank accounting during modesets or crtc
> >> en/disable?
>
> I think we have extra VLINEs that can be programmed at vpos 0 that could
> resolve this but I'm still not really sold on the idea.
>
> >
> > Currently VRR and explicit frame scheduling don't mix. We've discussed
> > this, and consensus seems to be that we need a new property for page_flip,
> > similar to the target frame, but instead of frames a timestamp. Then the
> > driver can hit a specific time for the next frame.
> >
> > And since compositors use the vblank ioctl just to do frame scheduling, I
> > don't think we have to worry hugely about interactions there. Much better
> > to aim for clean semantics (i.e. vblank and flip complete timestamps
> > better match).
> >
> >> Also, could other drivers like Intel easily do this delayed vblank
> >> processing in the non-pageflip case?
> >
> > Intel's problem :-) But yeah we have page flip timestamp registers, so
> > worst case all we need to do is shuffle code around a bit. And we have
> > interrupts for both start of vblank and end of vblank. It might be a bit
> > more work because we use the vblank interrupt to avoid races in atomic
> > updates, but that's not too terrible to fix.
> >
> >>  From my user perspective as a developer of a neuroscience research
> >> application that has a couple of exciting/important use cases for VRR
> >> mode, i absolutely do need trustworthy and precise pageflip event
> >> timestamps that represent reality, otherwise VRR mode would be less
> >> than useless for me. I can do without meaningful vblank timestamps in
> >> VRR mode though, and expected to do without them, as any scheduling
> >> has to happen time-based instead of based on vblank counts anyway,
> >> given that vblank counts are quite meaningless if every frame can have
> >> a different duration. I'd expect that situation to be similar for
> >> other apps that would want to use timestamps in VRR mode like video
> >> games, movie players or VR/AR applications.
> >
> > Yeah for that you want to schedule your frames with a target frame number.
> > Would still be nice if vblank timestamps wouldn't be an outright lie (and
> > the timestamp is still useful information, even if the frame counter
> > isn't - many displays have min/max frame rates for VRR, which is
> > information we could perhaps expose).
> > -Daniel
> >>
> >> -mario
> >>
> >>>>
> >>>> Nicholas Kazlauskas
> >>>>
> >>>>>
> >>>>>> ---
> >>>>>>    drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> >>>>>>    include/drm/drm_vblank.h     |  8 ++++++
> >>>>>>    2 files changed, 56 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> >>>>>> index 98e091175921..4b3a4c38fabe 100644
> >>>>>> --- a/drivers/gpu/drm/drm_vblank.c
> >>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
> >>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> >>>>>>               u64 seq, ktime_t now)
> >>>>>>    {
> >>>>>>       struct timespec64 tv;
> >>>>>> +    ktime_t alt_flip_time;
> >>>>>>
> >>>>>>       switch (e->event.base.type) {
> >>>>>> -    case DRM_EVENT_VBLANK:
> >>>>>>       case DRM_EVENT_FLIP_COMPLETE:
> >>>>>> +            /*
> >>>>>> +             * For flip completion events, override "now" time
> >>>>>> +             * with alt_flip_time provided by the driver via
> >>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> >>>>>> +             * if that time is later than given "now" vblank time.
> >>>>>> +             */
> >>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> >>>>>> +            if (alt_flip_time > now)
> >>>>>> +                    now = alt_flip_time;
> >>>>>> +            /* Fallthrough */
> >>>>>> +    case DRM_EVENT_VBLANK:
> >>>>>>               tv = ktime_to_timespec64(now);
> >>>>>>               e->event.vbl.sequence = seq;
> >>>>>>               /*
> >>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>>>>>
> >>>>>>               now = ktime_get();
> >>>>>>       }
> >>>>>> +
> >>>>>>       e->pipe = pipe;
> >>>>>>       send_vblank_event(dev, e, seq, now);
> >>>>>>    }
> >>>>>>    EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> >>>>>>
> >>>>>> +/**
> >>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> >>>>>> + * @crtc: the source CRTC of the pageflip completion event
> >>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> >>>>>> + *
> >>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> >>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
> >>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
> >>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
> >>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
> >>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
> >>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> >>>>>> + * set an alternate flip completion timestamp in case of late flip completions
> >>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> >>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> >>>>>> + * completion routine, with some constant duration of the back porch interval
> >>>>>> + * added, although more precise estimates may be possible on some hardware if
> >>>>>> + * the hardware provides some means of timestamping the true end of vblank.
> >>>>>> + *
> >>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> >>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
> >>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
> >>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
> >>>>>> + * the new post-pageflip scanout buffer.
> >>>>>> + */
> >>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >>>>>> +                                     ktime_t flip_time)
> >>>>>> +{
> >>>>>> +    struct drm_device *dev = crtc->dev;
> >>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> >>>>>> +
> >>>>>> +    vblank->alt_flip_time = flip_time;
> >>>>>> +}
> >>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> >>>>>> +
> >>>>>>    static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> >>>>>>    {
> >>>>>>       if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> >>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> >>>>>> index 6ad9630d4f48..aacf44694ab6 100644
> >>>>>> --- a/include/drm/drm_vblank.h
> >>>>>> +++ b/include/drm/drm_vblank.h
> >>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> >>>>>>        * @time: Vblank timestamp corresponding to @count.
> >>>>>>        */
> >>>>>>       ktime_t time;
> >>>>>> +    /**
> >>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> >>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> >>>>>> +     * events will carry the later one of @time and @alt_flip_time.
> >>>>>> +     */
> >>>>>> +    ktime_t alt_flip_time;
> >>>>>>
> >>>>>>       /**
> >>>>>>        * @refcount: Number of users/waiters of the vblank interrupt. Only when
> >>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> >>>>>>    u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> >>>>>>    u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> >>>>>>                                  ktime_t *vblanktime);
> >>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >>>>>> +                                     ktime_t flip_time);
> >>>>>>    void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>>>>>                              struct drm_pending_vblank_event *e);
> >>>>>>    void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> >>>>>> --
> >>>>>> 2.17.1
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >
>
> Nicholas Kazlauskas
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch