Re: [PATCH] drm/vblank: Avoid storing a timestamp for the same frame twice

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 08, 2021 at 06:43:53PM +0100, Daniel Vetter wrote:
> On Mon, Feb 8, 2021 at 5:58 PM Ville Syrjälä
> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, Feb 08, 2021 at 10:56:36AM +0100, Daniel Vetter wrote:
> > > On Fri, Feb 05, 2021 at 11:19:19PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Feb 05, 2021 at 06:24:08PM +0200, Ville Syrjälä wrote:
> > > > > On Fri, Feb 05, 2021 at 04:46:27PM +0100, Daniel Vetter wrote:
> > > > > > On Thu, Feb 04, 2021 at 05:55:28PM +0200, Ville Syrjälä wrote:
> > > > > > > On Thu, Feb 04, 2021 at 04:32:16PM +0100, Daniel Vetter wrote:
> > > > > > > > On Thu, Feb 04, 2021 at 04:04:00AM +0200, Ville Syrjala wrote:
> > > > > > > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > > > > > >
> > > > > > > > > drm_vblank_restore() exists because certain power saving states
> > > > > > > > > can clobber the hardware frame counter. The way it does this is
> > > > > > > > > by guesstimating how many frames were missed purely based on
> > > > > > > > > the difference between the last stored timestamp vs. a newly
> > > > > > > > > sampled timestamp.
> > > > > > > > >
> > > > > > > > > If we should call this function before a full frame has
> > > > > > > > > elapsed since we sampled the last timestamp we would end up
> > > > > > > > > with a possibly slightly different timestamp value for the
> > > > > > > > > same frame. Currently we will happily overwrite the already
> > > > > > > > > stored timestamp for the frame with the new value. This
> > > > > > > > > could cause userspace to observe two different timestamps
> > > > > > > > > for the same frame (and the timestamp could even go
> > > > > > > > > backwards depending on how much error we introduce when
> > > > > > > > > correcting the timestamp based on the scanout position).
> > > > > > > > >
> > > > > > > > > To avoid that let's not update the stored timestamp unless we're
> > > > > > > > > also incrementing the sequence counter. We do still want to update
> > > > > > > > > vblank->last with the freshly sampled hw frame counter value so
> > > > > > > > > that subsequent vblank irqs/queries can actually use the hw frame
> > > > > > > > > counter to determine how many frames have elapsed.
> > > > > > > >
> > > > > > > > Hm I'm not getting the reason for why we store the updated hw vblank
> > > > > > > > counter?
> > > > > > >
> > > > > > > Because next time a vblank irq happens the code will do:
> > > > > > > diff = current_hw_counter - vblank->last
> > > > > > >
> > > > > > > which won't work very well if vblank->last is garbage.
> > > > > > >
> > > > > > > Updating vblank->last is pretty much why drm_vblank_restore()
> > > > > > > exists at all.
> > > > > >
> > > > > > Oh sure, _restore has to update this, together with the timestamp.
> > > > > >
> > > > > > But your code adds such an update where we update the hw vblank counter,
> > > > > > but not the timestamp, and that feels buggy. Either we're still in the
> > > > > > same frame, and then we should story nothing. Or we advanced, and then we
> > > > > > probably want a new timestampt for that frame too.
> > > > >
> > > > > Even if we're still in the same frame the hw frame counter may already
> > > > > have been reset due to the power well having been turned off. That is
> > > > > what I'm trying to fix here.
> > > > >
> > > > > Now I suppose that's fairly unlikely, at least with PSR which probably
> > > > > does impose some extra delays before the power gets yanked. But at least
> > > > > theoretically possible.
> > > >
> > > > Pondering about this a bit further. I think the fact that the current
> > > > code takes the round-to-closest approach I used for the vblank handler
> > > > is perhaps a bit bad. It could push the seq counter forward if we're
> > > > past the halfway point of a frame. I think that rounding behaviour
> > > > makes sense for the irq since those tick steadily and so allowing a bit
> > > > of error either way seems correct to me. Perhaps round-down might be
> > > > the better option for _restore(). Not quites sure, need more thinking
> > > > probably.
> > >
> > > Yes this is the rounding I'm worried about.
> >
> > Actually I don't think this is really an issue since we are working
> > with the corrected timestamps here. Those always line up with
> > frames, so unless the correction is really buggy or the hw somehow
> > skips a partial frame it should work rather well. At least when
> > operating with small timescales. For large gaps the error might
> > creep up, but I don't think a small error in the predicted seq
> > number over a long timespan is really a problem.
> 
> That corrected timestamp is what can go wrong I think: There's no
> guarantee that drm_crtc_vblank_helper_get_vblank_timestamp_internal()
> flips to top-of-frame at the exact same time than when the hw vblank
> counter flips. Or at least I'm not seeing where we correct them both
> together.

We do this seqlock type of thing:
	do {
                cur_vblank = __get_vblank_counter(dev, pipe);
                rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, in_vblank_irq);
        } while (cur_vblank != __get_vblank_counter(dev, pipe) && --count > 0);

which guarantees the timestamp really is for the frame we think it is for.

> 
> > > But your point above that the hw might reset the counter again is also
> > > valid. I'm assuming what you're worried about is that we first do a
> > > _restore (and the hw vblank counter hasn't been trashed yet), and then in
> > > the same frame we do another restore, but now the hw frame counter has
> > > been trashe, and we need to update it?
> >
> > Yeah, although the pre-trashing _restore could also just be
> > a vblank irq I think.
> >
> > >
> > > > Another idea that came to me now is that maybe we should actually just
> > > > check if the current hw frame counter value looks sane, as in something
> > > > like:
> > > >
> > > > diff_hw_counter = current_hw_counter-stored_hw_counter
> > > > diff_ts = (current_ts-stored_ts)/framedur
> > > >
> > > > if (diff_hw_counter ~= diff_ts)
> > > >     diff = diff_hw_counter;
> > > > else
> > > >     diff = diff_ts;
> > > >
> > > > and if they seem to match then just keep trusting the hw counter.
> > > > So only if there's a significant difference would we disregard
> > > > the diff of the hw counter and instead use the diff based on the
> > > > timestamps. Not sure what "significant" is though; One frame, two
> > > > frames?
> > >
> > > Hm, another idea: The only point where we can trust the entire hw counter
> > > + timestamp sampling is when the irq happens. Because then we know the
> > > driver will have properly corrected for any hw oddities (like hw counter
> > > flipping not at top-of-frame, like the core expects).
> >
> > i915 at least gives out correct data regardless of when you sample
> > it. Well, except for the cases where the hw counter gets trashed,
> > in which case the hw counter is garbage (when compared with .last)
> > but the timestamp is still correct.
> 
> Hm where/how do we handle this? Maybe I'm just out of date with how it
> all works nowadays.

There's not much to handle. We know when exactly the counters increment and
thus can give out the correct answer to the question "which frame is this?".

> 
> > > So what if _restore always goes back to the last such trusted hw counter
> > > for computing the frame counter diff and all that stuff? That way if we
> > > have a bunch of _restore with incosisten hw vblank counter, we will a)
> > > only take the last one (fixes the bug you're trying to fix) b) still use
> > > the same last trusted baseline for computations (addresses the race I'm
> > > seeing).
> > >
> > > Or does this not work?
> >
> > I don't think I really understand what you're suggesting here.
> > _restore is already using the last trusted data (the stored
> > timestamp + .last).
> >
> > So the one thing _restore will have to update is .last.
> > I think it can either do what it does now and set .last
> > to the current hw counter value + update the timestamp
> > to match, or it could perhaps adjust the stored .last
> > such that the already stored timestamp and the updated
> > .last match up. But I think both of those options have
> > the same level or inaccuracy since both would still do
> > the same ts_diff->hw_counter_diff prediction.
> >
> > >
> > > It does complicate the code a bit, because we'd need to store the
> > > count/timestamp information from _restore outside of the usual vblank ts
> > > array. But I think that addresses everything.
> >
> > Hmm. So restore would store this extra information
> > somewhere else, and not update the normal stuff at all?
> > What exactly would we do with that extra data?
> 
> Hm I guess I didn't think this through. But the idea I had was:
> - _restore always recomputes back from the las
> drm_crtc_handl_vblank-stored timestamp.
> - the first drm_crtc_handle_vblank bakes in any corrections that
> _restore has prepared meanwhile
> - same applies to all the sampling functions we might look at lastes
> timestamps/counter values.

So I guess instead of _restore adjusting .last we would instead
mainatian a separate correction information and apply it when
doing the diff between the current hw counter vs. .last. Not sure
why that would be particularly better than just adjusting .last
directly.

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux