On Mon, Feb 8, 2021 at 5:58 PM Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> wrote: > > On Mon, Feb 08, 2021 at 10:56:36AM +0100, Daniel Vetter wrote: > > On Fri, Feb 05, 2021 at 11:19:19PM +0200, Ville Syrjälä wrote: > > > On Fri, Feb 05, 2021 at 06:24:08PM +0200, Ville Syrjälä wrote: > > > > On Fri, Feb 05, 2021 at 04:46:27PM +0100, Daniel Vetter wrote: > > > > > On Thu, Feb 04, 2021 at 05:55:28PM +0200, Ville Syrjälä wrote: > > > > > > On Thu, Feb 04, 2021 at 04:32:16PM +0100, Daniel Vetter wrote: > > > > > > > On Thu, Feb 04, 2021 at 04:04:00AM +0200, Ville Syrjala wrote: > > > > > > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > > > > > > > > > > > > > > > drm_vblank_restore() exists because certain power saving states > > > > > > > > can clobber the hardware frame counter. The way it does this is > > > > > > > > by guesstimating how many frames were missed purely based on > > > > > > > > the difference between the last stored timestamp vs. a newly > > > > > > > > sampled timestamp. > > > > > > > > > > > > > > > > If we should call this function before a full frame has > > > > > > > > elapsed since we sampled the last timestamp we would end up > > > > > > > > with a possibly slightly different timestamp value for the > > > > > > > > same frame. Currently we will happily overwrite the already > > > > > > > > stored timestamp for the frame with the new value. This > > > > > > > > could cause userspace to observe two different timestamps > > > > > > > > for the same frame (and the timestamp could even go > > > > > > > > backwards depending on how much error we introduce when > > > > > > > > correcting the timestamp based on the scanout position). > > > > > > > > > > > > > > > > To avoid that let's not update the stored timestamp unless we're > > > > > > > > also incrementing the sequence counter. We do still want to update > > > > > > > > vblank->last with the freshly sampled hw frame counter value so > > > > > > > > that subsequent vblank irqs/queries can actually use the hw frame > > > > > > > > counter to determine how many frames have elapsed. > > > > > > > > > > > > > > Hm I'm not getting the reason for why we store the updated hw vblank > > > > > > > counter? > > > > > > > > > > > > Because next time a vblank irq happens the code will do: > > > > > > diff = current_hw_counter - vblank->last > > > > > > > > > > > > which won't work very well if vblank->last is garbage. > > > > > > > > > > > > Updating vblank->last is pretty much why drm_vblank_restore() > > > > > > exists at all. > > > > > > > > > > Oh sure, _restore has to update this, together with the timestamp. > > > > > > > > > > But your code adds such an update where we update the hw vblank counter, > > > > > but not the timestamp, and that feels buggy. Either we're still in the > > > > > same frame, and then we should story nothing. Or we advanced, and then we > > > > > probably want a new timestampt for that frame too. > > > > > > > > Even if we're still in the same frame the hw frame counter may already > > > > have been reset due to the power well having been turned off. That is > > > > what I'm trying to fix here. > > > > > > > > Now I suppose that's fairly unlikely, at least with PSR which probably > > > > does impose some extra delays before the power gets yanked. But at least > > > > theoretically possible. > > > > > > Pondering about this a bit further. I think the fact that the current > > > code takes the round-to-closest approach I used for the vblank handler > > > is perhaps a bit bad. It could push the seq counter forward if we're > > > past the halfway point of a frame. I think that rounding behaviour > > > makes sense for the irq since those tick steadily and so allowing a bit > > > of error either way seems correct to me. Perhaps round-down might be > > > the better option for _restore(). Not quites sure, need more thinking > > > probably. > > > > Yes this is the rounding I'm worried about. > > Actually I don't think this is really an issue since we are working > with the corrected timestamps here. Those always line up with > frames, so unless the correction is really buggy or the hw somehow > skips a partial frame it should work rather well. At least when > operating with small timescales. For large gaps the error might > creep up, but I don't think a small error in the predicted seq > number over a long timespan is really a problem. That corrected timestamp is what can go wrong I think: There's no guarantee that drm_crtc_vblank_helper_get_vblank_timestamp_internal() flips to top-of-frame at the exact same time than when the hw vblank counter flips. Or at least I'm not seeing where we correct them both together. > > But your point above that the hw might reset the counter again is also > > valid. I'm assuming what you're worried about is that we first do a > > _restore (and the hw vblank counter hasn't been trashed yet), and then in > > the same frame we do another restore, but now the hw frame counter has > > been trashe, and we need to update it? > > Yeah, although the pre-trashing _restore could also just be > a vblank irq I think. > > > > > > Another idea that came to me now is that maybe we should actually just > > > check if the current hw frame counter value looks sane, as in something > > > like: > > > > > > diff_hw_counter = current_hw_counter-stored_hw_counter > > > diff_ts = (current_ts-stored_ts)/framedur > > > > > > if (diff_hw_counter ~= diff_ts) > > > diff = diff_hw_counter; > > > else > > > diff = diff_ts; > > > > > > and if they seem to match then just keep trusting the hw counter. > > > So only if there's a significant difference would we disregard > > > the diff of the hw counter and instead use the diff based on the > > > timestamps. Not sure what "significant" is though; One frame, two > > > frames? > > > > Hm, another idea: The only point where we can trust the entire hw counter > > + timestamp sampling is when the irq happens. Because then we know the > > driver will have properly corrected for any hw oddities (like hw counter > > flipping not at top-of-frame, like the core expects). > > i915 at least gives out correct data regardless of when you sample > it. Well, except for the cases where the hw counter gets trashed, > in which case the hw counter is garbage (when compared with .last) > but the timestamp is still correct. Hm where/how do we handle this? Maybe I'm just out of date with how it all works nowadays. > > So what if _restore always goes back to the last such trusted hw counter > > for computing the frame counter diff and all that stuff? That way if we > > have a bunch of _restore with incosisten hw vblank counter, we will a) > > only take the last one (fixes the bug you're trying to fix) b) still use > > the same last trusted baseline for computations (addresses the race I'm > > seeing). > > > > Or does this not work? > > I don't think I really understand what you're suggesting here. > _restore is already using the last trusted data (the stored > timestamp + .last). > > So the one thing _restore will have to update is .last. > I think it can either do what it does now and set .last > to the current hw counter value + update the timestamp > to match, or it could perhaps adjust the stored .last > such that the already stored timestamp and the updated > .last match up. But I think both of those options have > the same level or inaccuracy since both would still do > the same ts_diff->hw_counter_diff prediction. > > > > > It does complicate the code a bit, because we'd need to store the > > count/timestamp information from _restore outside of the usual vblank ts > > array. But I think that addresses everything. > > Hmm. So restore would store this extra information > somewhere else, and not update the normal stuff at all? > What exactly would we do with that extra data? Hm I guess I didn't think this through. But the idea I had was: - _restore always recomputes back from the las drm_crtc_handl_vblank-stored timestamp. - the first drm_crtc_handle_vblank bakes in any corrections that _restore has prepared meanwhile - same applies to all the sampling functions we might look at lastes timestamps/counter values. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx