On Mon, Feb 08, 2021 at 06:43:53PM +0100, Daniel Vetter wrote: > On Mon, Feb 8, 2021 at 5:58 PM Ville Syrjälä > <ville.syrjala@xxxxxxxxxxxxxxx> wrote: > > > > On Mon, Feb 08, 2021 at 10:56:36AM +0100, Daniel Vetter wrote: > > > On Fri, Feb 05, 2021 at 11:19:19PM +0200, Ville Syrjälä wrote: > > > > On Fri, Feb 05, 2021 at 06:24:08PM +0200, Ville Syrjälä wrote: > > > > > On Fri, Feb 05, 2021 at 04:46:27PM +0100, Daniel Vetter wrote: > > > > > > On Thu, Feb 04, 2021 at 05:55:28PM +0200, Ville Syrjälä wrote: > > > > > > > On Thu, Feb 04, 2021 at 04:32:16PM +0100, Daniel Vetter wrote: > > > > > > > > On Thu, Feb 04, 2021 at 04:04:00AM +0200, Ville Syrjala wrote: > > > > > > > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > > > > > > > > > > > > > > > > > drm_vblank_restore() exists because certain power saving states > > > > > > > > > can clobber the hardware frame counter. The way it does this is > > > > > > > > > by guesstimating how many frames were missed purely based on > > > > > > > > > the difference between the last stored timestamp vs. a newly > > > > > > > > > sampled timestamp. > > > > > > > > > > > > > > > > > > If we should call this function before a full frame has > > > > > > > > > elapsed since we sampled the last timestamp we would end up > > > > > > > > > with a possibly slightly different timestamp value for the > > > > > > > > > same frame. Currently we will happily overwrite the already > > > > > > > > > stored timestamp for the frame with the new value. This > > > > > > > > > could cause userspace to observe two different timestamps > > > > > > > > > for the same frame (and the timestamp could even go > > > > > > > > > backwards depending on how much error we introduce when > > > > > > > > > correcting the timestamp based on the scanout position). > > > > > > > > > > > > > > > > > > To avoid that let's not update the stored timestamp unless we're > > > > > > > > > also incrementing the sequence counter. We do still want to update > > > > > > > > > vblank->last with the freshly sampled hw frame counter value so > > > > > > > > > that subsequent vblank irqs/queries can actually use the hw frame > > > > > > > > > counter to determine how many frames have elapsed. > > > > > > > > > > > > > > > > Hm I'm not getting the reason for why we store the updated hw vblank > > > > > > > > counter? > > > > > > > > > > > > > > Because next time a vblank irq happens the code will do: > > > > > > > diff = current_hw_counter - vblank->last > > > > > > > > > > > > > > which won't work very well if vblank->last is garbage. > > > > > > > > > > > > > > Updating vblank->last is pretty much why drm_vblank_restore() > > > > > > > exists at all. > > > > > > > > > > > > Oh sure, _restore has to update this, together with the timestamp. > > > > > > > > > > > > But your code adds such an update where we update the hw vblank counter, > > > > > > but not the timestamp, and that feels buggy. Either we're still in the > > > > > > same frame, and then we should story nothing. Or we advanced, and then we > > > > > > probably want a new timestampt for that frame too. > > > > > > > > > > Even if we're still in the same frame the hw frame counter may already > > > > > have been reset due to the power well having been turned off. That is > > > > > what I'm trying to fix here. > > > > > > > > > > Now I suppose that's fairly unlikely, at least with PSR which probably > > > > > does impose some extra delays before the power gets yanked. But at least > > > > > theoretically possible. > > > > > > > > Pondering about this a bit further. I think the fact that the current > > > > code takes the round-to-closest approach I used for the vblank handler > > > > is perhaps a bit bad. It could push the seq counter forward if we're > > > > past the halfway point of a frame. I think that rounding behaviour > > > > makes sense for the irq since those tick steadily and so allowing a bit > > > > of error either way seems correct to me. Perhaps round-down might be > > > > the better option for _restore(). Not quites sure, need more thinking > > > > probably. > > > > > > Yes this is the rounding I'm worried about. > > > > Actually I don't think this is really an issue since we are working > > with the corrected timestamps here. Those always line up with > > frames, so unless the correction is really buggy or the hw somehow > > skips a partial frame it should work rather well. At least when > > operating with small timescales. For large gaps the error might > > creep up, but I don't think a small error in the predicted seq > > number over a long timespan is really a problem. > > That corrected timestamp is what can go wrong I think: There's no > guarantee that drm_crtc_vblank_helper_get_vblank_timestamp_internal() > flips to top-of-frame at the exact same time than when the hw vblank > counter flips. Or at least I'm not seeing where we correct them both > together. We do this seqlock type of thing: do { cur_vblank = __get_vblank_counter(dev, pipe); rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, in_vblank_irq); } while (cur_vblank != __get_vblank_counter(dev, pipe) && --count > 0); which guarantees the timestamp really is for the frame we think it is for. > > > > But your point above that the hw might reset the counter again is also > > > valid. I'm assuming what you're worried about is that we first do a > > > _restore (and the hw vblank counter hasn't been trashed yet), and then in > > > the same frame we do another restore, but now the hw frame counter has > > > been trashe, and we need to update it? > > > > Yeah, although the pre-trashing _restore could also just be > > a vblank irq I think. > > > > > > > > > Another idea that came to me now is that maybe we should actually just > > > > check if the current hw frame counter value looks sane, as in something > > > > like: > > > > > > > > diff_hw_counter = current_hw_counter-stored_hw_counter > > > > diff_ts = (current_ts-stored_ts)/framedur > > > > > > > > if (diff_hw_counter ~= diff_ts) > > > > diff = diff_hw_counter; > > > > else > > > > diff = diff_ts; > > > > > > > > and if they seem to match then just keep trusting the hw counter. > > > > So only if there's a significant difference would we disregard > > > > the diff of the hw counter and instead use the diff based on the > > > > timestamps. Not sure what "significant" is though; One frame, two > > > > frames? > > > > > > Hm, another idea: The only point where we can trust the entire hw counter > > > + timestamp sampling is when the irq happens. Because then we know the > > > driver will have properly corrected for any hw oddities (like hw counter > > > flipping not at top-of-frame, like the core expects). > > > > i915 at least gives out correct data regardless of when you sample > > it. Well, except for the cases where the hw counter gets trashed, > > in which case the hw counter is garbage (when compared with .last) > > but the timestamp is still correct. > > Hm where/how do we handle this? Maybe I'm just out of date with how it > all works nowadays. There's not much to handle. We know when exactly the counters increment and thus can give out the correct answer to the question "which frame is this?". > > > > So what if _restore always goes back to the last such trusted hw counter > > > for computing the frame counter diff and all that stuff? That way if we > > > have a bunch of _restore with incosisten hw vblank counter, we will a) > > > only take the last one (fixes the bug you're trying to fix) b) still use > > > the same last trusted baseline for computations (addresses the race I'm > > > seeing). > > > > > > Or does this not work? > > > > I don't think I really understand what you're suggesting here. > > _restore is already using the last trusted data (the stored > > timestamp + .last). > > > > So the one thing _restore will have to update is .last. > > I think it can either do what it does now and set .last > > to the current hw counter value + update the timestamp > > to match, or it could perhaps adjust the stored .last > > such that the already stored timestamp and the updated > > .last match up. But I think both of those options have > > the same level or inaccuracy since both would still do > > the same ts_diff->hw_counter_diff prediction. > > > > > > > > It does complicate the code a bit, because we'd need to store the > > > count/timestamp information from _restore outside of the usual vblank ts > > > array. But I think that addresses everything. > > > > Hmm. So restore would store this extra information > > somewhere else, and not update the normal stuff at all? > > What exactly would we do with that extra data? > > Hm I guess I didn't think this through. But the idea I had was: > - _restore always recomputes back from the las > drm_crtc_handl_vblank-stored timestamp. > - the first drm_crtc_handle_vblank bakes in any corrections that > _restore has prepared meanwhile > - same applies to all the sampling functions we might look at lastes > timestamps/counter values. So I guess instead of _restore adjusting .last we would instead mainatian a separate correction information and apply it when doing the diff between the current hw counter vs. .last. Not sure why that would be particularly better than just adjusting .last directly. -- Ville Syrjälä Intel _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel