On Tue, Sep 15, 2020 at 01:41:48PM +0100, Chris Wilson wrote:
> On Tigerlake, we are seeing a repeat of commit d8f505311717 ("drm/i915/icl:
> Forcibly evict stale csb entries") where, presumably, due to a missing
> Global Observation Point synchronisation, the write pointer of the CSB
> ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
> we see the GPU report more context-switch entries for us to parse, but
> those entries have not been written, leading us to process stale events,
> and eventually report a hung GPU.
>
> However, this effect appears to be much more severe than we previously
> saw on Icelake (though it might be best if we try the same approach
> there as well and measure), and Bruce suggested the good idea of resetting
> the CSB entry after use so that we can detect when it has been updated by
> the GPU. By instrumenting how long that may be, we can set a reliable
> upper bound for how long we should wait for:
>
> 513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")

What does "References:" mean? Should that be "Fixes:"?

thanks,

greg k-h