Quoting Greg KH (2020-09-16 07:33:58)
> On Tue, Sep 15, 2020 at 01:41:48PM +0100, Chris Wilson wrote:
> > On Tigerlake, we are seeing a repeat of commit d8f505311717 ("drm/i915/icl:
> > Forcibly evict stale csb entries") where, presumably, due to a missing
> > Global Observation Point synchronisation, the write pointer of the CSB
> > ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
> > we see the GPU report more context-switch entries for us to parse, but
> > those entries have not been written, leading us to process stale events,
> > and eventually report a hung GPU.
> >
> > However, this effect appears to be much more severe than we previously
> > saw on Icelake (though it might be best if we try the same approach
> > there as well and measure), and Bruce suggested the good idea of resetting
> > the CSB entry after use so that we can detect when it has been updated by
> > the GPU. By instrumenting how long that may be, we can set a reliable
> > upper bound for how long we should wait for:
> >
> >     513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)
> >
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
> > References: d8f505311717 ("drm/i915/icl: Forcibly evict stale csb entries")
>
> What does "References:" mean? Should that be "Fixes:"?

It's a reference to an earlier w/a for a previous generation for the same
symptoms. This patch should supplement that w/a.
-Chris
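For anyone following along, the idea quoted above (reset the CSB entry after use, then poll with an upper bound until the GPU's write becomes visible) can be sketched roughly as below. This is only a userspace illustration and not the i915 code: `struct fake_csb`, `fake_csb_load()`, `csb_read()`, `CSB_INVALID` and `CSB_MAX_RETRIES` are all hypothetical names, and the bound of 2048 is an arbitrary value picked to sit above the 1061 retries measured, not a number taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: a value the hardware is assumed never to report. */
#define CSB_INVALID	UINT64_MAX
/* Hypothetical bound, chosen above the worst case measured (1061 retries). */
#define CSB_MAX_RETRIES	2048u

/* Stand-in for one CSB slot whose contents become visible late. */
struct fake_csb {
	uint64_t slot;			/* entry written by the "GPU" */
	unsigned int stale_reads;	/* reads that still see the old value */
};

/* Simulate a read racing the write: stale until the counter expires. */
static uint64_t fake_csb_load(struct fake_csb *csb)
{
	if (csb->stale_reads > 0) {
		csb->stale_reads--;
		return CSB_INVALID;
	}
	return csb->slot;
}

/*
 * Poll the slot until it no longer holds the sentinel, then reset it so
 * that the next use of this slot can again tell old contents from new.
 * Returns false if nothing arrives within CSB_MAX_RETRIES reads.
 */
static bool csb_read(struct fake_csb *csb, uint64_t *out,
		     unsigned int *retries)
{
	unsigned int n;

	for (n = 0; n < CSB_MAX_RETRIES; n++) {
		uint64_t entry = fake_csb_load(csb);

		if (entry != CSB_INVALID) {
			csb->slot = CSB_INVALID;	/* reset after use */
			*out = entry;
			*retries = n;
			return true;
		}
	}

	*retries = n;
	return false;
}
```

Resetting the slot is what makes a late write detectable at all: without the sentinel, a stale entry looks the same as a fresh one. Real code would also need the appropriate read ordering between the write pointer and the entry, which this sketch leaves out.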