Hi,

at Intel we have a weekly bug scrub effort, where a dedicated group is
responsible for tracking regressions, triaging new bugs, fixing bugs and
keeping bugs up to date. At the end of each week we'll post a summary of
this effort; the first one follows.

In general:
-----------

- Bug#55984: We haven't found the root cause for this. We spent most of
  the time assisting Chris and Daniel in trying different candidate fixes
  and bisecting. How easy it was to reproduce depended a lot on the
  environment: SNA vs. UXA and compiz/metacity vs. no compositing both
  seemed to affect it. A further complication is that we hit the GPU hang
  for several different reasons. From Chris' comments I understand the
  search still continues, though one set of the reports should be RC6
  related and thus fixed by disabling RC6 on ILK.

One observation is that we are fighting bugs somewhat opportunistically,
not aiming at finding the root cause of the problem. The reasons and
proposed solutions as we see them:

- Lack of time
  One week is a short time and the general expectation is to solve
  symptoms ASAP w/o "wasting" a lot of time on understanding the problem
  better.
  Proposed solution: better appreciation of finding the underlying
  issues, with more time allocated for this.

- Lack of tools
  With the existing tools (apitrace, drm error status, libdrm aub dumps)
  we can't fight certain bugs related to the kernel driver/HW, like
  Bug#55984.
  Proposed solution: a new tool tracing bo contents, exec buf and other
  relevant IOCTLs from the kernel driver, to produce a replayable trace.
  Initial investigation has started on this and it looks doable. It would
  be great to get some input about its feasibility / usefulness from
  Chris and Daniel or other people on the list.

On the positive side, this week was a great opportunity for us to learn a
lot about the inner workings of the driver/HW.

In detail:
----------

Ville:
- Bug#54911: Tracked this down to invalid EDID handling; he is planning
  to revisit it once Egbert Eich's EDID patchset settles down.
- Found a new bug where the GPU ring tail ptr wraps around and gets
  within a cacheline distance of the head. According to the spec this
  results in undefined behavior; a patch will be sent to fix this.
- New bug(s) will be opened for GPU hangs that are most probably not
  related to Bug#55984.

Mika:
- In the internal bugzilla, bugs older than 3 months are set to wontfix;
  most of these are eventually closed by the reporter. There are still a
  number of these left to go through.
- Started to work on a script to search and retrieve i915_error_state
  attachments from each bug. This will allow us, for example, to find
  similarities between bugs and mark them as duplicates.

Imre:
- Nothing else besides assisting Chris and Daniel with reproducing
  #55984 and bisecting it to a particular commit. But this commit is
  probably not the root cause; simply disabling RC6 got rid of the
  problem for me.

--Imre
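
PS: To illustrate the ring tail issue from Ville's notes above, here is a
minimal sketch of one way to keep the tail at least a cacheline away from
the head when computing free ring space. This is not the actual patch.
The function name, the 64 byte cacheline size and the byte-offset
representation of head and tail are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cacheline size in bytes; hypothetical value for this sketch. */
#define CACHELINE_BYTES 64

/*
 * Hypothetical helper: how many bytes may be written at 'tail' so that,
 * after advancing (and possibly wrapping), the tail still stays at least
 * one cacheline behind 'head'. 'head' and 'tail' are byte offsets into a
 * ring of 'size' bytes.
 */
static uint32_t ring_usable_space(uint32_t head, uint32_t tail, uint32_t size)
{
	int64_t space;

	assert(size > CACHELINE_BYTES);
	assert(head < size && tail < size);

	/* Reserve one cacheline in front of the head. */
	space = (int64_t)head - ((int64_t)tail + CACHELINE_BYTES);
	if (space < 0)
		space += size;	/* tail is ahead of head, account for the wrap */

	/* Tail is already inside the reserved cacheline: nothing is usable. */
	return space < 0 ? 0 : (uint32_t)space;
}
```

With a 4096 byte ring, an empty ring (head == tail == 0) reports 4032
usable bytes rather than 4096, so a writer that respects this value can
never wrap the tail into the cacheline just before the head.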