On 21/06/2022 20:11, Robert Beckett wrote:
On 21/06/2022 18:37, Patchwork wrote:
*Patch Details*
*Series:* drm/i915: ttm for stolen (rev5)
*URL:* https://patchwork.freedesktop.org/series/101396/
*State:* failure
*Details:*
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
Summary
*FAILURE*
Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_101396v5, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
Participating hosts (40 -> 41)
Additional (2): fi-icl-u2 bat-dg2-9
Missing (1): fi-bdw-samus
Possible new issues
Here are the unknown changes that may have been introduced in
Patchwork_101396v5:
IGT changes
Possible regressions
* igt@i915_selftest@live@reset:
o bat-adlp-4: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@xxxxxxxxxx>
-> DMESG-FAIL
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@xxxxxxxxxx>
I keep hitting clobbered pages during engine resets on bat-adlp-4.
It seems to happen most of the time on that machine and occasionally on
bat-adlp-6.
Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is
for now?
Alternatively, seeing the history of this in
commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low
pages (128KiB) of stolen from use"
could this be an indication that maybe the original issue is worse on
adlp machines?
I have only ever seen page 135 or 136 clobbered across many runs
via trybot, so it looks fairly consistent.
Though excluding over 540KiB of stolen from use might be too severe.
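For reference, the arithmetic behind that figure can be sketched as below. This is only an illustration and not code from the series: it assumes 4KiB pages and that the page numbers reported by the selftest are indices counted from the start of stolen. The helper names are hypothetical.

```python
# Rough sketch of the page-index arithmetic, not code from the series.
# Assumption: 4 KiB pages, page indices counted from the start of stolen.
PAGE_SIZE = 4096

# From the title of commit 3da3c5c1c982: low 128KiB of stolen is excluded.
CURRENT_EXCLUSION_KIB = 128


def page_start_kib(page: int, page_size: int = PAGE_SIZE) -> int:
    """Offset in KiB at which the given page begins."""
    return page * page_size // 1024


def exclusion_kib(last_page: int, page_size: int = PAGE_SIZE) -> int:
    """KiB that must be excluded to cover pages 0..last_page inclusive."""
    return (last_page + 1) * page_size // 1024


def pages_in_kib(kib: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages covered by a region of the given size."""
    return kib * 1024 // page_size


if __name__ == "__main__":
    for page in (135, 136):
        print(f"page {page}: starts at {page_start_kib(page)} KiB, "
              f"covering it needs {exclusion_kib(page)} KiB excluded")
    print(f"current exclusion covers the first "
          f"{pages_in_kib(CURRENT_EXCLUSION_KIB)} pages")
```

Under those assumptions page 135 begins at 540KiB, so covering both observed pages would mean excluding several times the current 128KiB.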
I don't know, but I see that on the latest version you even hit pages 165/166.
Any history of hitting this in CI without your series? If not, are there
some other changes which could explain it? Are you touching the selftest
itself?
Hexdump of the clobbered page looks quite complex. Especially
POISON_FREE. Any idea how that ends up there?
Btw what is the benefit of converting stolen to start with? It's not
much of a backend since it just uses the drm range manager. So quite
thin and uneventful. Diffstats for the series also do not look like you
end up with much code reduction?
Regards,
Tvrtko