Re: [PATCH] i915/gt/selftest_lrc: Remove timestamp test

Hi,

> Quoting Tvrtko Ursulin (2025-03-04 16:43:45)
> > 
> > On 04/03/2025 13:09, Mikolaj Wasiak wrote:
> > > This test exposes bug in tigerlake hardware which prevents it from
> > > succeeding. Since the tested feature is only available on bugged hardware
> > > and we won't support any new hardware, this test is obsolete and
> > > should be removed.
> > 
> > I randomly clicked on one TGL, one DG2, one MTL and one RKL in the CI 
> > and only saw test passes. Then I looked at the patch below to see if 
> > there is a skip condition but don't see one. So I end up confused since 
> > commit message is making it sound like this only exists on Tigerlake and 
> > it's failing all the time. Is it perhaps a sporadic failure? On all 
> > platforms or just TGL? What am I missing?
> 
> The HW issue affects all gen12 platforms currently supported by i915. I
> don't have any data for derivatives, so I cannot confirm if this bug was
> fixed. The lrc_timestamp test was written to demonstrate this HW bug, to
> isolate it from (and explain) the pphwsp runtime discrepancies, covered
> by another selftest. The question is whether we want to keep a selftest
> that is expected to sporadically fail, that exists purely to hunt for
> those failures.
> 
> In the past, we have kept such selftests, but hidden them behind
> !IS_ENABLED(CONFIG_DRM_I915_SELFTEST_BROKEN).
> 
> So,
> - keep the selftest and expect sporadic failures in BAT, or
We cannot rely on a test that "sometimes" fails. If we cannot
ensure it works properly and provides predictable results in our
environment, then I believe we should not run it. Furthermore,
this may cause new bug reports to be filed for the same issue
over and over again in the future.

> - remove the selftest and completely forget about the HW issue, or
Can we do anything about that HW issue? :)

> - hide the selftest and stop it running on known bad platforms?
This seems like it could be a solution here. I have a question
though: would hiding the test behind that setting mean it never
runs in CI? Or is it similar to a "FIXME" note, which tells us
that somebody is aware of the issue but could not address it at
the time?

Best Regards,
Krzysztof



