On 12/15/2024 9:45 PM, Vignesh Raman wrote:
Hi Abhinav,
On 14/12/24 01:09, Abhinav Kumar wrote:
Hi Vignesh
On 12/11/2024 9:10 PM, Vignesh Raman wrote:
Hi Abhinav / Helen,
On 12/12/24 01:48, Abhinav Kumar wrote:
Hi Helen / Vignesh
On 12/4/2024 12:33 PM, Helen Mae Koike Fornazier wrote:
---- On Wed, 04 Dec 2024 16:21:26 -0300 Abhinav Kumar wrote ---
> Hi Helen
>
> On 12/4/2024 11:14 AM, Helen Mae Koike Fornazier wrote:
> > Hi Abhinav,
> >
> > Thanks for your patch.
> >
> >
> >
> > ---- On Wed, 04 Dec 2024 15:55:17 -0300 Abhinav Kumar wrote ---
> >
> > > From the jobs [1] and [2] of pipeline [3], it's clear that
> > > kms_cursor_legacy@torture-bo is most certainly a flake and
> > > not a fail for apq8016. Mark the test accordingly to match
> > > the results.
> > >
> > > [1] : https://gitlab.freedesktop.org/drm/msm/-/jobs/67676481
The test passes -
kms_cursor_legacy@torture-bo,UnexpectedImprovement(Pass)
Yes, that's the problem:
https://gitlab.freedesktop.org/drm/msm/-/jobs/67676481/viewer#L2696
24-12-04 03:51:55 R SERIAL> [ 179.241309] [IGT] kms_cursor_legacy: finished subtest all-pipes, SUCCESS
24-12-04 03:51:55 R SERIAL> [ 179.241812] [IGT] kms_cursor_legacy: finished subtest torture-bo, SUCCESS
Here it passes whereas it was marked as a failure, hence the pipeline fails.
Yes, it fails due to:
Unexpected results:
kms_cursor_legacy@torture-bo,UnexpectedImprovement(Pass)
In this case, we need to remove this test from fails.txt
> > > [2] : https://gitlab.freedesktop.org/drm/msm/-/jobs/67677430
There are no test failures.
No, that's not true:
https://gitlab.freedesktop.org/drm/msm/-/jobs/67677430/viewer#L2694
24-12-04 04:18:38 R SERIAL> [ 170.379649] Console: switching to colour dummy device 80x25
24-12-04 04:18:38 R SERIAL> [ 170.379938] [IGT] kms_cursor_legacy: executing
24-12-04 04:18:38 R SERIAL> [ 170.393868] [IGT] kms_cursor_legacy: starting subtest torture-bo
24-12-04 04:18:38 R SERIAL> [ 170.394186] [IGT] kms_cursor_legacy: starting dynamic subtest pipe-A
24-12-04 04:18:38 R SERIAL> [ 170.661749] [IGT] kms_cursor_legacy: finished subtest pipe-A, FAIL
24-12-04 04:18:38 R SERIAL> [ 170.662060] [IGT] kms_cursor_legacy: starting dynamic subtest all-pipes
24-12-04 04:18:38 R SERIAL> [ 170.713237] [IGT] kms_cursor_legacy: finished subtest all-pipes, FAIL
24-12-04 04:18:38 R SERIAL> [ 170.713513] [IGT] kms_cursor_legacy: finished subtest torture-bo, FAIL
24-12-04 04:18:38 R SERIAL> [ 170.721263] [IGT] kms_cursor_legacy: exiting, ret=98
24-12-04 04:18:38 R SERIAL> [ 170.737857] Console: switching to colour frame buffer device 128x48
Please check these logs; the torture-bo test case did fail. The pipeline was marked as passing because it was an expected fail.
So we have two runs, where one failed and the other passed. So that's a flake for me.
Yes, agreed. So if we had removed the test from fails, deqp-runner would have reported this as a flake.
deqp-runner runs the test and, if it fails, retries it. If the test passes on retry, it is reported as a flake.
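To spell out that retry behaviour, here is a rough Python sketch (an illustration only, not deqp-runner's actual implementation; the retry count is just an assumed parameter):

# Illustration of the classification described above, not deqp-runner code.
def classify(run_test, retries=1):
    """run_test() returns True on pass, False on fail."""
    if run_test():
        return "Pass"
    for _ in range(retries):
        if run_test():
            return "Flake"  # failed first, then passed on a retry
    return "Fail"           # failed on every attempt

A test that fails every attempt stays a Fail, and the fails/flakes expectation files then decide whether the job as a whole is still marked as passing.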
> > > [3]: https://gitlab.freedesktop.org/drm/msm/-/pipelines/1322770
The job is the same as [2].
In this case, the test passes and deqp-runner does not report it as a flake. So we only need to remove it from the fails file.
No, as I mentioned above, we have a pass and a fail.
> > >
> > > Signed-off-by: Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx>
> > > ---
> > > drivers/gpu/drm/ci/xfails/msm-apq8016-flakes.txt | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
diff --git a/drivers/gpu/drm/ci/xfails/msm-apq8016-flakes.txt b/drivers/gpu/drm/ci/xfails/msm-apq8016-flakes.txt
> > > new file mode 100644
> > > index 000000000000..18639853f18f
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/ci/xfails/msm-apq8016-flakes.txt
> > > @@ -0,0 +1,5 @@
> > > +# Board Name: msm-apq8016-db410c
> > > +# Failure Rate: 100
> >
> > If the failure rate is 100%, isn't it a fail then?
> > (I know we have other cases with Failure Rate: 100, maybe we should fix them as well)
> >
>
> Maybe I misunderstood the meaning of "Failure rate" for a flake.
>
> I interpreted this as this test being flaky 100% of the time :)
Ah right, I see, inside deqp-runner (which auto-retries).
I'd like to hear Vignesh's opinion on this.
(In any case, we probably should document this better)
deqp-runner reports new flakes (not present in the flakes file) or known flakes (present in the flakes file):
2024-12-11 07:25:44.709666: Some new flakes found:
2024-12-11 07:25:44.709676: kms_lease@page-flip-implicit-plane
2024-12-11 13:15:16.482890: Some known flakes found:
2024-12-11 13:15:16.482898: kms_async_flips@async-flip-with-page-flip-events-atomic
We add a test to the flakes file if deqp-runner reports it as a new flake. Another case where we update the flakes file is when a test passes in one run but fails in another, even though deqp-runner does not report it as a flake.
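As an illustration of that bookkeeping (a hypothetical Python helper, not part of drm/ci; it only assumes the one-test-per-line format with '#' comment lines shown in the patch below):

# Hypothetical helper: is a reported flake already listed in the board's
# *-flakes.txt (plain text, one test name per line, '#' lines are metadata)?
def known_flakes(path):
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith("#")}

flakes = known_flakes("drivers/gpu/drm/ci/xfails/msm-apq8016-flakes.txt")
test = "kms_cursor_legacy@torture-bo"
print("known flake" if test in flakes else "new flake -> add it to the flakes file")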
Regards,
Vignesh
The confusion here, I guess, is about what to mention as the "Failure Rate".
Does failure rate mean how often the test fails (as in, across normal runs)? In that case the 100% I used is wrong, and I have used 33% instead in the v2 I pushed.
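For reference, the arithmetic behind that number, assuming the "Failure Rate" field is simply the fraction of observed runs in which the test failed (illustrative Python, not drm/ci tooling):

# 1 failing run (job [2]) out of the 3 observed runs of the test.
failed_runs = 1
total_runs = 3
failure_rate = round(100 * failed_runs / total_runs)
print(f"# Failure Rate: {failure_rate}")  # -> "# Failure Rate: 33"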
Yes, 33% is correct, and please remove this test from fails.txt.
Regards,
Vignesh
Ack, let me remove this test from fails and keep it only in flakes.
Thanks
Abhinav
Regards,
Helen
Can you let me know which way we need to go?
Just in case, I did post a v2 fixing this:
https://patchwork.freedesktop.org/patch/627276/
If that's the way to go, can you please take a look?
Thanks
Abhinav
>
> Out of the 3 runs of the test, it passed in 2 and failed in 1.
>
> So its failure rate is actually 33.33% in that case.
>
> I think I saw a Failure Rate of 100% in msm-sm8350-hdk-flakes.txt and
> mistook that for the rate at which flakes are seen.
>
> Let me fix this up as 33%
>
> > Regards,
> > Helen
> >
> > > +# IGT Version: 1.28-ga73311079
> > > +# Linux Version: 6.12.0-rc2
> > > +kms_cursor_legacy@torture-bo
> > >
> > > ---
> > > base-commit: 798bb342e0416d846cf67f4725a3428f39bfb96b
> > > change-id: 20241204-cursor_tor_skip-9d128dd62c4f
> > >
> > > Best regards,
> > > --
> > > Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx>
> > >
> > >
> >
>