Re: [PATCH 2/2] drm/vkms: Use a simpler composition function

Louis Chauvet <louis.chauvet@xxxxxxxxxxx> · Wed, 7 Feb 2024 17:03:26 +0100

Hello Pekka, Arthur,

[...]

> > > Would it be possible to have a standardised benchmark specifically
> > > for performance rather than correctness, in IGT or where-ever it
> > > would make sense? Then it would be simple to tell contributors to
> > > run this and report the numbers before and after.
> > > 
> > > I would propose this kind of KMS layout:
> > > 
> > > - CRTC size 3841 x 2161
> > > - primary plane, XRGB8888, 3639 x 2161 @ 101,0
> > > - overlay A, XBGR2101010, 3033 x 1777 @ 201,199
> > > - overlay B, ARGB8888, 1507 x 1400 @ 1800,250
> > > 
> > > The sizes and positions are deliberately odd to try to avoid happy
> > > alignment accidents. The planes are big, which should let the pixel
> > > operations easily dominate performance measurement. There are
> > > different pixel formats, both opaque and semi-transparent. There is
> > > lots of plane overlap. The planes also do not cover the whole CRTC
> > > leaving the background visible a bit.
> > > 
> > > There should be two FBs per each plane, flipped alternatingly each
> > > frame. Writeback should be active. Run this a number of frames, say,
> > > 100, and measure the kernel CPU time taken. It's supposed to take at
> > > least several seconds in total.
> > > 
> > > I think something like this should be the base benchmark. One can
> > > add more to it, like rotated planes, YUV planes, etc. or switch
> > > settings on the existing planes. Maybe even FB_DAMAGE_CLIPS. Maybe
> > > one more overlay that is very tall and thin.
> > > 
> > > Just an idea, what do you all think?  
> > 
> > Hi Pekka,
> > 
> > I just finished writing this proposal using IGT.
> > 
> > I got pretty interesting results:
> > 
> > The mentioned commit 8356b97906503a02125c8d03c9b88a61ea46a05a took
> > around 13 seconds. While drm-misc/drm-misc-next took 36 seconds.
> > 
> > I'm currently bisecting to be certain that the change to the
> > pixel-by-pixel is the culprit, but I don't see why it wouldn't be.
> > 
> > I just need to do some final touches on the benchmark code and it
> > will be ready for revision.
> 
> Awesome, thank you very much for doing that!
> pq

I also think it's a good benchmark for classic configurations. The odd
sizes are a very nice way to exercise the corner cases of line-by-line
algorithms.
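
To get a feeling for the amount of work per frame, here is a rough
sketch that only counts how many source pixels the proposed layout
feeds into the composition. It is not VKMS or IGT code: struct rect,
clip_span(), visible_pixels() and layout_pixels_per_frame() are names
I made up for the illustration, and it ignores blending cost, format
conversion and writeback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a plane rectangle in CRTC coordinates. */
struct rect {
	int x, y, w, h;
};

/* Length of [start, start + len) once clipped to [0, limit). */
int clip_span(int start, int len, int limit)
{
	int lo = start < 0 ? 0 : start;
	int hi = start + len > limit ? limit : start + len;

	return hi > lo ? hi - lo : 0;
}

/* Number of pixels of a plane that are visible on the CRTC. */
uint64_t visible_pixels(struct rect p, int crtc_w, int crtc_h)
{
	assert(crtc_w > 0 && crtc_h > 0);

	return (uint64_t)clip_span(p.x, p.w, crtc_w) *
	       (uint64_t)clip_span(p.y, p.h, crtc_h);
}

/* Sum over the three planes of the layout proposed in this thread. */
uint64_t layout_pixels_per_frame(void)
{
	const int crtc_w = 3841, crtc_h = 2161;
	const struct rect planes[] = {
		{  101,   0, 3639, 2161 },	/* primary, XRGB8888 */
		{  201, 199, 3033, 1777 },	/* overlay A, XBGR2101010 */
		{ 1800, 250, 1507, 1400 },	/* overlay B, ARGB8888 */
	};
	uint64_t total = 0;

	for (unsigned int i = 0; i < sizeof(planes) / sizeof(planes[0]); i++)
		total += visible_pixels(planes[i], crtc_w, crtc_h);

	return total;
}
```

With these numbers all three planes fit entirely inside the CRTC, so
nothing is clipped and layout_pixels_per_frame() returns 15363320,
a bit less than twice the 3841 x 2161 = 8300401 pixels of the CRTC
itself.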

When this is ready, please share the test so I can check that my patch
performs as well as before.

Thank you for this work.

Have a nice day,
Louis Chauvet

-- 
Louis Chauvet, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com