Re: [PATCH v3] drm/plane: Add documentation about software color conversion.

Jocelyn Falempe <jfalempe@xxxxxxxxxx> · Fri, 8 Sep 2023 16:48:59 +0200

On 08/09/2023 15:56, Thomas Zimmermann wrote:
Hi

Am 08.09.23 um 13:16 schrieb Pekka Paalanen:
On Fri, 8 Sep 2023 11:21:51 +0200
Thomas Zimmermann <tzimmermann@xxxxxxx> wrote:

Hi

Am 25.08.23 um 16:04 schrieb Jocelyn Falempe:
[...]
+ *
+ *     But there are two exceptions only for dumb buffers:
+ *     * To support XRGB8888 if it's not supported by the hardware.

+ *     * Any driver is free to modify its internal representation 
of the format,
+ *       as long as it doesn't alter the visible content in any 
way, and doesn't
+ *       modify the user-provided buffer. An example would be to 
drop the
+ *       padding component from a format to save some memory 
bandwidth.

I have strong objections to this point, _especially_ as you're
apparently trying to sneak this in after our discussion. NAK on this
part from my side.

If you want userspace to be able to use a certain format, then export
the corresponding 4cc code. Then let userspace decide what to do about
it. If userspace pick a certain format, go with it.

What is the reason for your objection, exactly?

Hence, no implicit conversion from XRGB888 to RGB888, just because it's
possible.

For the particular driver in question though, the conversion allows
using a display resolution that is otherwise not possible. I also hear
it improves performance since 25% less data needs to travel across a
slow bus. There is also so little VRAM, than all dumb buffers need to
be allocated from sysram instead anyway, so a copy is always necessary.

Since XRGB8888 is the one format that is recommended to be supported by
all drivers, I don't see a problem here. Did you test on your
incredibly slow g200 test rig if the conversion ends up hurting instead
of helping performance there?

If it hurts, then I see that you have a good reason to NAK this.

It's hard to imagine how it would hurt, since you always need a copy
from sysram dumb buffers to VRAM - or do you?

I have a number of concerns. My point it not that we shouldn't optimize. 
I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 
for userspace to use.

It already does, it's just userspace doesn't want to support it.

AFAICT the main argument against userspace is that Mesa doesn't like 
3-byte pixels. But I don't see how this conversion cannot be a 
post-processing step within Mesa: do the rendering in RGB32 and then 
convert to a framebuffer in RGB24. Userspace can do that more 
efficiently than the kernel. This has all of the upsides of reduced 
bandwidth, but none of the downsides of kernel code. Applications and/or 
Mesa would be in control of the buffer format and apply the optimization 
where it makes sense. And it would be available for all drivers that are 
similar to mgag200.

For this particular case, user-space would use more memory and CPU, 
because the copy to VRAM is done on kernel side, and it is where the 
conversion must be done for maximum performances. So there is no way for 
userspace to be as efficient as the kernel in this use-case.

For the conversion, the kernel allocate only 1 line, and convert/copy 
one line at a time. And because the CPU is out-of-order, it can start 
converting the next line using CPU registers while waiting for the bus.

Userspace would need to allocate a whole framebuffer, and can't use the 
"wasted" CPU cycle to do the conversion.

My main point is simplicity of the driver: I prefer the driver to be 
simple without unnecessary indirection or overhead. Optimizations like 
these my or may not work on a given system with a certain workload. I'd 
better leave this heuristic to userspace.

I agree that the driver is simpler without optimizing thing. But I think 
it's the role of the driver to get the maximum from the hardware, even 
if it's old and broken like g200. And userspace don't want to optimize 
for such hardware.

Another point of concern is CPU consumption: Slow I/O buses may stall 
the display thread, but the CPU could do something else in the meantime. 
Doing format conversion on the CPU prevents that, hence affecting other 
parts of the system negatively. Of course, that's more of a gut feeling 
than hard data.

I think it's the reverse. Without dropping the X data, the CPU is 
actually stalling much longer sending useless bytes to the mgag200's 
VRAM. And the CPU can't do anything else while doing memcpy_toio().
Using DMA is the only way to free the CPU during the copy, but it 
appears to be either broken or significantly slower on most mgag200 
hardware.

I actually have made the work to support DMA, and I'm a bit sad it 
didn't work on all g200, so this optimization is really a last resort, 
on a really broken hardware.

Please note that the kernel's conversion code uses memory allocation of 
intermediate buffers. We even recently had a discussion about allocation 
overhead during display updates. Userspace can surely do a better job at 
keeping such buffers around.

And finally a note the hardware itself: on low-end hardware like those 
Matrox chips, just switch to RGB16. That will be pretty and fast enough 
for these chips' server systems. Anyone who cares about fast and 
beautiful should buy a real graphics card.

There are still sysadmin users that occasionally have to browse the web 
to find answer, or read their mail in a GUI client. It turns out that 
rgb16 is pretty ugly for today standard, and buying an external card 
would be a bit too much, and won't work for remote control.

Best regards,

--

Jocelyn

Best regards
Thomas

Thanks,
pq

+ *     On most hardware, VRAM read access are slow, so when doing 
the software
+ *     conversion, the dumb buffer should be allocated in system 
RAM in order to
+ *     have decent performance.
+ *     Extra care should be taken when doing software conversion with
+ *     DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed 
explanations here:
+ *     
https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/
    */
   static unsigned int drm_num_planes(struct drm_device *dev)

base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781