On Fri, 8 Sep 2023 15:56:51 +0200 Thomas Zimmermann <tzimmermann@xxxxxxx> wrote: > Hi > > Am 08.09.23 um 13:16 schrieb Pekka Paalanen: > > On Fri, 8 Sep 2023 11:21:51 +0200 > > Thomas Zimmermann <tzimmermann@xxxxxxx> wrote: > > > >> Hi > >> > >> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: > >> [...] > >>> + * > >>> + * But there are two exceptions only for dumb buffers: > >>> + * * To support XRGB8888 if it's not supported by the hardware. > >> > >> > >>> + * * Any driver is free to modify its internal representation of the format, > >>> + * as long as it doesn't alter the visible content in any way, and doesn't > >>> + * modify the user-provided buffer. An example would be to drop the > >>> + * padding component from a format to save some memory bandwidth. > >> > >> I have strong objections to this point, _especially_ as you're > >> apparently trying to sneak this in after our discussion. NAK on this > >> part from my side. > >> > >> If you want userspace to be able to use a certain format, then export > >> the corresponding 4cc code. Then let userspace decide what to do about > >> it. If userspace pick a certain format, go with it. > > > > What is the reason for your objection, exactly? > > > >> Hence, no implicit conversion from XRGB888 to RGB888, just because it's > >> possible. > > > > For the particular driver in question though, the conversion allows > > using a display resolution that is otherwise not possible. I also hear > > it improves performance since 25% less data needs to travel across a > > slow bus. There is also so little VRAM, than all dumb buffers need to > > be allocated from sysram instead anyway, so a copy is always necessary. > > > > Since XRGB8888 is the one format that is recommended to be supported by > > all drivers, I don't see a problem here. Did you test on your > > incredibly slow g200 test rig if the conversion ends up hurting instead > > of helping performance there? > > > > If it hurts, then I see that you have a good reason to NAK this. > > > > It's hard to imagine how it would hurt, since you always need a copy > > from sysram dumb buffers to VRAM - or do you? > > I have a number of concerns. My point it not that we shouldn't optimize. > I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 > for userspace to use. > > AFAICT the main argument against userspace is that Mesa doesn't like > 3-byte pixels. But I don't see how this conversion cannot be a > post-processing step within Mesa: do the rendering in RGB32 and then > convert to a framebuffer in RGB24. Userspace can do that more > efficiently than the kernel. This has all of the upsides of reduced > bandwidth, but none of the downsides of kernel code. Applications and/or > Mesa would be in control of the buffer format and apply the optimization > where it makes sense. And it would be available for all drivers that are > similar to mgag200. > > My main point is simplicity of the driver: I prefer the driver to be > simple without unnecessary indirection or overhead. Optimizations like > these my or may not work on a given system with a certain workload. I'd > better leave this heuristic to userspace. > > Another point of concern is CPU consumption: Slow I/O buses may stall > the display thread, but the CPU could do something else in the meantime. > Doing format conversion on the CPU prevents that, hence affecting other > parts of the system negatively. Of course, that's more of a gut feeling > than hard data. > > Please note that the kernel's conversion code uses memory allocation of > intermediate buffers. We even recently had a discussion about allocation > overhead during display updates. Userspace can surely do a better job at > keeping such buffers around. > > And finally a note the hardware itself: on low-end hardware like those > Matrox chips, just switch to RGB16. That will be pretty and fast enough > for these chips' server systems. Anyone who cares about fast and > beautiful should buy a real graphics card. Fair enough. Did you consider that every frame will be copied twice: once in userspace to get RGB888, then again by the driver into VRAM, since dumb buffers reside in sysram? RGB565 would probably go the same route I guess. I suspect the intermediate buffers (dumb buffers in this case) need to be full frame size rather than a scanline, too. I'm not sure why the driver would need any scratch buffers beyond a couple dozen bytes while doing a software conversion, just enough to have the lowest common multiple of 3 bytes and optimal bus write width. Thanks, pq > > Best regards > Thomas > > > > > > > Thanks, > > pq > > > >>> + * On most hardware, VRAM read access are slow, so when doing the software > >>> + * conversion, the dumb buffer should be allocated in system RAM in order to > >>> + * have decent performance. > >>> + * Extra care should be taken when doing software conversion with > >>> + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: > >>> + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ > >>> */ > >>> > >>> static unsigned int drm_num_planes(struct drm_device *dev) > >>> > >>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 > >> > > >
Attachment:
pgpMCwdjApJHO.pgp
Description: OpenPGP digital signature