On Mon, Feb 05, 2024 at 05:02:58PM +1000, Dave Airlie wrote:
> On Mon, 6 Nov 2023 at 20:47, Jocelyn Falempe <jfalempe@xxxxxxxxxx> wrote:
> >
> > On 23/10/2023 10:30, Jocelyn Falempe wrote:
> > > On 20/10/2023 14:06, Thomas Zimmermann wrote:
> > >> (cc'ing lkml for feedback)
> > >>
> > >> Hi Jocelyn
> > >>
> > >> On 19.10.23 at 15:55, Jocelyn Falempe wrote:
> > >>> We found a regression in v5.10 on a real-time server, using the
> > >>> rt-kernel and the mgag200 driver. It's a really specialized
> > >>> workload, with a <10us latency expectation on an isolated core.
> > >>> After v5.10, the real-time tasks missed their <10us latency
> > >>> target when something prints on the screen (fbcon or printk).
> > >>
> > >> I'd like to hear the opinion of the RT devs on this patch, because
> > >> AFAIK we never did such a workaround in other drivers. And AFAIK
> > >> printk is a PITA anyway.
> > >
> > > Most other drivers use DMA, which means this workaround can't apply
> > > to them.
> > >
> > >> IMHO if that RT system cannot handle differences in framebuffer
> > >> caching, it's under-powered. It's just a matter of time until
> > >> something else changes and the problem returns. And (honest question)
> > >> as it's an x86-64, how do they handle System Management Mode?
> > >
> > > I think it's not big news that the Matrox G200 from 1999 is
> > > under-powered.
> > > I was also a bit surprised that flushing the cache would have such an
> > > effect on latency. The tests we are doing can run for 24h with the
> > > workaround without any interrupt taking more than 10us. Without the
> > > workaround, every ~30s an interrupt missed its 10us target.
> > >
> > >>> The regression has been bisected to 2 commits:
> > >>> 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> > >>> 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
> > >>>
> > >>> The first one changed the system-memory framebuffer from
> > >>> Write-Combine to the default caching.
> > >>> Before the second commit, the mgag200 driver used to unmap the
> > >>> framebuffer after each frame, which implicitly does a cache flush.
> > >>> Both regressions are fixed by the following patch, which forces a
> > >>> cache flush after each frame, reverting to almost v5.9 behavior.
> > >>
> > >> With that second commit, we essentially never unmap an active
> > >> framebuffer console. But with commit
> > >>
> > >> 359c6649cd9a ("drm/gem: Implement shadow-plane {begin, end}_fb_access
> > >> with vmap")
> > >>
> > >> we now again unmap the console framebuffer after the pageflip
> > >> happened.
> > >>
> > >> So how does the latest kernel behave wrt the problem?
> > >
> > > The regression was found when upgrading the server from v5.4 to v5.14,
> > > so we didn't test with later kernels.
> > > We will test with v6.3 (which should have 359c6649cd9a) and see how
> > > it behaves.
> >
> > I don't have a clear explanation, but testing with v6.3 and forcing
> > Write-Combine doesn't fix the latency issue. So forcing the cache
> > flush is still needed.
> >
> > Also, some systems use "isolated CPUs" to handle RT tasks, but with a
> > standard kernel (so without CONFIG_PREEMPT_RT).
> > So I'm wondering if we can use a kernel module parameter for this, so
> > that users who want to achieve low latency can opt in?
> >
> > Something like mgag200.force_cache_flush=1 or mgag200.low_latency=1?
>
> I think we should either add a config option or command line parameter
> here.
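As a rough illustration of that suggestion, here is a minimal sketch of an
opt-in module parameter, borrowing the "low_latency" name floated above; the
variable and default are assumptions, not code from an actual patch:

#include <linux/module.h>

/*
 * Hypothetical opt-in switch: off by default, so nothing changes for
 * users who don't need the <10us latency guarantee.
 */
static bool mgag200_low_latency;
module_param_named(low_latency, mgag200_low_latency, bool, 0444);
MODULE_PARM_DESC(low_latency,
                 "Flush CPU caches after each frame to avoid RT latency spikes (default: false)");

Users would then boot with mgag200.low_latency=1 to enable the cache-flush
workaround.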
Yeah, I think the situation is cursed enough that we need that, and a module
option for mgag200 sounds like the most reasonable way.

> I don't think adding nopat to the kernel command line is a good
> suggestion in the long run; servers often have other cards plugged into
> them, like nvidia gpus or rdma etc, and you don't want to cripple those
> because you want reduced latency on the crappy on-board.
>
> I'd rather we put the default back to what it used to be, which was to
> flush the cache. I'm not sure why we have any objection to doing that;
> it used to work, it was clearly fine in operation, why undo it?

Uh, that default was changed a few patches back, and I think it's not great
if we change it again, since it means all drivers using shadow buffers for
fbdev will again suffer from rendering fbcon into a WC instead of a WB
buffer.

But Jocelyn has meanwhile dug out more info in another reply, I'll reply
there.

Cheers, Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
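For completeness, a hedged sketch of the per-frame flush discussed in the
thread, reusing the hypothetical mgag200_low_latency parameter from the
sketch above. The function name and call site are assumptions; only
drm_clflush_virt_range() is an existing DRM helper, and it flushes the CPU
cache lines covering a virtual address range:

#include <drm/drm_cache.h>

/*
 * Hypothetical helper, called from the commit tail after the shadow
 * buffer has been copied to the device. Flushing the cache lines here
 * roughly restores the implicit flush that the pre-v5.10 per-frame
 * vmap/vunmap cycle provided.
 */
static void mgag200_flush_shadow(void *vaddr, unsigned long size)
{
        if (mgag200_low_latency)
                drm_clflush_virt_range(vaddr, size);
}

Since the flush only runs when the user opts in, the default path keeps the
cached (WB) shadow buffer that fbcon rendering benefits from.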