Re: [PATCH] drm/mgag200: Flush the cache to improve latency

Dave Airlie <airlied@xxxxxxxxx> · Mon, 5 Feb 2024 17:02:58 +1000

On Mon, 6 Nov 2023 at 20:47, Jocelyn Falempe <jfalempe@xxxxxxxxxx> wrote:
>
> On 23/10/2023 10:30, Jocelyn Falempe wrote:
> > On 20/10/2023 14:06, Thomas Zimmermann wrote:
> >> (cc'ing lkml for feedback)
> >>
> >> Hi Jocelyn
> >>
> >> Am 19.10.23 um 15:55 schrieb Jocelyn Falempe:
> >>> We found a regression in v5.10 on real-time server, using the
> >>> rt-kernel and the mgag200 driver. It's some really specialized
> >>> workload, with <10us latency expectation on isolated core.
> >>> After the v5.10, the real time tasks missed their <10us latency
> >>> when something prints on the screen (fbcon or printk)
> >>
> >> I'd like to hear the opinion of the RT-devs on this patch. Because
> >> AFAIK we never did such a workaround in other drivers. And AFAIK
> >> printk is a PITA anyway.
> >
> > Most other drivers uses DMA, which means this workaround can't apply to
> > them.
> >
> >>
> >> IMHO if that RT system cannot handle differences in framebuffer
> >> caching, it's under-powered. It's just a matter of time until
> >> something else changes and the problem returns. And (honest question)
> >> as it's an x86-64, how do they handle System Management Mode?
> >
> > I think it's not a big news, that the Matrox G200 from 1999 is
> > under-powered.
> > I was also a bit surprised that flushing the cache would have such
> > effect on latency. The tests we are doing can run 24h with the
> > workaround, without any interrupt taking more than 10us. Without the
> > workaround, every ~30s the interrupt failed its 10us target.
> >
> >>
> >>>
> >>> The regression has been bisected to 2 commits:
> >>> 0b34d58b6c32 ("drm/mgag200: Enable caching for SHMEM pages")
> >>> 4862ffaec523 ("drm/mgag200: Move vmap out of commit tail")
> >>>
> >>> The first one changed the system memory framebuffer from Write-Combine
> >>> to the default caching.
> >>> Before the second commit, the mgag200 driver used to unmap the
> >>> framebuffer after each frame, which implicitly does a cache flush.
> >>> Both regressions are fixed by the following patch, which forces a
> >>> cache flush after each frame, reverting to almost v5.9 behavior.
> >>
> >> With that second commit, we essentially never unmap an active
> >> framebuffer console. But with commit
> >>
> >> 359c6649cd9a ("drm/gem: Implement shadow-plane {begin, end}_fb_access
> >> with vmap")
> >>
> >> we now again unmap the console framebuffer after the pageflip happened.
> >>
> >> So how does the latest kernel behave wrt to the problem?
> >
> > The regression was found when upgrading the server from v5.4 to v5.14,
> > so we didn't test with later kernels.
> > We will test with v6.3 (which should have 359c6649cd9a ) and see what it
> > gives.
>
> I don't have a clear explanation, but testing with v6.3, and forcing the
> Write Combine, doesn't fix the latency issue. So forcing the cache flush
> is still needed.
>
> Also, on some systems, they use "isolated cpu" to handle RT task, but
> with a standard kernel (so without the CONFIG_PREEMPT_RT).
> So I'm wondering if we can use a kernel module parameter for this,
> so that users that wants to achieve low latency, can opt-in ?
>
> something like mgag200.force_cache_flush=1 or mgag200.low_latency=1 ?

I think we should either add a config option or command line parameter here.

I'd don't think adding nopat to the kernel command line is a good
suggestion in the long run, servers often have other cards plugged
into them like nvidia gpus or rdma etc, you don't want to cripple them
because you want reduced latency on the crappy on-board.

I'd rather we put the default back to what it used to be, which was
flush the cache though, I'm not sure why we have any objection to
doing that, it used to work, it was clearly fine in operation, why
undo it?

Dave.