Re: fbdev: Garbage collect fbdev scrolling acceleration

Sven Schnelle <svens@xxxxxxxxxxxxxx> · Wed, 19 Jan 2022 17:33:53 +0100

Hi Daniel,

Daniel Vetter <daniel@xxxxxxxx> writes:

> On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
>> Hi Daniel,
>> 
>> Daniel Vetter <daniel@xxxxxxxx> writes:
>> 
>> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> >> Helge Deller <deller@xxxxxx> writes:
>> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> >> > but for all old systems, or when emulated in qemu, this makes
>> >> > a big difference.
>> >> >
>> >> > Helge
>> >> 
>> >> I second that. For most people, the framebuffer isn't important as
>> >> they're mostly interested in getting to X11/wayland as fast as possible.
>> >> But for systems like servers without X11 it's nice to have a fast
>> >> console.
>> >
>> > Fast console howto:
>> > - shadow buffer in cached memory
>> > - timer based upload of changed areas to the real framebuffer
>> >
>> > This one is actually fast, instead of trying to use hw bltcopy and having
>> > the most terrible fallback path if that's gone. Yes, the drm fbdev helpers
>> > have this (but it's not enabled on most drivers because very, very few
>> > people care).
>> 
>> Hmm.... Take my laptop with a 4k (3840x2160) screen as an example:
>> 
>> Let's say on average half of every line is filled with text.
>> 
>> So 3840/2*2160 pixels that change = 4,147,200 pixels. Every pixel takes 4
>> bytes = 16,588,800 bytes per timer interrupt. In another mail updating on
>> vsync was mentioned, so multiply that by 60 and get ~995MB per second. And
>> even if you update the screen only 4 times per second, that would still be
>> ~66MB per second. I'm likely missing something here.
>
> Since you say 4k it's a modern box, so you have on the order of 10GB/s of
> write bandwidth.
>
> And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
> It's that uncached read which kills you and means dmesg takes seconds to
> display.
>
> Also since this is 4k looking at sales volume we're talking integrated, so
> whether it's the gpu or the cpu that's doing the memcpy, it's the same
> memory bw budget you're burning down.

That might be true for integrated graphics; as I said, I don't know the
architecture. But being good on one architecture doesn't mean it's good
for everyone. If you have an external GPU, then the memory/system bus
bandwidth used would differ depending on whether memcpy or the GPU does
the scrolling. And whether internal or external graphics - the CPU could
do other work while the GPU scrolls.
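
For reference, the upload estimate for the 4k example quoted above can be
redone as a small script. This is only a back-of-the-envelope sketch: the
function names are made up for illustration, and the 32 bpp format and the
"half of every line changes" fill factor are the assumptions from my
earlier mail, not measurements.

```python
# Back-of-the-envelope upload rate for a timer-driven shadow buffer.
# All constants are assumptions taken from the example in this thread.
WIDTH, HEIGHT = 3840, 2160   # 4k panel
BYTES_PER_PIXEL = 4          # assumes a 32 bpp framebuffer format
FILL = 0.5                   # assumes half of every line has changed


def bytes_per_update(width=WIDTH, height=HEIGHT,
                     bpp=BYTES_PER_PIXEL, fill=FILL):
    """Bytes copied to the real framebuffer in one update."""
    return int(width * fill) * height * bpp


def bytes_per_second(updates_per_second, **kwargs):
    """Bytes copied per second at a given update rate."""
    return bytes_per_update(**kwargs) * updates_per_second


if __name__ == "__main__":
    for hz in (60, 4):       # vsync-rate updates vs. 4 updates per second
        rate = bytes_per_second(hz)
        print(f"{hz:2d} updates/s: {rate / 1e6:.1f} MB/s "
              f"({rate / 2**20:.1f} MiB/s)")
```

With these assumptions one update moves 16,588,800 bytes, which is about
995 MB/s at 60 updates per second and about 66 MB/s at 4 updates per
second. Changing FILL or the update rate shows how quickly the numbers
move.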

Quite a lot of discussion for a revert of a patch that was already in
the kernel for more than 20(?) years.

/Sven