Hi Daniel,

Daniel Vetter <daniel@xxxxxxxx> writes:

> On Wed, Jan 19, 2022 at 05:15:44PM +0100, Sven Schnelle wrote:
>> Hi Daniel,
>>
>> Daniel Vetter <daniel@xxxxxxxx> writes:
>>
>> > On Thu, Jan 13, 2022 at 10:46:03PM +0100, Sven Schnelle wrote:
>> >> Helge Deller <deller@xxxxxx> writes:
>> >> > Maybe on fast new x86 boxes the performance difference isn't huge,
>> >> > but for all old systems, or when emulated in qemu, this makes
>> >> > a big difference.
>> >> >
>> >> > Helge
>> >>
>> >> I second that. For most people, the framebuffer isn't important as
>> >> they're mostly interested in getting to X11/wayland as fast as possible.
>> >> But for systems like servers without X11 it's nice to have a fast
>> >> console.
>> >
>> > Fast console howto:
>> > - shadow buffer in cached memory
>> > - timer based upload of changed areas to the real framebuffer
>> >
>> > This one is actually fast, instead of trying to use hw bltcopy and having
>> > the most terrible fallback path if that's gone. Yes drm fbdev helpers has
>> > this (but not enabled on most drivers because very, very few people care).
>>
>> Hmm.... Take my laptop with a 4k (3840x2160) screen as an example:
>>
>> Let's say that on average half of every line is filled with text.
>>
>> So 3840/2*2160 pixels that change = 4,147,200 pixels. Every pixel takes 4
>> bytes = 16,588,800 bytes per timer interrupt. In another mail updating on
>> vsync was mentioned, so multiply that by 60 and you get ~995MB per second.
>> And even if you only update the screen 4 times per second, that would be
>> ~66MB of data per second. I'm likely missing something here.
>
> Since you say 4k it's a modern box, so you have on the order of 10GB/s of
> write bandwidth.
>
> And around 100MB/s of read bandwidth. Both from the cpu. It all adds up.
> It's that uncached read which kills you and means dmesg takes seconds to
> display.
>
> Also since this is 4k looking at sales volume we're talking integrated, so
> whether it's the gpu or the cpu that's doing the memcpy, it's the same
> memory bw budget you're burning down.

That might be true for integrated graphics; as I said, I don't know the
architecture. But being good on one architecture doesn't make it good for
everyone. If you have an external GPU, then the memory/system bus bandwidth
used differs depending on whether memcpy or the GPU does the scrolling. And
whether the graphics are internal or external, the CPU could do other work
while the GPU scrolls.

Quite a lot of discussion for a revert of a patch that had already been in
the kernel for more than 20(?) years.

/Sven