On Tue, 2012-05-22 at 22:54 +0300, Siarhei Siamashka wrote:
> This is a very simple few-liner patchset, which allows to optionally
> enable write-through caching for OMAP DSS framebuffer. The problem with
> the current writecombine cacheability attribute is that it only speeds
> up writes. Uncached reads are slow, even though the use of NEON mitigates
> this problem a bit.
>
> Traditionally, xf86-video-fbdev DDX is using shadow framebuffer in the
> system memory. Which contains a copy of the framebuffer data for the
> purpose of providing fast read access to it when needed. Framebuffer
> read access is required not so often, but it still gets used for
> scrolling and moving windows around in Xorg server. And the users
> perceive their linux desktop as rather sluggish when these operations
> are not fast enough.
>
> In the case of ARM hardware, framebuffer is typically physically
> located in the main memory. And the processors still support
> write-through cacheability attribute. According to ARM ARM, the writes
> done to write-through cached memory inside the level of cache are
> visible to all observers outside the level of cache without the need
> of explicit cache maintenance (same rule as for non-cached memory).
> So write-through cache is a perfect choice when only CPU is allowed
> to modify the data in the framebuffer and everyone else (screen
> refresh DMA) is only reading it. That is, assuming that write-through
> cached memory provides good performance and there are no quirks.
> As the framebuffer reads become fast, the need for shadow framebuffer
> disappears.

I ran my own fb perf test on omap3 overo board ("perf" test in
https://gitorious.org/linux-omap-dss2/omapfb-tests):

vram_cache=n:

sequential_horiz_singlepixel_read: 25198080 pix, 4955475 us, 5084897 pix/s
sequential_horiz_singlepixel_write: 434634240 pix, 4081146 us, 106498086 pix/s
sequential_vert_singlepixel_read: 20106240 pix, 4970611 us, 4045023 pix/s
sequential_vert_singlepixel_write: 98572800 pix, 4985748 us, 19770915 pix/s
sequential_line_read: 40734720 pix, 4977906 us, 8183103 pix/s
sequential_line_write: 1058580480 pix, 5024628 us, 210678378 pix/s
nonsequential_singlepixel_write: 17625600 pix, 4992828 us, 3530183 pix/s
nonsequential_singlepixel_read: 9661440 pix, 4952973 us, 1950634 pix/s

vram_cache=y:

sequential_horiz_singlepixel_read: 270389760 pix, 4994154 us, 54141253 pix/s
sequential_horiz_singlepixel_write: 473149440 pix, 3932801 us, 120308512 pix/s
sequential_vert_singlepixel_read: 18147840 pix, 4976226 us, 3646908 pix/s
sequential_vert_singlepixel_write: 100661760 pix, 4993164 us, 20159914 pix/s
sequential_line_read: 285143040 pix, 4917267 us, 57988114 pix/s
sequential_line_write: 876710400 pix, 5012146 us, 174917171 pix/s
nonsequential_singlepixel_write: 17625600 pix, 4977967 us, 3540722 pix/s
nonsequential_singlepixel_read: 9661440 pix, 4944885 us, 1953825 pix/s

These also show quite a bit of improvement in some read cases.
Interestingly some of the write cases are also faster.

Reading pixels vertically is slower with vram_cache. I guess this is
because the cache causes some overhead, and we always miss the cache so
the caching is just wasted time.

I would've also presumed the difference in sequential_line_write would
be bigger. write-through is effectively no-cache for writes, right?

If the user of the fb just writes to the fb and vram_cache=y, it means
that the cache is filled with pixel data that is never used, thus
lowering the performance of all other programs?

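To put the two runs side by side, here is a small Python sketch that
computes the vram_cache=y / vram_cache=n throughput ratio for each test.
The pix/s figures are copied from the output above. The helper names
(pix_per_sec, speedup, report) are made up for illustration and are not
part of omapfb-tests. The sketch also assumes the tool reports pix/s as
pixels divided by elapsed seconds, truncated to an integer, which matches
the first line of the output.

```python
# Throughput in pix/s, copied from the perf output:
# test name -> (vram_cache=n, vram_cache=y)
RESULTS = {
    "sequential_horiz_singlepixel_read": (5084897, 54141253),
    "sequential_horiz_singlepixel_write": (106498086, 120308512),
    "sequential_vert_singlepixel_read": (4045023, 3646908),
    "sequential_vert_singlepixel_write": (19770915, 20159914),
    "sequential_line_read": (8183103, 57988114),
    "sequential_line_write": (210678378, 174917171),
    "nonsequential_singlepixel_write": (3530183, 3540722),
    "nonsequential_singlepixel_read": (1950634, 1953825),
}


def pix_per_sec(pixels, microseconds):
    """Assumed formula: pixels per elapsed second, truncated to an integer."""
    return pixels * 1_000_000 // microseconds


def speedup(uncached, cached):
    """Ratio > 1.0 means vram_cache=y was faster for that test."""
    return cached / uncached


def report(results=RESULTS):
    """Return a list of (test name, y/n throughput ratio) pairs."""
    return [(name, speedup(n, y)) for name, (n, y) in results.items()]


if __name__ == "__main__":
    for name, ratio in report():
        print(f"{name:36s} {ratio:6.2f}x")
```

By this calculation the two sequential read tests come out roughly 7x to
10x faster with vram_cache=y, while sequential_line_write and
sequential_vert_singlepixel_read drop below 1.0x.
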
I have to say I don't know much about CPU caches, but the read speed
improvements are very big, so I think this is definitely an interesting
patch. So if you get the first patch accepted I see no problem with
adding this to omapfb as an optional feature.

However, "vram_cache" is not a very good name for the option.
"vram_writethrough", or something?

Did you test this with VRFB (omap3) or TILER (omap4)? I wonder how those
are affected.

 Tomi