On 01/11/2021 05:20, Bert Schiettecatte wrote:
> Hi John
>
>> Coincidentally, I've been looking at Panfrost on RK3288 this week as
>> well! I'm testing it with a project that has been using the binary blob
>> driver for several years and unfortunately Panfrost seems to use ~15%
>> more CPU.
>> Like you, I see a huge number of minor faults (~500/second compared with
>> ~3/second on libmali). It seems that Panfrost is mmap'ing and
>> munmap'ing buffers on every frame which doesn't happen when the same
>> application is using the binary driver.
>
> Thanks for confirming you are seeing the same issue.
>
>> Panfrost experts, is there a missed opportunity for optimisation here?
>> Or is there something applications should be doing differently to avoid
>> repeatedly mapping & unmapping the same buffers?
>
> Panfrost team - any update on this?

I was hoping Alyssa would comment since she's much more familiar with
Mesa than I am!

On the first point of libmali not performing mmap()s very often - I'll
just note that this was a specific design goal and for example the kbase
kernel driver provides ioctl()s to do CPU cache maintenance for this to
work on arm platforms (i.e. it's not a portable solution).

So short answer: yes there is room for optimisation here. However things
get tricky when fitting into a portable framework. The easiest way of
ensuring cache coherency is to ensure there is a clear owner - so the
usual approach is mmap(), read/write some data on the CPU, munmap(), GPU
accesses data, repeat. The DMA framework in the kernel will then ensure
that any cache maintenance/bounce buffering or other quirks are dealt
with.

Having said that we know that existing platforms don't require these
'quirks' (because libmali works on them) so in theory it should be
possible for Mesa to avoid the mmap()/munmap() dance in many cases
(where the memory is coherent with the GPU[1]). But this is where my
knowledge of Mesa is lacking as I've no idea how to go about that.

Regards,
Steve

[1] I think this should actually be true all the time with Panfrost as
the buffer is mapped write-combining on the CPU if the GPU isn't fully
coherent. But I haven't double checked this.
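
For illustration only, the two mapping patterns discussed above can be
compared with a rough Python sketch. It is not Mesa or Panfrost code: an
ordinary temporary file stands in for a GPU buffer, and the names
(BUF_SIZE, FRAMES, map_per_frame, map_once, run) are hypothetical. It
maps and unmaps the buffer on every "frame", then maps it once and
reuses the mapping, and reports the minor-fault delta for each. The
exact counts will vary by system.

```python
import mmap
import resource
import tempfile

BUF_SIZE = 1 << 20   # 1 MiB stand-in for a GPU buffer (assumption)
FRAMES = 100         # number of simulated frames (assumption)


def minor_faults():
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def map_per_frame(fd, frames):
    """mmap(), CPU write, munmap() on every frame: a clear owner each step."""
    for frame in range(frames):
        buf = mmap.mmap(fd, BUF_SIZE)           # shared read/write mapping
        buf[0:4] = frame.to_bytes(4, "little")  # CPU writes some data
        buf.close()                             # munmap()
        # ... the GPU would access the buffer here ...


def map_once(fd, frames):
    """Map once and reuse the mapping for every frame."""
    buf = mmap.mmap(fd, BUF_SIZE)
    for frame in range(frames):
        buf[0:4] = frame.to_bytes(4, "little")
        # A real driver would only be safe here if the memory is coherent
        # with the GPU, or if it did explicit cache maintenance.
    buf.close()


def run(strategy):
    """Return (minor-fault delta, last frame number written) for a strategy."""
    with tempfile.TemporaryFile() as f:
        f.truncate(BUF_SIZE)
        before = minor_faults()
        strategy(f.fileno(), FRAMES)
        faults = minor_faults() - before
        f.seek(0)
        last = int.from_bytes(f.read(4), "little")
    return faults, last


if __name__ == "__main__":
    for strategy in (map_per_frame, map_once):
        faults, last = run(strategy)
        print(f"{strategy.__name__}: {faults} minor faults, last frame {last}")
```

Every fresh mapping has to fault its pages in again on first touch,
which is consistent with the much higher minor-fault rate reported in
the quoted text. The persistent mapping avoids that, but only the
per-frame version gives the kernel a clear point at which to hand
ownership over to the GPU.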