On Mon, 4 Nov 2024 12:49:56 +0000
Akash Goel <akash.goel@xxxxxxx> wrote:

> On 11/4/24 11:16, Boris Brezillon wrote:
> > Hi Akash,
> >
> > On Thu, 31 Oct 2024 21:42:27 +0000
> > Akash Goel <akash.goel@xxxxxxx> wrote:
> >
> >> I assume you also reckon that there is a potential problem here for arm64.
> >
> > It impacts any system that's not IO-coherent I would say, and this
> > comment seems to prove this is a known issue [3].
> >
>
> Thanks for confirming.
>
> Actually I had tried to check with Daniel Vetter about [3], as it was
> not clear to me how that code exactly helped in the x86 case.
> As far as I understand, [3] updates the attribute of the direct kernel
> mapping of the shmem pages to WC, so as to be consistent with the
> Userspace mapping of the pages or their vmapping inside the kernel.
> But I didn't get how that alignment actually helped in cleaning the
> dirty cache lines.

Yeah, I was not referring to the code but rather the fact that x86,
with its IO coherency model, is a special case here, and that other
archs probably need explicit flushes in a few places.

> >>
> >> shmem calls 'flush_dcache_folio()' after clearing the pages but that
> >> just clears the 'PG_dcache_clean' bit and the CPU cache is not
> >> cleaned immediately.
> >>
> >> I realize that this patch is not foolproof, as Userspace can try to
> >> populate the BO from the CPU side before mapping it on the GPU side.
> >>
> >> Not sure if we also need to consider the case when shmem pages are
> >> swapped out. I don't know if there could be a similar situation of
> >> dirty cache lines after the swap-in.
> >
> > I think we do. We basically need to flush CPU caches any time
> > pages are [re]allocated, because the shmem layer will either zero-out
> > (first allocation) or populate (swap-in) in that path, and in both
> > cases, it involves a CPU copy to a cached mapping.
> >
>
> Thanks for confirming.
>
> I think we may have to do the cache flush page by page.
> Not all pages might get swapped out, and the initial allocation of all
> pages may not happen at the same time.

If the pages are mapped GPU-side, it's always all pages at a time (at
least until we add support for lazy page allocation, AKA growing/heap
buffers). You're right that GPU buffers that have only been mapped
CPU-side with mmap() get their pages lazily allocated, though I'm not
really sure we care about optimizing that case just yet.

> Please correct me if my understanding is wrong.

Eviction should be rare enough that we can probably pay the price of a
flush on the entire BO range.
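
For illustration, a rough sketch (untested) of what that whole-range
flush could look like using the generic DMA API helpers; the function
name and the way the BO's sg_table and struct device are passed in are
just placeholders, not the driver's actual layout:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Sketch: clean the CPU cache lines covering every page backing the BO,
 * so that the zero-fill (first allocation) or swap-in data written
 * through the cached kernel mapping is visible to a non-IO-coherent GPU.
 */
static void bo_flush_cpu_caches(struct device *dev, struct sg_table *sgt)
{
	dma_sync_sgtable_for_device(dev, sgt, DMA_BIDIRECTIONAL);
}

Something like this would be called once after the pages are
(re)populated, rather than page by page.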