On Fri, 27 May 2022 at 16:37, Adrian Larumbe <adrian.larumbe@xxxxxxxxxxxxx> wrote:
>
> On 18.05.2022 16:00, Matthew Auld wrote:
> > On Tue, 17 May 2022 at 21:45, Adrian Larumbe
> > <adrian.larumbe@xxxxxxxxxxxxx> wrote:
> > >
> > > This patch is a second attempt at eliminating the old shmem memory region
> > > and GEM object backend, in favour of a TTM-based one that is able to manage
> > > objects placed in both system and local memory.
> > >
> > > Questions addressed since the previous revision:
> > >
> > > * Created an anonymous vfs mount for shmem files in TTM
> > > * Fixed LLC caching properties and bit 17 swizzling before setting a TTM
> > >   bo's pages when calling get_pages
> > > * Added handling of the phys backend from TTM functions
> > > * Added a pread callback to the TTM gem object backend
> > > * In shmem_create_from_object, ensured a shmem object we just got a filp
> > >   for has its pages marked dirty and accessed. Otherwise, the engine won't
> > >   be able to read the initial state and a GPU hang will ensue
> > >
> > > However, one of the issues persists:
> > >
> > > Many GPU hangs on machines of GEN <= 5. My assumption is this has something
> > > to do with a caching pitfall, but everywhere across the TTM backend code
> > > I've tried to handle object creation and getting its pages with the same
> > > set of caching and coherency properties as in the old shmem backend.
> >
> > Some thoughts in case it's helpful:
> >
> > - We still look to be trampling the cache_level etc after object
> >   creation. AFAICT i915_ttm_adjust_gem_after_move can be called in
> >   various places after creation.
>
> I traced this function so that I could see everywhere it was being called when
> running some IGT tests and kmscube, and the only place it was setting a caching
> coherence value other than none was in init_status_page, where I915_CACHE_LLC is
> picked as the cache coherency mode even for machines that do not have it.
> However, this code was already present before my changes and didn't seem to
> cause any issues, so I don't think it's involved.

Yeah, I guess it's a different issue, but we still need to somehow
ensure we never trample the cache_level etc on integrated platforms
after object creation. On non-LLC platforms (not discrete) it's normal
to set CACHE_LLC for certain types of buffers. And even on LLC
platforms it's normal to set CACHE_NONE, like for display buffers,
since the display engine is not coherent. For reference we can have
something like:

bb = gem_create() <-- assume non-llc so must be CACHE_NONE
gem_set_caching(bb, CACHED) <-- should now be CACHE_LLC/L3
ptr = mmap_wb(bb);
*ptr = BATCH_BUFFER_END <-- gem_after_move() resets to CACHE_NONE
execbuf(bb) <-- doesn't see the BATCH_BUFFER_END
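Spelled out against the actual uAPI, an untested sketch of that
sequence (error handling and the execbuf call itself left out) would
be something like:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/i915_drm.h>

#define MI_BATCH_BUFFER_END (0xA << 23)

static uint32_t make_batch(int fd)
{
	struct drm_i915_gem_create create = { .size = 4096 };
	struct drm_i915_gem_caching caching = {};
	struct drm_i915_gem_mmap mmap_arg = {};
	uint32_t *ptr;

	/* assume non-llc, so the object starts out as CACHE_NONE */
	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);

	/* should now be CACHE_LLC/L3 */
	caching.handle = create.handle;
	caching.caching = I915_CACHING_CACHED;
	ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &caching);

	/* plain WB mmap, relying on the CACHED setting above */
	mmap_arg.handle = create.handle;
	mmap_arg.size = 4096;
	ioctl(fd, DRM_IOCTL_I915_GEM_MMAP, &mmap_arg);
	ptr = (uint32_t *)(uintptr_t)mmap_arg.addr_ptr;

	/*
	 * If gem_after_move() has meanwhile reset the object to
	 * CACHE_NONE, nothing ever flushes this write out of the CPU
	 * cache, and the GPU reads stale memory on execbuf.
	 */
	*ptr = MI_BATCH_BUFFER_END;
	munmap(ptr, 4096);

	return create.handle; /* execbuf on this handle then hangs */
}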

> > - The i915_ttm_pwrite hook won't play nice on non-llc platforms, since
> >   it doesn't force a clflush or keep track of the writes with
> >   cache_dirty. The existing ->shmem_pwrite hook only works because we
> >   are guaranteed to have not yet populated the mm.pages, and on non-llc
> >   platforms we always force a clflush in __set_pages(). In
> >   i915_ttm_pwrite we are now just calling pin_pages() and then writing
> >   through the page-cache without forcing a clflush, or ensuring that we
> >   leave cache_dirty set. Also AFAIK the whole point of shmem_pwrite was
> >   to avoid needing to populate the entire object like when calling
> >   pin_pages(). Would it make sense to just fall back to using
> >   i915_gem_shmem_pwrite, which should already take care of the required
> >   flushing?
> >
> > For reference a common usage pattern is something like:
> >
> > bb = gem_create() <-- assume non-llc so must be CACHE_NONE
> > gem_write(bb, BATCH_BUFFER_END) <-- might use cached pwrite internally
> > execbuf(bb) <-- doesn't see BATCH_BUFFER_END if we don't clflush
>
> It turns out this was the underlying issue causing GEN <= 5 machines to break
> in pretty much every single test. It seems that for older hardware, IGT tests
> would pick pwrite as the preferred method for filling BOs from UM, so my code
> wasn't calling clflush after getting the pages and writing the UM pointer data
> into the shmem file.
>
> The way I fixed it was creating a shmem file for this and the other cases
> where the existing code requires it, but without getting the pages. In a way,
> I just cut through the usual TTM populate path and instantiate a shmem object
> so that I can avoid caching issues.
>
> Thanks a lot for catching this one!
>
> > > Adrian Larumbe (1):
> > >   drm/i915: Replace shmem memory region and object backend with TTM
> > >
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  12 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_mman.c     |  32 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h   |   4 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_phys.c     |  32 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c    | 390 +------------------
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 267 ++++++++++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.h      |   3 +
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |   9 +-
> > >  drivers/gpu/drm/i915/gt/shmem_utils.c        |  64 ++-
> > >  drivers/gpu/drm/i915/intel_memory_region.c   |   7 +-
> > >  10 files changed, 398 insertions(+), 422 deletions(-)
> > >
> > > --
> > > 2.35.1
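Nice, glad that was it. Just to check my understanding, the fix then
boils down to something like the below? Rough sketch on my end, the
helper name is invented, but using the gemfs mount like the old shmem
backend did:

#include <linux/shmem_fs.h>

#include "gem/i915_gem_object.h"
#include "i915_drv.h"

/*
 * Hand the object its shmem filp up front, on the anonymous gemfs
 * mount, without going through the usual TTM populate path, so there
 * are no pages yet whose cachelines could go stale behind our back.
 */
static int i915_ttm_setup_filp(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *i915 = to_i915(obj->base.dev);
	struct file *filp;

	filp = shmem_file_setup_with_mnt(i915->mm.gemfs, "i915",
					 obj->base.size, VM_NORESERVE);
	if (IS_ERR(filp))
		return PTR_ERR(filp);

	/* Deliberately no page lookup here; pages stay unpopulated. */
	obj->base.filp = filp;
	return 0;
}

i.e. the object gets a backing file that pwrite-style users can write
into, while the TTM populate path (and its clflush on non-llc) still
runs only on the first real get_pages.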