On Tue, 17 May 2022 at 21:45, Adrian Larumbe <adrian.larumbe@xxxxxxxxxxxxx> wrote: > > This patch is a second attempt at eliminating the old shmem memory region > and GEM object backend, in favour of a TTM-based one that is able to manage > objects placed on both system and local memory. > > Questions addressed since previous revision: > > * Creating an anonymous vfs mount for shmem files in TTM > * Fixing LLC caching properties and bit 17 swizzling before setting a TTM > bo's pages when calling get_pages > * Added handling of phys backend from TTM functions > * Added pread callback to TTM gem object backend > * In shmem_create_from_object, ensuring an shmem object we just got a filp > for has its pages marked dirty and accessed. Otherwise, the engine won't be > able to read the initial state and a GPU hung will ensue > > However, one of the issues persists: > > Many GPU hungs in machines of GEN <= 5. My assumption is this has something > to do with a caching pitfall, but everywhere across the TTM backend code > I've tried to handle object creation and getting its pages with the same > set of caching and coherency properties as in the old shmem backend. Some thoughts in case it's helpful: - We still look to be trampling the cache_level etc after object creation. AFAICT i915_ttm_adjust_gem_after_move can be called in various places after creation. - The i915_ttm_pwrite hook won't play nice on non-llc platforms, since it doesn't force a clflush or keep track of the writes with cache_dirty. The existing ->shmem_pwrite hook only works because we are guaranteed to have not yet populated the mm.pages, and on non-llc platforms we always force a clflush in __set_pages(). In i915_ttm_pwrite we are now just calling pin_pages() and then writing through the page-cache without forcing a clflush, or ensuring that we leave cache_dirty set. Also AFAIK the whole point of shmem_pwrite was to avoid needing to populate the entire object like when calling pin_pages(). Would it make sense to just fallback to using i915_gem_shmem_pwrite, which should already take care of the required flushing? For reference a common usage pattern is something like: bb = gem_create() <-- assume non-llc so must be CACHE_NONE gem_write(bb, BATCH_BUFFER_END) <-- might use cached pwrite internally execbuf(bb) <-- doesn't see BATCH_BUFFER_END if we don't clflush > > Adrian Larumbe (1): > drm/i915: Replace shmem memory region and object backend with TTM > > drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 12 +- > drivers/gpu/drm/i915/gem/i915_gem_mman.c | 32 +- > drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 +- > drivers/gpu/drm/i915/gem/i915_gem_phys.c | 32 +- > drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 390 +------------------ > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 267 ++++++++++++- > drivers/gpu/drm/i915/gem/i915_gem_ttm.h | 3 + > drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 9 +- > drivers/gpu/drm/i915/gt/shmem_utils.c | 64 ++- > drivers/gpu/drm/i915/intel_memory_region.c | 7 +- > 10 files changed, 398 insertions(+), 422 deletions(-) > > -- > 2.35.1 >