This patch series introduces failsafe migration blits. The reason for this seemingly strange concept is that if the initial clearing or readback of LMEM fails for some reason[1], and we then set up either GPU- or CPU ptes to the allocated LMEM, we can expose old contents from other clients. So after each migration blit to LMEM, attach a dma-fence callback that checks the migration fence error value and if it's an error, performs a memcpy blit, instead. Patch 1 splits out the TTM move code into separate files Patch 2 implements the failsafe blits and related self-tests [1] There are at least two ways we could trigger exposure of uninitialized LMEM assuming the migration blits themselves never trigger a gpu hang. a) A gpu operation preceding a pipelined eviction blit resets and sets the error fence to -EIO, and the error is propagated across the TTM manager to the clear / swapin blit of a newly allocated TTM resource. It aborts and leaves the memory uninitialized. b) Something wedges the GT while a migration blit is submitted. It ends up never executed and TTM can fault user-space cpu-ptes into uninitialized memory. v3: - Style fixes in second patch (Matthew Auld) v4: - More style fixes in second patch (Matthew Auld) v5: - Fix an issue where we might end up waiting for a fence that would never signal. Thomas Hellström (2): drm/i915/ttm: Reorganize the ttm move code drm/i915/ttm: Failsafe migration blits drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 328 ++--------- drivers/gpu/drm/i915/gem/i915_gem_ttm.h | 35 ++ drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 522 ++++++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h | 43 ++ .../drm/i915/gem/selftests/i915_gem_migrate.c | 24 +- 6 files changed, 672 insertions(+), 281 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h -- 2.31.1