On 11/2/21 18:40, Matthew Auld wrote:
On Tue, 2 Nov 2021 at 16:39, Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx> wrote:
If the initial fill blit or copy blit of an object fails, the old
content of the data might be exposed and read as soon as either CPU- or
GPU PTEs are set up to point at the pages.
Intercept the blit fence with an async callback that checks the
blit fence for errors and if there are errors performs an async cpu blit
instead. If there is a failure to allocate the async dma_fence_work,
allocate it on the stack and sync wait for the blit to complete.
Add selftests that simulate gpu blit failures and failure to allocate
the async dma_fence_work.
A previous version of this pach used dma_fence_work, now that's
opencoded which adds more code but might lower the latency
somewhat in the common non-error case.
v3:
- Style fixes (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
Reviewed-by: Matthew Auld <matthew.auld@xxxxxxxxx>
---
drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 322 +++++++++++++++---
drivers/gpu/drm/i915/gem/i915_gem_ttm_move.h | 5 +
.../drm/i915/gem/selftests/i915_gem_migrate.c | 24 +-
3 files changed, 295 insertions(+), 56 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 0ed6b7f2b95f..b89672c547f8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -18,6 +18,29 @@
#include "gt/intel_gt.h"
#include "gt/intel_migrate.h"
+/**
+ * DOC: Selftest failure modes for failsafe migration:
+ *
+ * For fail_gpu_migration, the gpu blit scheduled is always a clear blit
+ * rather than a copy blit, and then we force the failure paths as if
+ * the blit fence returned an error.
+ *
+ * For fail_work_allocation we fail the kmalloc of the async worker, we
+ * sync the gpu blit. If it then fails, or fail_gpu_migration is set to
+ * true, then a memcpy operation is performed sync.
+ */
+#ifdef CONFIG_DRM_I915_SELFTEST
When pushing maybe make this:
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
Which seems to be consistent with most of the other places.
Hmm,
I noticed that i915 is doing that, although I thought these macros were
primarily intended for C expressions?
/Thomas
+static bool fail_gpu_migration;
+static bool fail_work_allocation;
+
+void i915_ttm_migrate_set_failure_modes(bool gpu_migration,
+ bool work_allocation)
+{
+ fail_gpu_migration = gpu_migration;
+ fail_work_allocation = work_allocation;
+}
+#endif
+