On Thu, Dec 05, 2024 at 03:33:46PM +0100, Nirmoy Das wrote:
On 12/5/2024 1:40 PM, Matthew Auld wrote:
On 05/12/2024 12:02, Nirmoy Das wrote:
There could be still migration job going on while doing
xe_tt_unmap_sg() which could trigger GPU page faults. Fix this by
waiting for the migration job to finish.
v2: Use intr=false(Matt A)
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3466
Fixes: 75521e8b56e8 ("drm/xe: Perform dma_map when moving system buffer objects to TT")
Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v6.11+
Cc: Matthew Auld <matthew.auld@xxxxxxxxx>
Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>
Ok, so this is something like ttm_bo_move_to_ghost() doing a pipeline move for tt -> system, but we then do xe_tt_unmap_sg() too early which tears down the IOMMU (if enabled) mappings whilst the job is in progress?
Yes, this exactly what is happening for this issue.
Maybe add some more info to the commit message?
I will add more details.
Are you going to send a new version? Once this is fixed, please also
send a revert MR to the kconfig workaround
3940181b1bad @ gitlab.freedesktop.org/drm/xe/ci.git
Lucas De Marchi
I think this for sure fixes it. Just wondering if it's somehow possible to keep the mapping until the job is done, since all tt -> sys moves are now synced here?
Unless Thomas has a better idea here,
Reviewed-by: Matthew Auld <matthew.auld@xxxxxxxxx>
Thanks,
Nirmoy
---
drivers/gpu/drm/xe/xe_bo.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index b2aa368a23f8..c906a5529db0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -857,8 +857,16 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
out:
if ((!ttm_bo->resource || ttm_bo->resource->mem_type == XE_PL_SYSTEM) &&
- ttm_bo->ttm)
+ ttm_bo->ttm) {
+ long timeout = dma_resv_wait_timeout(ttm_bo->base.resv,
+ DMA_RESV_USAGE_BOOKKEEP,
+ false,
+ MAX_SCHEDULE_TIMEOUT);
+ if (timeout < 0)
+ ret = timeout;
+
xe_tt_unmap_sg(ttm_bo->ttm);
+ }
return ret;
}