On Wed, Jul 03, 2024 at 05:38:11PM +0200, Thomas Hellström wrote: > Use fault-injection to test partial TTM swapout and interrupted swapin. > Return -EINTR for swapin to test the callers ability to handle and > restart the swapin, and on swapout perform a partial swapout to test that > the swapin and release_shrunken functionality. > > Cc: Christian König <christian.koenig@xxxxxxx> > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx> > Cc: Matthew Brost <matthew.brost@xxxxxxxxx> > Cc: <dri-devel@xxxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > --- > drivers/gpu/drm/Kconfig | 10 ++++++++++ > drivers/gpu/drm/ttm/ttm_pool.c | 17 ++++++++++++++++- > 2 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > index fd0749c0c630..9f27271bfab8 100644 > --- a/drivers/gpu/drm/Kconfig > +++ b/drivers/gpu/drm/Kconfig > @@ -272,6 +272,16 @@ config DRM_GPUVM > GPU-VM representation providing helpers to manage a GPUs virtual > address space > > +config DRM_TTM_BACKUP_FAULT_INJECT > + bool "Enable fault injection during TTM backup" > + depends on DRM_TTM > + default n > + help > + Inject recoverable failures during TTM backup and recovery of > + backed-up objects. For DRM driver developers only. > + > + If in doubt, choose N. > + > config DRM_BUDDY > tristate > depends on DRM > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c > index 38e50cf81b0a..d32a1f2e5e50 100644 > --- a/drivers/gpu/drm/ttm/ttm_pool.c > +++ b/drivers/gpu/drm/ttm/ttm_pool.c > @@ -431,6 +431,7 @@ static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore, > struct ttm_backup *backup, > struct ttm_operation_ctx *ctx) > { > + static unsigned long __maybe_unused swappedin; > unsigned int i, nr = 1 << restore->order; > int ret = 0; > > @@ -446,6 +447,13 @@ static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore, > if (handle == 0) > continue; > > + if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT) && > + ctx->interruptible && > + ++swappedin % 100 == 0) { > + ret = -EINTR; > + break; > + } So here this -EINTR would be kicked to the user IOCTL which triggered the BO validate and retry? The restore then should be able to successfully pick up where it left off? > + > ret = backup->ops->copy_backed_up_page > (backup, restore->first_page[i], > handle, ctx->interruptible); > @@ -892,7 +900,14 @@ long ttm_pool_backup_tt(struct ttm_pool *pool, struct ttm_tt *ttm, bool purge, > > alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN | __GFP_RETRY_MAYFAIL; > > - for (i = 0; i < ttm->num_pages; ++i) { > + num_pages = ttm->num_pages; > + > + /* Pretend doing fault injection by shrinking only half of the pages. */ > + > + if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT)) > + num_pages = DIV_ROUND_UP(num_pages, 2); So what happens here? Half the pages swapped out, then upon restore half swapped back in? The shrinker continues to walk until enough pages swapped out? Matt > + > + for (i = 0; i < num_pages; ++i) { > page = ttm->pages[i]; > if (unlikely(!page)) > continue; > -- > 2.44.0 >