On Wed, 2024-08-07 at 23:43 +0000, Matthew Brost wrote: > On Wed, Jul 03, 2024 at 05:38:11PM +0200, Thomas Hellström wrote: > > Use fault-injection to test partial TTM swapout and interrupted > > swapin. > > Return -EINTR for swapin to test the callers ability to handle and > > restart the swapin, and on swapout perform a partial swapout to > > test that > > the swapin and release_shrunken functionality. > > > > Cc: Christian König <christian.koenig@xxxxxxx> > > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@xxxxxxx> > > Cc: Matthew Brost <matthew.brost@xxxxxxxxx> > > Cc: <dri-devel@xxxxxxxxxxxxxxxxxxxxx> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/Kconfig | 10 ++++++++++ > > drivers/gpu/drm/ttm/ttm_pool.c | 17 ++++++++++++++++- > > 2 files changed, 26 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > > index fd0749c0c630..9f27271bfab8 100644 > > --- a/drivers/gpu/drm/Kconfig > > +++ b/drivers/gpu/drm/Kconfig > > @@ -272,6 +272,16 @@ config DRM_GPUVM > > GPU-VM representation providing helpers to manage a GPUs > > virtual > > address space > > > > +config DRM_TTM_BACKUP_FAULT_INJECT > > + bool "Enable fault injection during TTM backup" > > + depends on DRM_TTM > > + default n > > + help > > + Inject recoverable failures during TTM backup and > > recovery of > > + backed-up objects. For DRM driver developers only. > > + > > + If in doubt, choose N. > > + > > config DRM_BUDDY > > tristate > > depends on DRM > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c > > b/drivers/gpu/drm/ttm/ttm_pool.c > > index 38e50cf81b0a..d32a1f2e5e50 100644 > > --- a/drivers/gpu/drm/ttm/ttm_pool.c > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c > > @@ -431,6 +431,7 @@ static int ttm_pool_restore_tt(struct > > ttm_pool_tt_restore *restore, > > struct ttm_backup *backup, > > struct ttm_operation_ctx *ctx) > > { > > + static unsigned long __maybe_unused swappedin; > > unsigned int i, nr = 1 << restore->order; > > int ret = 0; > > > > @@ -446,6 +447,13 @@ static int ttm_pool_restore_tt(struct > > ttm_pool_tt_restore *restore, > > if (handle == 0) > > continue; > > > > + if > > (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT) && > > + ctx->interruptible && > > + ++swappedin % 100 == 0) { > > + ret = -EINTR; > > + break; > > + } > > So here this -EINTR would be kicked to the user IOCTL which triggered > the BO validate and retry? The restore then should be able to > successfully pick up where it left off? Yes, that's the point. For the direct swap-cache backend I initially used (before concluding that the shmem one actually seemed to work fine), we had an interruptible wait here. Supporting interrupts is generally a good thing but for the pool code, this makes the code considerably more complicated. However, this is a good way to ensure drivers actually support -EINTR for the call chain. If not, adding interrupt capability "later" will most likely be a PITA. > > > + > > ret = backup->ops->copy_backed_up_page > > (backup, restore->first_page[i], > > handle, ctx->interruptible); > > @@ -892,7 +900,14 @@ long ttm_pool_backup_tt(struct ttm_pool *pool, > > struct ttm_tt *ttm, bool purge, > > > > alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN | > > __GFP_RETRY_MAYFAIL; > > > > - for (i = 0; i < ttm->num_pages; ++i) { > > + num_pages = ttm->num_pages; > > + > > + /* Pretend doing fault injection by shrinking only half of > > the pages. */ > > + > > + if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT)) > > + num_pages = DIV_ROUND_UP(num_pages, 2); > > So what happens here? Half the pages swapped out, then upon restore > half > swapped back in? The shrinker continues to walk until enough pages > swapped out? Yes, exactly. Ideally we'd want some intermediate state here so that a partially swapped out bo is still eligible for further shrinking. /Thomas > > Matt > > > + > > + for (i = 0; i < num_pages; ++i) { > > page = ttm->pages[i]; > > if (unlikely(!page)) > > continue; > > -- > > 2.44.0 > >