On Tue, 2024-12-03 at 17:46 +0100, Christian König wrote: > Am 03.12.24 um 17:43 schrieb Thomas Hellström: > > On Tue, 2024-12-03 at 17:39 +0100, Christian König wrote: > > > Am 03.12.24 um 17:31 schrieb Thomas Hellström: > > > > On Tue, 2024-12-03 at 17:20 +0100, Christian König wrote: > > > > > [SNIP] > > > > > > > > > > @@ -453,9 +601,36 @@ int ttm_pool_alloc(struct > > > > > > > > > > ttm_pool > > > > > > > > > > *pool, > > > > > > > > > > struct ttm_tt *tt, > > > > > > > > > > else > > > > > > > > > > gfp_flags |= GFP_HIGHUSER; > > > > > > > > > > > > > > > > > > > > - for (order = min_t(unsigned int, > > > > > > > > > > MAX_PAGE_ORDER, > > > > > > > > > > __fls(num_pages)); > > > > > > > > > > - num_pages; > > > > > > > > > > - order = min_t(unsigned int, order, > > > > > > > > > > __fls(num_pages))) > > > > > > > > > > { > > > > > > > > > > + order = min_t(unsigned int, > > > > > > > > > > MAX_PAGE_ORDER, > > > > > > > > > > __fls(num_pages)); > > > > > > > > > > + > > > > > > > > > > + if (tt->page_flags & > > > > > > > > > > TTM_TT_FLAG_PRIV_BACKED_UP) { > > > > > > > > > > + if (!tt->restore) { > > > > > > > > > > + gfp_t gfp = GFP_KERNEL | > > > > > > > > > > __GFP_NOWARN; > > > > > > > > > > + > > > > > > > > > > + if (ctx- > > > > > > > > > > >gfp_retry_mayfail) > > > > > > > > > > + gfp |= > > > > > > > > > > __GFP_RETRY_MAYFAIL; > > > > > > > > > > + > > > > > > > > > > + tt->restore = > > > > > > > > > > + kvzalloc(struct_si > > > > > > > > > > ze(t > > > > > > > > > > t- > > > > > > > > > > > restore, > > > > > > > > > > old_pages, > > > > > > > > > > + > > > > > > > > > > (size_t)1 > > > > > > > > > > << > > > > > > > > > > order), gfp); > > > > > > > > > > + if (!tt->restore) > > > > > > > > > > + return -ENOMEM; > > > > > > > > > > + } else if > > > > > > > > > > (ttm_pool_restore_valid(tt- > > > > > > > > > > > restore)) { > > > > > > > > > > + struct ttm_pool_tt_restore > > > > > > > > > > *restore = > > > > > > > > > > tt- > > > > > > > > > > > restore; > > > > > > > > > > + > > > > > > > > > > + num_pages -= restore- > > > > > > > > > > > alloced_pages; > > > > > > > > > > + order = min_t(unsigned > > > > > > > > > > int, > > > > > > > > > > order, > > > > > > > > > > __fls(num_pages)); > > > > > > > > > > + pages += restore- > > > > > > > > > > > alloced_pages; > > > > > > > > > > + r = > > > > > > > > > > ttm_pool_restore_tt(restore, > > > > > > > > > > tt- > > > > > > > > > > > backup, ctx); > > > > > > > > > > + if (r) > > > > > > > > > > + return r; > > > > > > > > > > + caching = restore- > > > > > > > > > > > caching_divide; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + tt->restore->pool = pool; > > > > > > > > > > + } > > > > > > > > > Hui? Why is that part of the allocation function now? > > > > > > > > > > > > > > > > > > At bare minimum I would expect that this is a new > > > > > > > > > function. > > > > > > > > It's because we now have partially backed up tts, so > > > > > > > > the > > > > > > > > restore is > > > > > > > > interleaved on a per-page basis, replacing the backup > > > > > > > > handles > > > > > > > > with > > > > > > > > page-pointers. I'll see if I can separate out at least > > > > > > > > the > > > > > > > > initialization here. > > > > > > > Yeah, that kind of makes sense. > > > > > > > > > > > > > > My expectation was just that we now have explicit > > > > > > > ttm_pool_swapout() > > > > > > > and > > > > > > > ttm_pool_swapin() functions. > > > > > > I fully understand, although in the allocation step, that > > > > > > would > > > > > > also > > > > > > increase the memory pressure since we might momentarily > > > > > > have > > > > > > twice > > > > > > the > > > > > > bo-size allocated, if the shmem object was never swapped > > > > > > out, > > > > > > and > > > > > > we > > > > > > don't want to unnecessarily risc OOM at recover time, > > > > > > although > > > > > > that > > > > > > should be a recoverable situation now. If the OOM receiver > > > > > > can > > > > > > free > > > > > > up > > > > > > system memory resources they can could potentially restart > > > > > > the > > > > > > recover. > > > > > What I meant was more that we have ttm_pool_swapout() which > > > > > does > > > > > a > > > > > mix > > > > > of moving each page to a swap backend and freeing one by one. > > > > > > > > > > And ttm_pool_swapin() which allocates a bit of memory > > > > > (usually > > > > > one > > > > > huge > > > > > page) and then copies the content back in from the swap > > > > > backend. > > > > > > > > > > Alternatively we could rename ttm_pool_alloc() into something > > > > > like > > > > > ttm_pool_populate() and ttm_pool_free() into > > > > > ttm_pool_unpopulate(), > > > > > but > > > > > those names are not very descriptive either. > > > > > > > > > > It's just that we now do a bit more than just alloc and free > > > > > in > > > > > those > > > > > functions, so the naming doesn't really match that well any > > > > > more. > > > > So what about ttm_pool_alloc() and ttm_pool_recover/swapin(), > > > > both > > > > pointing to the same code, but _alloc() asserts that the tt > > > > isn't > > > > backed up? > > > > > > > > That would give a clean interface at least. > > > More or less ok. I would just put figuring out the gfp flags and > > > the > > > stuff inside the for (order... loop into separate functions. And > > > then > > > remove the if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP) from > > > the > > > pool. > > > > > > In other words you trigger the back restore by calling a > > > different > > > function than the allocation one. > > I'll take a look at this as well. > > Ah, and BTW: It's perfectly possible that ttm_tt_free() is called > because a halve swapped TT is about to be destroyed! > > If I'm not completely mistaken that is not handled gracefully when we > try to always backup from in the ttm_tt_free() function. > > So we clearly need the separation of move this TT to a backup (and > eventually only partially) and freeing it. Hm. I'm not sure I follow completely. The ttm_pool interface is currently: ttm_pool_alloc() -> allocs and may recover from backup. May leave partially backed up. Called from ttm_tt_populate() or its driver callbacks. ttm_pool_backup_tt() -> Attempts to back up (the not already backed up part of a tt. Called from ttm_tt_backup(), which is just a tt layer wrapper. If called with purge==true, then frees memory bypassing the pool to return it to the system directly. ttm_pool_free() -> Frees a (potentially backed up or partially backed up) tt. Called from ttm_tt_unpopulate() or its driver callbacks. So the backup functionality is implemented with a minimal change to upper layers, and I don't think there is a correctness problem on free(). So could you clarify a bit if it is this interface you think needs changing or that the implementation is better at separating out the backup functionality from the pool functionality? Thanks, Thomas > > Christian. > > > > > /Thomas > > > > > > > > For a renaming change that touch all TTM drivers, I'd rather > > > > put > > > > that > > > > as a last patch since getting acks for that from all TTM driver > > > > maintainers seems like a hopeless undertaking. > > > Yeah the acks are not the problem, merging it through the xe tree > > > would be. > > > > > > Christian. > > > > > > > > > > /Thomas > > > > > > > > > > > > > > > > > > > > > Christian. > > > > > > > > > > > /Thomas >