Am 03.12.24 um 18:44 schrieb Thomas Hellström:
On Tue, 2024-12-03 at 17:46 +0100, Christian König wrote:
Am 03.12.24 um 17:43 schrieb Thomas Hellström:
On Tue, 2024-12-03 at 17:39 +0100, Christian König wrote:
Am 03.12.24 um 17:31 schrieb Thomas Hellström:
On Tue, 2024-12-03 at 17:20 +0100, Christian König wrote:
[SNIP]
@@ -453,9 +601,36 @@ int ttm_pool_alloc(struct
ttm_pool
*pool,
struct ttm_tt *tt,
else
gfp_flags |= GFP_HIGHUSER;
- for (order = min_t(unsigned int,
MAX_PAGE_ORDER,
__fls(num_pages));
- num_pages;
- order = min_t(unsigned int, order,
__fls(num_pages)))
{
+ order = min_t(unsigned int,
MAX_PAGE_ORDER,
__fls(num_pages));
+
+ if (tt->page_flags &
TTM_TT_FLAG_PRIV_BACKED_UP) {
+ if (!tt->restore) {
+ gfp_t gfp = GFP_KERNEL |
__GFP_NOWARN;
+
+ if (ctx-
gfp_retry_mayfail)
+ gfp |=
__GFP_RETRY_MAYFAIL;
+
+ tt->restore =
+ kvzalloc(struct_si
ze(t
t-
restore,
old_pages,
+
(size_t)1
<<
order), gfp);
+ if (!tt->restore)
+ return -ENOMEM;
+ } else if
(ttm_pool_restore_valid(tt-
restore)) {
+ struct ttm_pool_tt_restore
*restore =
tt-
restore;
+
+ num_pages -= restore-
alloced_pages;
+ order = min_t(unsigned
int,
order,
__fls(num_pages));
+ pages += restore-
alloced_pages;
+ r =
ttm_pool_restore_tt(restore,
tt-
backup, ctx);
+ if (r)
+ return r;
+ caching = restore-
caching_divide;
+ }
+
+ tt->restore->pool = pool;
+ }
Hui? Why is that part of the allocation function now?
At bare minimum I would expect that this is a new
function.
It's because we now have partially backed up tts, so
the
restore is
interleaved on a per-page basis, replacing the backup
handles
with
page-pointers. I'll see if I can separate out at least
the
initialization here.
Yeah, that kind of makes sense.
My expectation was just that we now have explicit
ttm_pool_swapout()
and
ttm_pool_swapin() functions.
I fully understand, although in the allocation step, that
would
also
increase the memory pressure since we might momentarily
have
twice
the
bo-size allocated, if the shmem object was never swapped
out,
and
we
don't want to unnecessarily risc OOM at recover time,
although
that
should be a recoverable situation now. If the OOM receiver
can
free
up
system memory resources they can could potentially restart
the
recover.
What I meant was more that we have ttm_pool_swapout() which
does
a
mix
of moving each page to a swap backend and freeing one by one.
And ttm_pool_swapin() which allocates a bit of memory
(usually
one
huge
page) and then copies the content back in from the swap
backend.
Alternatively we could rename ttm_pool_alloc() into something
like
ttm_pool_populate() and ttm_pool_free() into
ttm_pool_unpopulate(),
but
those names are not very descriptive either.
It's just that we now do a bit more than just alloc and free
in
those
functions, so the naming doesn't really match that well any
more.
So what about ttm_pool_alloc() and ttm_pool_recover/swapin(),
both
pointing to the same code, but _alloc() asserts that the tt
isn't
backed up?
That would give a clean interface at least.
More or less ok. I would just put figuring out the gfp flags and
the
stuff inside the for (order... loop into separate functions. And
then
remove the if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP) from
the
pool.
In other words you trigger the back restore by calling a
different
function than the allocation one.
I'll take a look at this as well.
Ah, and BTW: It's perfectly possible that ttm_tt_free() is called
because a halve swapped TT is about to be destroyed!
If I'm not completely mistaken that is not handled gracefully when we
try to always backup from in the ttm_tt_free() function.
So we clearly need the separation of move this TT to a backup (and
eventually only partially) and freeing it.
Hm. I'm not sure I follow completely.
The ttm_pool interface is currently:
ttm_pool_alloc() -> allocs and may recover from backup. May leave
partially backed up. Called from ttm_tt_populate() or its driver
callbacks.
Yeah that this is done by a single function looks really strange to me.
ttm_pool_backup_tt() -> Attempts to back up (the not already backed up
part of a tt. Called from ttm_tt_backup(), which is just a tt layer
wrapper. If called with purge==true, then frees memory bypassing the
pool to return it to the system directly.
ttm_pool_free() -> Frees a (potentially backed up or partially backed
up) tt. Called from ttm_tt_unpopulate() or its driver callbacks.
Ah! I missed that you have separated that functionality from the free path.
I've only saw the allocation path and though I need to clear that up first.
So the backup functionality is implemented with a minimal change to
upper layers, and I don't think there is a correctness problem on
free().
So could you clarify a bit if it is this interface you think needs
changing or that the implementation is better at separating out the
backup functionality from the pool functionality?
I think we should just make the ttm pool object take charge of
allocation, backup, restore and free operation on the TT objects.
And all of those are separate operations, they just internally share
steps to archive what they should do.
BTW I really dislike that tt->restore is allocated dynamically. That is
just another allocation which can cause problems.
We should probably have all the state necessary for the operation in the
TT object.
Regards,
Christian.
Thanks,
Thomas
Christian.
/Thomas
For a renaming change that touch all TTM drivers, I'd rather
put
that
as a last patch since getting acks for that from all TTM driver
maintainers seems like a hopeless undertaking.
Yeah the acks are not the problem, merging it through the xe tree
would be.
Christian.
/Thomas
Christian.
/Thomas