Am 04.12.24 um 10:56 schrieb Thomas Hellström:
On Wed, 2024-12-04 at 10:16 +0100, Christian König wrote:
Am 03.12.24 um 18:44 schrieb Thomas Hellström:
On Tue, 2024-12-03 at 17:46 +0100, Christian König wrote:
Am 03.12.24 um 17:43 schrieb Thomas Hellström:
On Tue, 2024-12-03 at 17:39 +0100, Christian König wrote:
Am 03.12.24 um 17:31 schrieb Thomas Hellström:
On Tue, 2024-12-03 at 17:20 +0100, Christian König wrote:
[SNIP]
@@ -453,9 +601,36 @@ int ttm_pool_alloc(struct
ttm_pool
*pool,
struct ttm_tt *tt,
else
gfp_flags |= GFP_HIGHUSER;
- for (order = min_t(unsigned int,
MAX_PAGE_ORDER,
__fls(num_pages));
- num_pages;
- order = min_t(unsigned int,
order,
__fls(num_pages)))
{
+ order = min_t(unsigned int,
MAX_PAGE_ORDER,
__fls(num_pages));
+
+ if (tt->page_flags &
TTM_TT_FLAG_PRIV_BACKED_UP) {
+ if (!tt->restore) {
+ gfp_t gfp = GFP_KERNEL
|
__GFP_NOWARN;
+
+ if (ctx-
gfp_retry_mayfail)
+ gfp |=
__GFP_RETRY_MAYFAIL;
+
+ tt->restore =
+ kvzalloc(struc
t_si
ze(t
t-
restore,
old_pages,
+
(size_t)1
<<
order), gfp);
+ if (!tt->restore)
+ return -
ENOMEM;
+ } else if
(ttm_pool_restore_valid(tt-
restore)) {
+ struct
ttm_pool_tt_restore
*restore =
tt-
restore;
+
+ num_pages -= restore-
alloced_pages;
+ order = min_t(unsigned
int,
order,
__fls(num_pages));
+ pages += restore-
alloced_pages;
+ r =
ttm_pool_restore_tt(restore,
tt-
backup, ctx);
+ if (r)
+ return r;
+ caching = restore-
caching_divide;
+ }
+
+ tt->restore->pool = pool;
+ }
Hui? Why is that part of the allocation function
now?
At bare minimum I would expect that this is a new
function.
It's because we now have partially backed up tts,
so
the
restore is
interleaved on a per-page basis, replacing the
backup
handles
with
page-pointers. I'll see if I can separate out at
least
the
initialization here.
Yeah, that kind of makes sense.
My expectation was just that we now have explicit
ttm_pool_swapout()
and
ttm_pool_swapin() functions.
I fully understand, although in the allocation step,
that
would
also
increase the memory pressure since we might momentarily
have
twice
the
bo-size allocated, if the shmem object was never
swapped
out,
and
we
don't want to unnecessarily risc OOM at recover time,
although
that
should be a recoverable situation now. If the OOM
receiver
can
free
up
system memory resources they can could potentially
restart
the
recover.
What I meant was more that we have ttm_pool_swapout()
which
does
a
mix
of moving each page to a swap backend and freeing one by
one.
And ttm_pool_swapin() which allocates a bit of memory
(usually
one
huge
page) and then copies the content back in from the swap
backend.
Alternatively we could rename ttm_pool_alloc() into
something
like
ttm_pool_populate() and ttm_pool_free() into
ttm_pool_unpopulate(),
but
those names are not very descriptive either.
It's just that we now do a bit more than just alloc and
free
in
those
functions, so the naming doesn't really match that well
any
more.
So what about ttm_pool_alloc() and
ttm_pool_recover/swapin(),
both
pointing to the same code, but _alloc() asserts that the tt
isn't
backed up?
That would give a clean interface at least.
More or less ok. I would just put figuring out the gfp flags
and
the
stuff inside the for (order... loop into separate functions.
And
then
remove the if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP)
from
the
pool.
In other words you trigger the back restore by calling a
different
function than the allocation one.
I'll take a look at this as well.
Ah, and BTW: It's perfectly possible that ttm_tt_free() is called
because a halve swapped TT is about to be destroyed!
If I'm not completely mistaken that is not handled gracefully
when we
try to always backup from in the ttm_tt_free() function.
So we clearly need the separation of move this TT to a backup
(and
eventually only partially) and freeing it.
Hm. I'm not sure I follow completely.
The ttm_pool interface is currently:
ttm_pool_alloc() -> allocs and may recover from backup. May leave
partially backed up. Called from ttm_tt_populate() or its driver
callbacks.
Yeah that this is done by a single function looks really strange to
me.
ttm_pool_backup_tt() -> Attempts to back up (the not already backed
up
part of a tt. Called from ttm_tt_backup(), which is just a tt layer
wrapper. If called with purge==true, then frees memory bypassing
the
pool to return it to the system directly.
ttm_pool_free() -> Frees a (potentially backed up or partially
backed
up) tt. Called from ttm_tt_unpopulate() or its driver callbacks.
Ah! I missed that you have separated that functionality from the free
path.
I've only saw the allocation path and though I need to clear that up
first.
So the backup functionality is implemented with a minimal change to
upper layers, and I don't think there is a correctness problem on
free().
So could you clarify a bit if it is this interface you think needs
changing or that the implementation is better at separating out the
backup functionality from the pool functionality?
I think we should just make the ttm pool object take charge of
allocation, backup, restore and free operation on the TT objects.
And all of those are separate operations, they just internally share
steps to archive what they should do.
So are we looking at an interface change like:
ttm_pool_alloc() // no recover. Errors if backed-up-data present.
ttm_pool_alloc_and_recover() // because you can't alloc first and then
recover in a memory-efficient manner, since you need to interleave.
ttm_pool_backup() // as currently
ttm_pool_drop_backed_up() //drops the backed-up data if any.
ttm_pool_free() // frees all data. errors if backed-up-data present.
Is this what you mean?
Yes, exactly that.
BTW I really dislike that tt->restore is allocated dynamically. That
is
just another allocation which can cause problems.
We should probably have all the state necessary for the operation in
the
TT object.
Initially it was done this way. But that meant a pre-allocated struct
page-pointer array the of 1 << MAX_PAGE_ORDER size (2MiB) for each
ttm_tt. That lead to a patch to reduce the MAX_PAGE_ORDER to PMD size
order, but as you might remember, that needed to be ripped out because
the PMD size macros aren't constant across all architectures. IIRC it
was ARM causing compilation failures, and Linus wasn't happy.
Yeah, I do remember that. But I don't fully get why you need this
page-pointer array in the first place?
So, enter the dynamic allocation which is temporary, and 1/512 of the
size of the memory we need to allocate for the buffer object. IIRC that
was discussed with Matt when he reiewed and we concluded that it should
be ok. I think this approach leads to less memory pressure than if we'd
keep that array around all the time for *all* the allocated bos, and
the allocation is never during reclaim.
Hui? How do you avoid having to allocate that during reclaim?
I absolutely don't see that on the code currently.
Regards,
Christian.
Thanks,
Thomas
Regards,
Christian.
Thanks,
Thomas
Christian.
/Thomas
For a renaming change that touch all TTM drivers, I'd
rather
put
that
as a last patch since getting acks for that from all TTM
driver
maintainers seems like a hopeless undertaking.
Yeah the acks are not the problem, merging it through the xe
tree
would be.
Christian.
/Thomas
Christian.
/Thomas