On 11/5/24 12:42 AM, David Hildenbrand wrote:
On 05.11.24 04:29, John Hubbard wrote:
...
Yeah, I was only adding it because I stumbled over it. It might not be a problem, because we simply "skip" if we find a folio that was already isolated (possibly by us). What might happen is that we unnecessarily drain the LRU.
__collapse_huge_page_isolate() scans the compound_pagelist list before try-locking and isolating. But it also just "fails" instead of retrying forever.
Imagine the page tables looking like the following (e.g., COW in a MAP_PRIVATE file mapping that supports large folios)
                     ------ F0P2 was replaced by a new (small) folio
                     |
[ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]
F0P0: Folio 0, page 0
Assume we try pinning that range and end up in collect_longterm_unpinnable_folios() with:
F0, F0, F1, F0
Assume F0 and F1 are not long-term pinnable.
i = 0: We isolate F0
i = 1: We see that it is the same F0 and skip
i = 2: We isolate F1
i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
       fail folio_isolate_lru()
So the drain in i=3 could be avoided by scanning the list, if we already isolated that one. Working better than I originally thought.
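The walk-through above can be modelled in a few lines of userspace C. This is only a toy sketch, not kernel code: struct toy_folio, struct walk_stats and toy_collect() are hypothetical names, the "isolated" flag stands in for "found on the list of already-isolated folios", and the drain is just a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct folio: only the state the walk-through needs. */
struct toy_folio {
	bool on_lru;    /* models folio_test_lru() */
	bool isolated;  /* models "already on the isolated list" */
};

struct walk_stats {
	int isolated;   /* folios successfully isolated */
	int drains;     /* times the expensive drain would have run */
};

/*
 * Walk the per-page folio array as in the i = 0..3 steps above.
 * With scan_isolated set, a folio that was already isolated is skipped
 * before the LRU check, so no drain is issued for it.
 */
struct walk_stats toy_collect(struct toy_folio **folios, size_t n,
			      bool scan_isolated)
{
	struct walk_stats st = { 0, 0 };
	struct toy_folio *prev = NULL;

	for (size_t i = 0; i < n; i++) {
		struct toy_folio *f = folios[i];

		if (f == prev)
			continue;       /* quick check: same as previous entry */
		prev = f;

		if (scan_isolated && f->isolated)
			continue;       /* thorough check: already collected */

		if (!f->on_lru) {
			st.drains++;    /* models lru_add_drain_all() */
			continue;       /* isolation then fails in this model */
		}

		/* models a successful folio_isolate_lru() */
		f->on_lru = false;
		f->isolated = true;
		st.isolated++;
	}
	return st;
}
```

Feeding it the sequence F0, F0, F1, F0 with both folios initially on the LRU gives two isolations and one drain without the scan, and two isolations and no drain with it.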
Thanks for spelling out that case, I was having trouble visualizing it,
but now it's clear.
OK, so looking at this, I think it could be extended to more than just
"skip the drain". It seems like we should also avoid counting the folio
(the existing code seems wrong).
So I think this approach would be correct, does it seem accurate to
you as well? Here:
diff --git a/mm/gup.c b/mm/gup.c
index ad0c8922dac3..ab8e706b52f0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2324,11 +2324,21 @@ static unsigned long collect_longterm_unpinnable_folios(
 	for (i = 0; i < pofs->nr_entries; i++) {
 		struct folio *folio = pofs_get_folio(pofs, i);
+		struct folio *tmp_folio;
+		/*
+		 * Two checks to see if this folio has already been collected.
+		 * The first check is quick, and the second check is thorough.
+		 */
 		if (folio == prev_folio)
 			continue;
 		prev_folio = folio;
+		list_for_each_entry(tmp_folio, movable_folio_list, lru) {
+			if (folio == tmp_folio)
+				continue;
+		}
+
 		if (folio_is_longterm_pinnable(folio))
 			continue;
I need to test this more thoroughly, though, with a directed gup test (I'm not sure we
have one yet).
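One detail in the hunk above is worth double-checking before testing: in C, a "continue" applies to the innermost enclosing loop, and list_for_each_entry() expands to a for loop, so the "continue" inside it would advance the list walk rather than the outer for loop. A minimal userspace sketch of the difference follows. It is only an illustration: struct node, list_contains() and the two count_*() functions are hypothetical, and a plain singly linked list stands in for the kernel list API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the list of already-collected folios. */
struct node {
	const void *item;
	struct node *next;
};

/* Return true if item is already on the list. */
bool list_contains(const struct node *head, const void *item)
{
	for (const struct node *p = head; p; p = p->next) {
		if (p->item == item)
			return true;
	}
	return false;
}

/* Same shape as the hunk: the inner continue never skips the outer body. */
size_t count_inner_continue(const struct node *head,
			    const void *const *items, size_t n)
{
	size_t processed = 0;

	for (size_t i = 0; i < n; i++) {
		for (const struct node *p = head; p; p = p->next) {
			if (p->item == items[i])
				continue;   /* continues the inner loop only */
		}
		processed++;                /* reached even for listed items */
	}
	return processed;
}

/* Moving the scan into a helper lets continue act on the outer loop. */
size_t count_with_helper(const struct node *head,
			 const void *const *items, size_t n)
{
	size_t processed = 0;

	for (size_t i = 0; i < n; i++) {
		if (list_contains(head, items[i]))
			continue;           /* skips this outer iteration */
		processed++;
	}
	return processed;
}
```

With two of three items already on the list, the first variant still processes all three, while the helper variant processes only the one that is new. A found flag checked after the inner loop would work equally well.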
thanks,
--
John Hubbard