On 13.03.24 04:16, Matthew Wilcox wrote:
On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
I noticed this recently
OK, this is entirely different, so I'm going to start a new thread ;-)
* GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
* they are not mapped.
*
* Returns 0 if the hugepage is split successfully.
* Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
* us.
*/
int split_huge_page_to_list(struct page *page, struct list_head *list)
{
I have a test case with poisoned shmem THP page that was mlocked and
GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
I'm going to blame John for this!
The description is wrong. Whoever calls split_huge_page_to_list() must
hold a folio reference.
That folio reference will be transferred to @page (not the head page)
once split. So @page can be used by the caller after the split succeeded.
There's no reference to pincount
anywhere in huge_memory.c, so I have no clue how this comment is even
Each pincount increment/decrement must be paired with a folio refcount
increment. Therefore, no pincount checks are required.
close to true, nor do I understand how it could be done, since we don't
know which pages in a folio are pinned.
As the description correctly says: "Returns -EBUSY if the page is pinned".
If that is not true, we'd have a real issue.
I think we have to prohibit splits of folios that are GUP pinned.
In split_huge_page_to_list(), we make sure there are no additional folio
references of any kind (GUP pin, whatsoever).
can_split_folio() is racy but catches most of that. Then, we do the
folio_ref_freeze().
So I don't see how that could ever work with additional folio references
(including GUP pins). Unless serious BUG somewhere else.
In essence: we expect on a folio after completely unmapping it:
* 1 reference from the caller of split_huge_page_to_list()
* pagecache: 1 reference per subpage from the pagecache
* anon: 1 reference per subage from the swapcache if in the swapcache
Any additional reference would lead to a split failure.
We're holding the folio lock, so for anon folios we cannot remove it
from the swapcache concurrently.
For pagecache folios ... dunno :) I expect some folio-lock magic as well.
Reading "I have a test case with poisoned shmem THP page that was
mlocked and GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded."
If that is indeed true, I assume that page poisoning might have done
something very wrong with the large folio: for example, partially unmap
it from the pagecache (if that's even possible?) or accidentally drop a
folio reference. Then, we'd be missing to detecting the GUP pin when
freezing the refcount.
... any chance we can get the reproducer? [reading this mail from Willy
only]
--
Cheers,
David / dhildenb