Re: Splitting pinned folios

David Hildenbrand <david@xxxxxxxxxx> · Wed, 13 Mar 2024 10:20:46 +0100

On 13.03.24 04:16, Matthew Wilcox wrote:
On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
I noticed this recently

OK, this is entirely different, so I'm going to start a new thread ;-)

  * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
  * they are not mapped.
  *
  * Returns 0 if the hugepage is split successfully.
  * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
  * us.
  */
int split_huge_page_to_list(struct page *page, struct list_head *list)
{

I have a test case with poisoned shmem THP page that was mlocked and

GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.

I'm going to blame John for this!  

The description is wrong. Whoever calls split_huge_page_to_list() must 
hold a folio reference.

That folio reference will be transferred to @page (not the head page) 
once split. So @page can be used by the caller after the split succeeded.

There's no reference to pincount
anywhere in huge_memory.c, so I have no clue how this comment is even

Each pincount increment/decrement must be paired with a folio refcount 
increment. Therefore, no pincount checks are required.

close to true, nor do I understand how it could be done, since we don't
know which pages in a folio are pinned.

As the description correctly says: "Returns -EBUSY if the page is pinned".

If that is not true, we'd have a real issue.

I think we have to prohibit splits of folios that are GUP pinned.

In split_huge_page_to_list(), we make sure there are no additional folio 
references of any kind (GUP pin, whatsoever).

can_split_folio() is racy but catches most of that. Then, we do the 
folio_ref_freeze().

So I don't see how that could ever work with additional folio references 
(including GUP pins). Unless serious BUG somewhere else.

In essence: we expect on a folio after completely unmapping it:
* 1 reference from the caller of split_huge_page_to_list()
* pagecache: 1 reference per subpage from the pagecache
* anon: 1 reference per subage from the swapcache if in the swapcache

Any additional reference would lead to a split failure.

We're holding the folio lock, so for anon folios we cannot remove it 
from the swapcache concurrently.

For pagecache folios ... dunno :) I expect some folio-lock magic as well.

Reading "I have a test case with poisoned shmem THP page that was 
mlocked and GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded."

If that is indeed true, I assume that page poisoning might have done 
something very wrong with the large folio: for example, partially unmap 
it from the pagecache (if that's even possible?) or accidentally drop a 
folio reference. Then, we'd be missing to detecting the GUP pin when 
freezing the refcount.

... any chance we can get the reproducer? [reading this mail from Willy 
only]

--
Cheers,

David / dhildenb