Hi, Oscar,

On Thu, Jan 16, 2025 at 11:15:09AM +0100, Oscar Salvador wrote:
> Now, let me see if I get this straight:
>
> 1) parent maps a hugetlb page, but not yet fault in.
> 2) forks, child faults-in the page

Agreed until here.

> 3) child doesn't have any reservation, when 'cow_from_owner' set to true
>    we check whether we have a spare hugetlb page to satisfy that

When the child faults in, it should trigger the hugetlb CoW fault, in
which case 'cow_from_owner' should always be set to false (rather than
true), because it's not a CoW from the owner (the child is not an owner).
See the check in the fault path:

	if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) &&
	    old_folio != pagecache_folio)
		cow_from_owner = true;

Here I would expect the first (OWNER) check to fail when the child faults.

> 4) parent faults in the page
> 5) we do not have spare hugetlb pages, so we 'steal' it from the child
>    with unmap_ref_private.

Agreed on 4/5.  At the last step (5), the above check becomes true, hence
this is the place where the allocation will have cow_from_owner set to
true.

The difference between steps 3 and 5 is also exactly why I renamed the
variable for the whole stack: it represents this special condition from
the top layer (fault) down to the allocation layer, saying explicitly when
it should be set to true (only for a "cow", and only from the "owner", not
the child), rather than a very blurred idea of someone trying to
avoid_reserve for whatever reason.

The hope is that it makes that niche path very clear in the allocation
path, and that it discourages any other user from using this flag, which
could be abused (and make the allocation path harder to follow in
general).

Thanks,

--
Peter Xu