On 11/02/23 23:15, Rik van Riel wrote:
> On Thu, 2023-11-02 at 19:37 -0700, Mike Kravetz wrote:
> > On 11/02/23 19:24, Mike Kravetz wrote:
> > That qualification '(with resv_map)' caught my attention originally, and
> > I thought about it again while looking into this.  We now cover the common
> > cases, but there are still quite a few cases where resv_map is NULL for
> > private mappings.  In such cases, the race between MADV_DONTNEED and page
> > fault still exists.  Is that a concern?
>
> Honestly, I'm not sure. In hugetlb_dup_vma_private, which is
> called at fork time, we have this comment:
>
>          * - For MAP_PRIVATE mappings, this is the reserve map which does
>          * not apply to children.  Faults generated by the children are
>          * not guaranteed to succeed, even if read-only.
>
> That suggests we already have no guarantee of faults
> succeeding after fork.

Right!

> >
> > With a bit more work we 'could' make sure every hugetlb vma has a lock
> > to participate in this scheme.
> >
> > Any thoughts?
>
> We can certainly close the race between MADV_DONTNEED
> and page faults for MAP_PRIVATE mappings in child processes,
> but that does not guarantee that we actually have hugetlb
> pages for those processes.
>
> In short, I'm not sure :)

I sort of remember something Dave Hansen added years ago to help a customer
allocating LOTs of hugetlb pages dynamically.  I seem to recall that this was
to get better NUMA locality.  As a result, they did not use reservations.  I
guess it sticks with me because it was/is a real example of a customer choosing
NOT to use reservations.

I don't have any evidence that this is common.  My thought is to leave it as
is until someone complains.
-- 
Mike Kravetz