On 9/26/19 5:55 PM, Mina Almasry wrote:
> Provided we keep the existing controller untouched, should the new
> controller track:
>
> 1. only reservations, or
> 2. both reservations and allocations for which no reservations exist
> (such as the MAP_NORESERVE case)?
>
> I like the 'both' approach. Seems to me a counter like that would work
> automatically regardless of whether the application is allocating
> hugetlb memory with NORESERVE or not. NORESERVE allocations cannot cut
> into reserved hugetlb pages, correct?

Correct. One other easy way to allocate huge pages without reserves
(that I know is used today) is via the fallocate system call.

> If so, then applications that
> allocate with NORESERVE will get sigbused when they hit their limit,
> and applications that allocate without NORESERVE may get an error at
> mmap time but will always be within their limits while they access the
> mmap'd memory, correct?

Correct. At page allocation time we can easily check to see if a
reservation exists and, if so, not charge. For any specific page within
a hugetlbfs file, a charge would happen either at mmap time or at
allocation time.

One exception (that I can think of) to the rule that mmap(RESERVE) will
not cause a SIGBUS is hole punch. If someone punches a hole in a file,
they remove not only the pages associated with the file but the
reservation information as well. Therefore, a subsequent fault will be
the same as an allocation without a reservation. I 'think' the code to
remove/truncate a file will work correctly as it is today, but I need to
think about this some more.

> So the 'both' counter seems like a one size
> fits all.
>
> I think the only sticking point left is whether an added controller
> can support both cgroup-v2 and cgroup-v1. If I could get confirmation
> on that I'll provide a patchset.

Sorry, but I cannot provide cgroup expertise.
--
Mike Kravetz
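
For illustration only, the charging rule discussed above can be sketched as a toy user-space model. This is not kernel code and not a proposed implementation: the names `ToyHugetlbCounter`, `LimitExceeded`, `mmap`, `fault`, and `punch_hole` are hypothetical, pages are plain integer indexes, and the choice to uncharge on hole punch is an assumption of the sketch rather than something stated in the thread.

```python
class LimitExceeded(Exception):
    """Stands in for an mmap failure or a SIGBUS at fault time."""


class ToyHugetlbCounter:
    """Toy model of one counter covering reservations and unreserved allocations."""

    def __init__(self, limit):
        self.limit = limit      # maximum number of pages that may be charged
        self.charged = 0        # pages currently charged
        self.reserved = set()   # page indexes that hold a reservation
        self.allocated = set()  # page indexes backed by an allocated page

    def _charge(self, pages):
        if self.charged + pages > self.limit:
            raise LimitExceeded
        self.charged += pages

    def mmap(self, indexes, noreserve=False):
        """Reserve (and charge) pages up front unless noreserve is set."""
        if noreserve:
            return
        new = set(indexes) - self.reserved - self.allocated
        self._charge(len(new))  # a failure here models an error at mmap time
        self.reserved |= new

    def fault(self, index):
        """Allocate a page; charge only if no reservation covers it."""
        if index in self.allocated:
            return
        if index in self.reserved:
            # Reservation is consumed; it was already charged at mmap time.
            self.reserved.discard(index)
        else:
            # No reservation: charge now; a failure here models SIGBUS.
            self._charge(1)
        self.allocated.add(index)

    def punch_hole(self, indexes):
        """Drop pages and reservations in the range (uncharging is assumed)."""
        for index in indexes:
            if index in self.allocated or index in self.reserved:
                self.charged -= 1
            self.allocated.discard(index)
            self.reserved.discard(index)
```

In this model, with a limit of two pages, mapping and faulting two reserved pages, punching a hole over page 0, and then faulting one unreserved page elsewhere brings the counter back to the limit. A later fault on page 0 then raises `LimitExceeded`, which mirrors the hole punch exception: the reservation is gone, so the fault behaves like an allocation without a reservation.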