On Fri, Sep 27, 2019 at 2:59 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 9/26/19 5:55 PM, Mina Almasry wrote:
> > Provided we keep the existing controller untouched, should the new
> > controller track:
> >
> > 1. only reservations, or
> > 2. both reservations and allocations for which no reservations exist
> > (such as the MAP_NORESERVE case)?
> >
> > I like the 'both' approach. Seems to me a counter like that would work
> > automatically regardless of whether the application is allocating
> > hugetlb memory with NORESERVE or not. NORESERVE allocations cannot cut
> > into reserved hugetlb pages, correct?
>
> Correct. One other easy way to allocate huge pages without reserves
> (that I know is used today) is via the fallocate system call.
>
> > If so, then applications that
> > allocate with NORESERVE will get sigbused when they hit their limit,
> > and applications that allocate without NORESERVE may get an error at
> > mmap time but will always be within their limits while they access the
> > mmap'd memory, correct?
>
> Correct. At page allocation time we can easily check to see if a reservation
> exists and not charge. For any specific page within a hugetlbfs file,
> a charge would happen at mmap time or allocation time.
>
> One exception (that I can think of) to this mmap(RESERVE) will not cause
> a SIGBUS rule is in the case of hole punch. If someone punches a hole in
> a file, not only do they remove pages associated with the file but the
> reservation information as well. Therefore, a subsequent fault will be
> the same as an allocation without reservation.
>

I don't think it causes a sigbus. This is the scenario, right:

1. Make cgroup with limit X bytes.
2. Task in cgroup mmaps a file with X bytes, causing the cgroup to get
charged X bytes.
3. A hole of size Y is punched in the file, causing the cgroup to get
uncharged Y bytes.
4. The task faults in memory from the hole, getting charged up to Y
bytes again. But they will still be within their limits.

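To make the four steps above concrete, here is a toy model in Python. It
is only a sketch, not kernel code: HugetlbCounter, ChargeFailed and
run_scenario are made-up names for illustration, and it assumes (as in
step 3) that a hole punch uncharges the cgroup.

```python
class ChargeFailed(Exception):
    """Stands in for a failed charge (SIGBUS at fault time in this model)."""


class HugetlbCounter:
    """Hypothetical per-cgroup byte counter; not the kernel's data structure."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def charge(self, nbytes):
        # Refuse any charge that would push usage past the limit.
        if self.usage + nbytes > self.limit:
            raise ChargeFailed(f"charge of {nbytes} exceeds limit {self.limit}")
        self.usage += nbytes

    def uncharge(self, nbytes):
        self.usage -= nbytes


def run_scenario(x, y, lowered_limit=None):
    cg = HugetlbCounter(limit=x)   # 1. cgroup with limit X
    cg.charge(x)                   # 2. mmap of X bytes charges the cgroup
    cg.uncharge(y)                 # 3. hole punch of Y bytes uncharges Y
    if lowered_limit is not None:
        cg.limit = lowered_limit   # optional: limit lowered between 3 and 4
    try:
        cg.charge(y)               # 4. faulting the hole back in recharges Y
    except ChargeFailed:
        return "SIGBUS"
    return "ok"
```

In this model run_scenario(8, 2) returns "ok", because the refault only
brings usage back to the original X. The charge in step 4 fails only when
lowered_limit is set below X, e.g. run_scenario(8, 2, lowered_limit=7).
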
IIUC userspace only gets sigbus'd if the limit is lowered between
steps 3 and 4, and it's ok if it gets sigbus'd there in my opinion.

> I 'think' the code to remove/truncate a file will work correctly as it
> is today, but I need to think about this some more.
>
> > mmap'd memory, correct? So the 'both' counter seems like a one size
> > fits all.
> >
> > I think the only sticking point left is whether an added controller
> > can support both cgroup-v2 and cgroup-v1. If I could get confirmation
> > on that I'll provide a patchset.
>
> Sorry, but I can not provide cgroup expertise.
> --
> Mike Kravetz