On 8/9/19 12:42 PM, Mina Almasry wrote:
> On Fri, Aug 9, 2019 at 10:54 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>> On 8/8/19 4:13 PM, Mina Almasry wrote:
>>> Problem:
>>> Currently tasks attempting to allocate more hugetlb memory than is available get
>>> a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
>>> However, if a task attempts to allocate hugetlb memory only more than its
>>> hugetlb_cgroup limit allows, the kernel will allow the mmap/shmget call,
>>> but will SIGBUS the task when it attempts to fault the memory in.
<snip>
>> I believe tracking reservations for shared mappings can get quite complicated.
>> The hugetlbfs reservation code around shared mappings 'works' on the basis
>> that shared mapping reservations are global. As a result, reservations are
>> more associated with the inode than with the task making the reservation.
>
> FWIW, I found it not too bad. And my tests at least don't detect an
> anomaly around shared mappings. The key I think is that I'm tracking
> cgroup to uncharge on the file_region entry inside the resv_map, so we
> know who allocated each file_region entry exactly and we can uncharge
> them when the entry is region_del'd.
>
>> For example, consider a file of size 4 hugetlb pages.
>> Task A maps the first 2 pages, and 2 reservations are taken. Task B maps
>> all 4 pages, and 2 additional reservations are taken. I am not really sure
>> of the desired semantics here for reservation limits if A and B are in separate
>> cgroups. Should B be charged for 4 or 2 reservations?
>
> Task A's cgroup is charged 2 pages to its reservation usage.
> Task B's cgroup is charged 2 pages to its reservation usage.

OK,
Suppose Task B's cgroup allowed 2 huge pages reservation and 2 huge pages
allocation. The mmap would succeed, but Task B could potentially need to
allocate more than 2 huge pages. So, when faulting in more than 2 huge pages
B would get a SIGBUS. Correct? Or, am I missing something?
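
To make the arithmetic in the example concrete, here is a toy model of the
charging scheme as I understand it from the description above. It is only a
sketch, not kernel code: ResvMap, reserve() and the cgroup names are made up
for illustration, and it assumes that only pages without an existing
reservation are charged, to the cgroup of the task doing the mapping.

```python
# Toy model of the per-cgroup reservation charging discussed above.
# Not kernel code; every name here is hypothetical.

class ResvMap:
    """Per-file reservation map: page index -> cgroup charged for it."""

    def __init__(self):
        self.owner = {}

    def reserve(self, cgroup, start, end):
        """Reserve pages [start, end) for a shared mapping.

        Assumption: only pages with no existing reservation are charged,
        and they are charged to the cgroup of the mapping task.
        Returns the number of pages newly charged.
        """
        new_pages = [p for p in range(start, end) if p not in self.owner]
        for p in new_pages:
            self.owner[p] = cgroup
        return len(new_pages)

    def charged(self, cgroup):
        """Total reservation pages currently charged to a cgroup."""
        return sum(1 for c in self.owner.values() if c == cgroup)


resv = ResvMap()
charged_a = resv.reserve("cgroup_A", 0, 2)  # Task A maps the first 2 pages
charged_b = resv.reserve("cgroup_B", 0, 4)  # Task B maps all 4 pages

mapped_b = 4
# Pages B has mapped that are not covered by B's own reservation charge.
uncovered_b = mapped_b - charged_b

print(charged_a, charged_b, uncovered_b)
```

In this model B is charged for 2 reservations while its mapping spans 4
pages, so a reservation limit of 2 would not reject the mmap. That
difference of 2 pages is the gap I am asking about.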

Perhaps reservation charge should always be the same as map size/maximum
allocation size?
--
Mike Kravetz