On 8/9/19 1:57 PM, Mina Almasry wrote:
> On Fri, Aug 9, 2019 at 1:39 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>>
>> On 8/9/19 11:05 AM, Mina Almasry wrote:
>>> On Fri, Aug 9, 2019 at 4:27 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>>>>> Alternatives considered:
>>>>> [...]
>>>> (I did not try that but) have you considered:
>>>> 3) MAP_POPULATE while you're making the reservation,
>>>
>>> I have tried this, and the behaviour is not great. Basically if
>>> userspace mmaps more memory than its cgroup limit allows with
>>> MAP_POPULATE, the kernel will reserve the total amount requested by
>>> the userspace, it will fault in up to the cgroup limit, and then it
>>> will SIGBUS the task when it tries to access the rest of its
>>> 'reserved' memory.
>>>
>>> So for example:
>>> - if /proc/sys/vm/nr_hugepages == 10, and
>>> - your cgroup limit is 5 pages, and
>>> - you mmap(MAP_POPULATE) 7 pages.
>>>
>>> Then the kernel will reserve 7 pages, and will fault in 5 of those 7
>>> pages, and will SIGBUS you when you try to access the remaining 2
>>> pages. So the problem persists. Folks would still like to know they
>>> are crossing the limits on mmap time.
>>
>> If you got the failure at mmap time in the MAP_POPULATE case would this
>> be useful?
>>
>> Just thinking that would be a relatively simple change.
>
> Not quite, unfortunately. A subset of the folks that want to use
> hugetlb memory don't want to use MAP_POPULATE (IIRC, something about
> mmapping a huge amount of hugetlb memory at their jobs' startup, and
> doing that with MAP_POPULATE adds so much to their startup time that
> it is prohibitively expensive - but that's just what I vaguely recall
> offhand. I can get you the details if you're interested).

Yes, MAP_POPULATE can get expensive as you will need to zero all those
huge pages.
--
Mike Kravetz
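
For reference, below is a minimal userspace sketch (not part of the thread)
of the scenario Mina describes: with /proc/sys/vm/nr_hugepages = 10 and a
hugetlb cgroup limit of 5 pages, an mmap(MAP_POPULATE) of 7 huge pages still
returns successfully, and the SIGBUS only arrives when the task touches the
pages beyond the cgroup limit. The 2MB huge page size is an assumption
(x86_64 default); the cgroup and nr_hugepages setup is taken from the example
above and must be configured before running.

/*
 * Sketch only: demonstrates that mmap(MAP_POPULATE) of more huge pages
 * than the hugetlb cgroup limit succeeds at mmap time, and the SIGBUS
 * is delivered on access instead.
 *
 * Assumed setup (from the example in the thread):
 *   /proc/sys/vm/nr_hugepages == 10
 *   hugetlb cgroup limit      == 5 pages
 */
#define _GNU_SOURCE		/* for MAP_HUGETLB on glibc */
#include <stdio.h>
#include <sys/mman.h>

#define HPAGE_SIZE	(2UL * 1024 * 1024)	/* assumed 2MB huge pages */
#define NR_PAGES	7			/* more than the 5-page limit */

int main(void)
{
	size_t len = NR_PAGES * HPAGE_SIZE;
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
		 -1, 0);
	if (p == MAP_FAILED) {
		/* Per the thread, we do not get here despite the limit. */
		perror("mmap");
		return 1;
	}
	printf("mmap of %d huge pages succeeded\n", NR_PAGES);

	/* Touching the pages past the cgroup limit raises SIGBUS. */
	for (int i = 0; i < NR_PAGES; i++) {
		p[i * HPAGE_SIZE] = 1;
		printf("touched page %d\n", i);
	}

	munmap(p, len);
	return 0;
}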