Thanks. I'm hoping that the *.rsvd.limit_in_bytes cgroup settings will suffice if we can upgrade the production systems to a kernel new enough to have them, and mlock() is a possibility as well, but it's useful to know about the other options. Mostly I wanted to know whether this was a kernel problem or a documentation problem. I've filed https://bugzilla.kernel.org/show_bug.cgi?id=212153 for the man page.
On Tue, 9 Mar 2021 at 12:32, David Hildenbrand <david@xxxxxxxxxx> wrote:
On 09.03.21 10:33, Bruce Merry wrote:
> Hi
>
> I've run into a problem with using mmap(..., MAP_ANONYMOUS |
> MAP_POPULATE | MAP_HUGETLB). If there are no huge pages available due to
> vm.nr_hugepages (or hugetlb.2MB.rsvd.limit_in_bytes cgroup setting) then
> the mmap call fails and I can gracefully fall back to 4KB pages.
> However, if neither of the above apply but hugetlb.2MB.limit_in_bytes
> prevents pages being mapped, then it appears that MAP_POPULATE is
> silently ignored (according to mincore), and rather than being able to
> gracefully fall back, attempting to use the memory results in SIGBUS.
I would have imagined that the huge page reservation would fail. But it
looks like the pages might get reserved, while actual population is
restricted by cgroups later.
Huge page reservation is actually pretty weird in some special cases
(including NUMA bindings).
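
For reference, the pattern under discussion might look roughly like the sketch below: try MAP_HUGETLB with MAP_POPULATE, fall back to normal pages if mmap() fails, then ask mincore() whether the mapping was really populated before touching it. The helper names map_prefer_huge() and fully_resident() are made up for illustration, and the code assumes the length is a multiple of the huge page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map len bytes, preferring huge pages.
 * *used_huge reports which attempt succeeded. Returns MAP_FAILED on error.
 * Assumes len is a multiple of the huge page size. */
static void *map_prefer_huge(size_t len, bool *used_huge)
{
    assert(len > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED) /* no huge pages available: fall back */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p;
}

/* Hypothetical helper: 1 if mincore() reports every page resident,
 * 0 if at least one page is not, -1 on error. mincore() reports one
 * byte per base (system-sized) page. */
static int fully_resident(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    if (vec == NULL)
        return -1;
    int rc = mincore(addr, len, vec);
    int ok = (rc == 0);
    for (size_t i = 0; ok && i < n; i++)
        ok = vec[i] & 1; /* low bit set means the page is resident */
    free(vec);
    return rc == 0 ? ok : -1;
}
```

If fully_resident() returns 0 for a mapping that came from the MAP_HUGETLB attempt, touching the memory may raise SIGBUS, so the caller would probably want to munmap() it and retry without MAP_HUGETLB.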
>
> Is that expected behaviour? I don't see anything in the mmap(2) man page
> about it being best-effort (in contrast to MAP_LOCKED, which explicitly
> says the call won't fail if it can't lock the memory).
I think it has been best-effort forever, just like MAP_LOCKED.
You could use memfd_create() to create an anonymous file backed by huge
pages, then try allocating backend storage using fallocate() - which
fails in a safe way. You just have to make sure to map it MAP_SHARED
later to avoid nasty side effects with private mappings + fallocate().
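
A minimal sketch of that suggestion, under some assumptions: glibc 2.27 or later (for the memfd_create() wrapper and MFD_HUGETLB), the default huge page size, and a length that is a multiple of it. The function name alloc_huge_or_small() and the memfd name "hugebuf" are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: back a buffer with huge pages via
 * memfd_create() + fallocate(), falling back to normal anonymous
 * memory. *used_huge is set to 1 if the huge page path succeeded.
 * Returns MAP_FAILED if both attempts fail. */
static void *alloc_huge_or_small(size_t len, int *used_huge)
{
    assert(len > 0);
    void *p = MAP_FAILED;
    int fd = memfd_create("hugebuf", MFD_CLOEXEC | MFD_HUGETLB);
    if (fd >= 0) {
        /* Allocate the backing huge pages now, so that a shortage
         * shows up as an error return here instead of a SIGBUS on
         * first access. */
        if (fallocate(fd, 0, 0, (off_t)len) == 0)
            p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0); /* MAP_SHARED, as advised */
        close(fd); /* an established mapping keeps the file alive */
    }
    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED) /* fall back to normal pages */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p;
}
```

The fallback here deliberately omits MAP_POPULATE to keep the sketch short; it can be added back if pre-faulting the 4KB pages matters.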
>
> This is on Linux 5.8 on Ubuntu 20.04. I can provide sample code if it's
> of interest, or test on a newer kernel if it'll help.
>
Note that I'm working on a reliable populate mechanism that can also
work on parts of a mapping only, which is especially relevant in
combination with MAP_NORESERVE. Not sure if that applies to your use
case; it sounds like memfd_create() + fallocate() could be good enough,
unless you also really want to have all page tables properly populated
already or really need MAP_PRIVATE.
https://lkml.kernel.org/r/20210308164520.18323-1-david@xxxxxxxxxx
--
Thanks,
David / dhildenb
--
Bruce Merry
Senior Science Processing Developer
SARAO