Hi,
Both `man 2 mmap` and `man 2 shmget` state that anonymous hugepage mappings
can only be obtained if the caller has `CAP_IPC_LOCK`, or is a member of the
group named by `sysctl_hugetlb_shm_group`.
I believe that in practice this is only enforced for `shmget`, and not `mmap`.
This is true as of current master, and has been true at least since 4.x (I have
not looked further back).
`mm/mmap.c` contains, in a `MAP_HUGETLB` branch:
    file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
                    VM_NORESERVE,
                    HUGETLB_ANONHUGE_INODE,
                    (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
While `fs/hugetlbfs/inode.c` contains:
    static int can_do_hugetlb_shm(void)
    {
        kgid_t shm_group;
        shm_group = make_kgid(&init_user_ns, sysctl_hugetlb_shm_group);
        return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
    }
    ...
    struct file *hugetlb_file_setup(const char *name, size_t size,
                    vm_flags_t acctflag, int creat_flags,
                    int page_size_log)
    {
        ...
        if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
            ...
            return ERR_PTR(-EPERM);
        }
That is, `hugetlb_file_setup` only calls `can_do_hugetlb_shm` when
`creat_flags == HUGETLB_SHMFS_INODE`, whereas the callsite in `mm/mmap.c`
passes in `HUGETLB_ANONHUGE_INODE`, so the check is skipped for `mmap`.
A simple test program that tries allocating hugepage memory with `mmap` and
`shmget` while not possessing `CAP_IPC_LOCK` and not being in the
`sysctl_hugetlb_shm_group` confirms this behavior.
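For reference, a minimal sketch of such a test follows. It is not the exact program I ran: the function names and `HUGE_LEN` are made up for illustration, and the 2 MiB huge page size is an assumption (the x86-64 default) that may need adjusting. Each helper returns 0 on success or the `errno` value on failure.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* Assumed huge page size (2 MiB); adjust to match the system under test. */
#define HUGE_LEN (2UL * 1024 * 1024)

/* Try an anonymous MAP_HUGETLB mapping. Returns 0 on success, else errno. */
int try_mmap_hugetlb(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);
    return 0;
}

/* Try to create a SHM_HUGETLB segment. Returns 0 on success, else errno. */
int try_shmget_hugetlb(size_t len)
{
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id < 0)
        return errno;
    /* Remove the segment right away so nothing is left behind. */
    shmctl(id, IPC_RMID, NULL);
    return 0;
}

/* Hypothetical driver: print the outcome of both attempts. */
void report(void)
{
    int m = try_mmap_hugetlb(HUGE_LEN);
    int s = try_shmget_hugetlb(HUGE_LEN);

    printf("mmap(MAP_HUGETLB):   %s\n", m ? strerror(m) : "ok");
    printf("shmget(SHM_HUGETLB): %s\n", s ? strerror(s) : "ok");
}
```

Call `report()` from `main()` and run it as a user without `CAP_IPC_LOCK` who is not in the hugetlb shm group. Huge pages need to be reserved first (a non-zero `/proc/sys/vm/nr_hugepages`), otherwise the `mmap` attempt may fail for lack of pages rather than for lack of permission.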
What's the right course of action here?
- The logic in `hugetlb_file_setup` could be modified to enforce the
permissions on `mmap` calls. This might break userspace code that's been
relying on this working, though.
- The restriction could be removed from `shmget`.
- The inconsistency between `mmap` and `shmget` could be accepted as a fact of
life, and the documentation fixed to match this reality.
-- Tudor