Hi Michael

It seems we all missed RLIMIT_MEMLOCK:
"this limit instead governs the amount of memory that an unprivileged
process may lock."

I found this because someone has sent a patch to LTP to fix this
unexpected error:
https://patchwork.ozlabs.org/project/ltp/patch/20210706132114.204443-1-cascardo@xxxxxxxxxxxxx/

Best Regards
Yang Xu
> Hi Michael
> Your patch is clearer; it looks good to me.
>
> Best Regards
> Yang Xu
>> Hello Yang Xu,
>>
>> On 5/12/21 10:53 PM, Yang Xu wrote:
>>> hugetlb_shm_group contains group id that is allowed to create SysV
>>> shared memory segment using hugetlb page. To meet EPERM error, we also
>>> need to make group id be not in this proc file.
>>>
>>> Signed-off-by: Yang Xu<xuyang2018.jy@xxxxxxxxxxx>
>>> ---
>>> man2/shmget.2 | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/man2/shmget.2 b/man2/shmget.2
>>> index 757b7b7f1..29799b9b8 100644
>>> --- a/man2/shmget.2
>>> +++ b/man2/shmget.2
>>> @@ -273,7 +273,7 @@ The
>>>  .B SHM_HUGETLB
>>>  flag was specified, but the caller was not privileged (did not have the
>>>  .B CAP_IPC_LOCK
>>> -capability).
>>> +capability and group id doesn't be contained in hugetlb_shm_group proc file).
>>>  .SH CONFORMING TO
>>>  POSIX.1-2001, POSIX.1-2008, SVr4.
>>>  .\" SVr4 documents an additional error condition EEXIST.
>>
>> Thanks for spotting this. The story is more complex, as far as I can
>> tell. For example, the same error also occurs for mmap(2) and
>> memfd_create(2).
>>
>> Instead of your patch, I applied the diff below (not yet pushed),
>> based on my reading of fs/hugetlbfs/inode.c, in particular:
>>
>> static int can_do_hugetlb_shm(void)
>> {
>>     kgid_t shm_group;
>>     shm_group = make_kgid(&init_user_ns, sysctl_hugetlb_shm_group);
>>     return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
>> }
>>
>> ...
>>
>> struct file *hugetlb_file_setup(const char *name, size_t size,
>>                                 vm_flags_t acctflag, struct user_struct **user,
>>                                 int creat_flags, int page_size_log)
>> {
>>     ...
>>     if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
>>         *user = current_user();
>>         if (user_shm_lock(size, *user)) {
>>             task_lock(current);
>>             pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n",
>>                          current->comm, current->pid);
>>             task_unlock(current);
>>         } else {
>>             *user = NULL;
>>             return ERR_PTR(-EPERM);
>>         }
>>     }
>>     ...
>> }
>>
>> As a deprecated feature, it appears that the RLIMIT_MEMLOCK
>> can also be used to permit huge page allocation, but I have
>> chosen not to document that for now.
>>
>> Please let me know if the patch makes sense to you.
>>
>> With best regards,
>>
>> Michael
>>
>> --- a/man2/memfd_create.2
>> +++ b/man2/memfd_create.2
>> @@ -201,6 +201,19 @@ The
>>  .BR memfd_create ()
>>  system call first appeared in Linux 3.17;
>>  glibc support was added in version 2.27.
>> +.TP
>> +.B EPERM
>> +The
>> +.B MFD_HUGETLB
>> +flag was specified, but the caller was not privileged (did not have the
>> +.B CAP_IPC_LOCK
>> +capability)
>> +and is not a member of the
>> +.I sysctl_hugetlb_shm_group
>> +group; see the description of
>> +.I /proc/sys/vm/sysctl_hugetlb_shm_group
>> +in
>> +.BR proc (5).
>>  .SH CONFORMING TO
>>  The
>>  .BR memfd_create ()
>> diff --git a/man2/mmap.2 b/man2/mmap.2
>> index 03f2eeb2c..4ee2f4f96 100644
>> --- a/man2/mmap.2
>> +++ b/man2/mmap.2
>> @@ -628,6 +628,18 @@ was mounted no-exec.
>>  The operation was prevented by a file seal; see
>>  .BR fcntl (2).
>>  .TP
>> +.B EPERM
>> +The
>> +.B MAP_HUGETLB
>> +flag was specified, but the caller was not privileged (did not have the
>> +.B CAP_IPC_LOCK
>> +capability)
>> +and is not a member of the
>> +.I sysctl_hugetlb_shm_group
>> +group; see the description of
>> +.I /proc/sys/vm/sysctl_hugetlb_shm_group
>> +in
>> +.TP
>>  .B ETXTBSY
>>  .B MAP_DENYWRITE
>>  was set but the object specified by
>> diff --git a/man2/shmget.2 b/man2/shmget.2
>> index 757b7b7f1..6e9995e81 100644
>> --- a/man2/shmget.2
>> +++ b/man2/shmget.2
>> @@ -273,7 +273,13 @@ The
>>  .B SHM_HUGETLB
>>  flag was specified, but the caller was not privileged (did not have the
>>  .B CAP_IPC_LOCK
>> -capability).
>> +capability)
>> +and is not a member of the
>> +.I sysctl_hugetlb_shm_group
>> +group; see the description of
>> +.I /proc/sys/vm/sysctl_hugetlb_shm_group
>> +in
>> +.BR proc (5).
>>  .SH CONFORMING TO
>>  POSIX.1-2001, POSIX.1-2008, SVr4.
>>  .\" SVr4 documents an additional error condition EEXIST.
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index a28dbdcc7..888535449 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -5603,6 +5603,19 @@ user should run
>>  .BR sync (1)
>>  first.
>>  .TP
>> +.IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)"
>> +This writable file contains a group ID that is allowed
>> +to allocate memory using huge pages.
>> +If a process has a filesystem group ID or any supplememtary group ID that
>> +matches this group ID,
>> +then it can make huge-page allocations without holding the
>> +.BR CAP_IPC_LOCK
>> +capability; see
>> +.BR memfd_create (2),
>> +.BR mmap (2),
>> +and
>> +.BR shmget (2).
>> +.TP
>>  .IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)"
>>  .\" The following is from Documentation/filesystems/proc.txt
>>  If nonzero, this disables the new 32-bit memory-mapping layout;
>> diff --git a/man7/capabilities.7 b/man7/capabilities.7
>> index 7e79b2fb6..cf9dc190f 100644
>> --- a/man7/capabilities.7
>> +++ b/man7/capabilities.7
>> @@ -205,11 +205,21 @@ the filesystem or any of the supplementary GIDs of the calling process.
>>  .B CAP_IPC_LOCK
>>  .\" FIXME . As at Linux 3.2, there are some strange uses of this capability
>>  .\" in other places; they probably should be replaced with something else.
>> +.PD 0
>> +.RS
>> +.IP * 2
>>  Lock memory
>>  .RB ( mlock (2),
>>  .BR mlockall (2),
>>  .BR mmap (2),
>> +.BR shmctl (2));
>> +.IP *
>> +Allocate memory using huge pages
>> +.RB ( memfd_create (2)
>> +.BR mmap (2),
>>  .BR shmctl (2)).
>> +.PD 0
>> +.RE
>>  .TP
>>  .B CAP_IPC_OWNER
>>  Bypass permission checks for operations on System V IPC objects.
>> $
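
For anyone who wants to reproduce the EPERM case discussed in this thread, here is a minimal userspace sketch in C. It is not kernel code. The helper names model_hugetlb_shm_check() and try_shm_hugetlb() are hypothetical and exist only for illustration. The first function models the decision quoted above from fs/hugetlbfs/inode.c, including the deprecated RLIMIT_MEMLOCK fallback, as a pure function of three booleans that the caller is assumed to have determined some other way. The second simply attempts a shmget(2) call with SHM_HUGETLB and reports the resulting errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000 /* value used by the Linux headers */
#endif

/*
 * Hypothetical userspace model of the permission check quoted in this
 * thread. The three inputs are assumptions supplied by the caller:
 *   has_cap_ipc_lock: the process holds CAP_IPC_LOCK
 *   in_shm_group:     one of its GIDs matches the hugetlb shm group
 *   memlock_fits:     the request fits within RLIMIT_MEMLOCK
 * Returns 0 if the allocation would be permitted, -EPERM otherwise.
 */
int model_hugetlb_shm_check(bool has_cap_ipc_lock, bool in_shm_group,
                            bool memlock_fits)
{
    if (has_cap_ipc_lock || in_shm_group)
        return 0;       /* corresponds to can_do_hugetlb_shm() */
    if (memlock_fits)
        return 0;       /* deprecated path: user_shm_lock() succeeded */
    return -EPERM;
}

/*
 * Try to create (and immediately remove) a private SysV segment backed
 * by huge pages. Returns 0 on success or the errno from shmget(2).
 * Errors other than EPERM (for example ENOMEM when no huge pages are
 * available) are possible and are passed through unchanged.
 */
int try_shm_hugetlb(size_t size)
{
    assert(size > 0);

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
    if (id == -1)
        return errno;

    shmctl(id, IPC_RMID, NULL); /* do not leave the segment behind */
    return 0;
}
```

Under this model, an unprivileged process outside the group only gets EPERM once the request also exceeds its RLIMIT_MEMLOCK, which is the case the LTP patch linked at the top of this mail runs into.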