Hi Michael,

Your patch is clearer; it looks good to me.

Best Regards
Yang Xu

> Hello Yang Xu,
>
> On 5/12/21 10:53 PM, Yang Xu wrote:
>> hugetlb_shm_group contains group id that is allowed to create SysV shared
>> memory segment using hugetlb page. To meet EPERM error, we also
>> need to make group id be not in this proc file.
>>
>> Signed-off-by: Yang Xu<xuyang2018.jy@xxxxxxxxxxx>
>> ---
>>  man2/shmget.2 | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/man2/shmget.2 b/man2/shmget.2
>> index 757b7b7f1..29799b9b8 100644
>> --- a/man2/shmget.2
>> +++ b/man2/shmget.2
>> @@ -273,7 +273,7 @@ The
>>  .B SHM_HUGETLB
>>  flag was specified, but the caller was not privileged (did not have the
>>  .B CAP_IPC_LOCK
>> -capability).
>> +capability and group id doesn't be contained in hugetlb_shm_group proc file).
>>  .SH CONFORMING TO
>>  POSIX.1-2001, POSIX.1-2008, SVr4.
>>  .\" SVr4 documents an additional error condition EEXIST.
>
> Thanks for spotting this. The story is more complex, as far as I can
> tell. For example, the same error also occurs for mmap(2) and
> memfd_create(2).
>
> Instead of your patch, I applied the diff below (not yet pushed),
> based on my reading of fs/hugetlbfs/inode.c, in particular:
>
> static int can_do_hugetlb_shm(void)
> {
> 	kgid_t shm_group;
> 	shm_group = make_kgid(&init_user_ns, sysctl_hugetlb_shm_group);
> 	return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
> }
>
> ...
>
> struct file *hugetlb_file_setup(const char *name, size_t size,
> 				vm_flags_t acctflag, struct user_struct **user,
> 				int creat_flags, int page_size_log)
> {
> 	...
> 	if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
> 		*user = current_user();
> 		if (user_shm_lock(size, *user)) {
> 			task_lock(current);
> 			pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n",
> 				current->comm, current->pid);
> 			task_unlock(current);
> 		} else {
> 			*user = NULL;
> 			return ERR_PTR(-EPERM);
> 		}
> 	}
> 	...
> }
>
> As a deprecated feature, it appears that RLIMIT_MEMLOCK
> can also be used to permit huge page allocation, but I have
> chosen not to document that for now.
>
> Please let me know if the patch makes sense to you.
>
> With best regards,
>
> Michael
>
> --- a/man2/memfd_create.2
> +++ b/man2/memfd_create.2
> @@ -201,6 +201,19 @@ The
>  .BR memfd_create ()
>  system call first appeared in Linux 3.17;
>  glibc support was added in version 2.27.
> +.TP
> +.B EPERM
> +The
> +.B MFD_HUGETLB
> +flag was specified, but the caller was not privileged (did not have the
> +.B CAP_IPC_LOCK
> +capability)
> +and is not a member of the
> +.I sysctl_hugetlb_shm_group
> +group; see the description of
> +.I /proc/sys/vm/sysctl_hugetlb_shm_group
> +in
> +.BR proc (5).
>  .SH CONFORMING TO
>  The
>  .BR memfd_create ()
> diff --git a/man2/mmap.2 b/man2/mmap.2
> index 03f2eeb2c..4ee2f4f96 100644
> --- a/man2/mmap.2
> +++ b/man2/mmap.2
> @@ -628,6 +628,18 @@ was mounted no-exec.
>  The operation was prevented by a file seal; see
>  .BR fcntl (2).
>  .TP
> +.B EPERM
> +The
> +.B MAP_HUGETLB
> +flag was specified, but the caller was not privileged (did not have the
> +.B CAP_IPC_LOCK
> +capability)
> +and is not a member of the
> +.I sysctl_hugetlb_shm_group
> +group; see the description of
> +.I /proc/sys/vm/sysctl_hugetlb_shm_group
> +in
> +.TP
>  .B ETXTBSY
>  .B MAP_DENYWRITE
>  was set but the object specified by
> diff --git a/man2/shmget.2 b/man2/shmget.2
> index 757b7b7f1..6e9995e81 100644
> --- a/man2/shmget.2
> +++ b/man2/shmget.2
> @@ -273,7 +273,13 @@ The
>  .B SHM_HUGETLB
>  flag was specified, but the caller was not privileged (did not have the
>  .B CAP_IPC_LOCK
> -capability).
> +capability)
> +and is not a member of the
> +.I sysctl_hugetlb_shm_group
> +group; see the description of
> +.I /proc/sys/vm/sysctl_hugetlb_shm_group
> +in
> +.BR proc (5).
>  .SH CONFORMING TO
>  POSIX.1-2001, POSIX.1-2008, SVr4.
>  .\" SVr4 documents an additional error condition EEXIST.
> diff --git a/man5/proc.5 b/man5/proc.5
> index a28dbdcc7..888535449 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -5603,6 +5603,19 @@ user should run
>  .BR sync (1)
>  first.
>  .TP
> +.IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)"
> +This writable file contains a group ID that is allowed
> +to allocate memory using huge pages.
> +If a process has a filesystem group ID or any supplementary group ID that
> +matches this group ID,
> +then it can make huge-page allocations without holding the
> +.BR CAP_IPC_LOCK
> +capability; see
> +.BR memfd_create (2),
> +.BR mmap (2),
> +and
> +.BR shmget (2).
> +.TP
>  .IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)"
>  .\" The following is from Documentation/filesystems/proc.txt
>  If nonzero, this disables the new 32-bit memory-mapping layout;
> diff --git a/man7/capabilities.7 b/man7/capabilities.7
> index 7e79b2fb6..cf9dc190f 100644
> --- a/man7/capabilities.7
> +++ b/man7/capabilities.7
> @@ -205,11 +205,21 @@ the filesystem or any of the supplementary GIDs of the calling process.
>  .B CAP_IPC_LOCK
>  .\" FIXME . As at Linux 3.2, there are some strange uses of this capability
>  .\" in other places; they probably should be replaced with something else.
> +.PD 0
> +.RS
> +.IP * 2
>  Lock memory
>  .RB ( mlock (2),
>  .BR mlockall (2),
>  .BR mmap (2),
> +.BR shmctl (2));
> +.IP *
> +Allocate memory using huge pages
> +.RB ( memfd_create (2)
> +.BR mmap (2),
>  .BR shmctl (2)).
> +.PD 0
> +.RE
>  .TP
>  .B CAP_IPC_OWNER
>  Bypass permission checks for operations on System V IPC objects.
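
For anyone who wants to reason about the permission check quoted from fs/hugetlbfs/inode.c without reading kernel code, a minimal userspace sketch of the same decision follows. It is only a model, not kernel code: model_in_group() and model_can_do_hugetlb_shm() are hypothetical names, the capability state and the group IDs are passed in as plain arguments rather than read from the process credentials, and the deprecated RLIMIT_MEMLOCK fallback is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical userspace model of the kernel's in_group_p(): true if gid
 * matches the filesystem GID or any of the supplementary GIDs.
 */
bool model_in_group(gid_t gid, gid_t fsgid, const gid_t *supp, size_t nsupp)
{
	if (gid == fsgid)
		return true;
	for (size_t i = 0; i < nsupp; i++) {
		if (supp[i] == gid)
			return true;
	}
	return false;
}

/*
 * Hypothetical model of can_do_hugetlb_shm(): the request is permitted if
 * the caller holds CAP_IPC_LOCK or is a member of the configured group.
 * When this returns false, the modelled call would fail with EPERM
 * (ignoring the deprecated RLIMIT_MEMLOCK path, which is not modelled).
 */
bool model_can_do_hugetlb_shm(bool has_cap_ipc_lock, gid_t shm_group,
			      gid_t fsgid, const gid_t *supp, size_t nsupp)
{
	return has_cap_ipc_lock ||
	       model_in_group(shm_group, fsgid, supp, nsupp);
}
```

For example, an unprivileged caller with filesystem GID 100 and supplementary GIDs 10 and 20 would be allowed when the configured group is 20 or 100, and refused when it is 30.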