Re: [PATCH v2] man2/shmget2: Add details about EPERM error


 



Hello Yang Xu,

On 5/12/21 10:53 PM, Yang Xu wrote:
> hugetlb_shm_group contains the group ID that is allowed to create SysV shared
> memory segments using huge pages. To trigger the EPERM error, the caller's
> group ID must also not be listed in this proc file.
> 
> Signed-off-by: Yang Xu <xuyang2018.jy@xxxxxxxxxxx>
> ---
>  man2/shmget.2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man2/shmget.2 b/man2/shmget.2
> index 757b7b7f1..29799b9b8 100644
> --- a/man2/shmget.2
> +++ b/man2/shmget.2
> @@ -273,7 +273,7 @@ The
>  .B SHM_HUGETLB
>  flag was specified, but the caller was not privileged (did not have the
>  .B CAP_IPC_LOCK
> -capability).
> +capability and group id doesn't be contained in hugetlb_shm_group proc file).
>  .SH CONFORMING TO
>  POSIX.1-2001, POSIX.1-2008, SVr4.
>  .\" SVr4 documents an additional error condition EEXIST.

Thanks for spotting this. The story is more complex, as far as I can
tell. For example, the same error also occurs for mmap(2) and
memfd_create(2).

Instead of your patch, I applied the diff below (not yet pushed), 
based on my reading of fs/hugetlbfs/inode.c, in particular:
    
    static int can_do_hugetlb_shm(void)
    {
            kgid_t shm_group;
            shm_group = make_kgid(&init_user_ns, sysctl_hugetlb_shm_group);
            return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
    }
    
    ...
    
    struct file *hugetlb_file_setup(const char *name, size_t size,
                                    vm_flags_t acctflag, struct user_struct **user,
                                    int creat_flags, int page_size_log)
    {
            ...
            if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
                    *user = current_user();
                    if (user_shm_lock(size, *user)) {
                            task_lock(current);
                            pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is deprecated\n",
                                    current->comm, current->pid);
                            task_unlock(current);
                    } else {
                            *user = NULL;
                            return ERR_PTR(-EPERM);
                    }
            }
            ...
    }

It appears that RLIMIT_MEMLOCK can also be used, as a deprecated
feature, to permit huge-page allocation, but I have chosen not to
document that for now.

Please let me know if the patch makes sense to you.

With best regards,

Michael

--- a/man2/memfd_create.2
+++ b/man2/memfd_create.2
@@ -201,6 +201,19 @@ The
 .BR memfd_create ()
 system call first appeared in Linux 3.17;
 glibc support was added in version 2.27.
+.TP
+.B EPERM
+The
+.B MFD_HUGETLB
+flag was specified, but the caller was not privileged (did not have the
+.B CAP_IPC_LOCK
+capability)
+and is not a member of the
+.I hugetlb_shm_group
+group; see the description of
+.I /proc/sys/vm/hugetlb_shm_group
+in
+.BR proc (5).
 .SH CONFORMING TO
 The
 .BR memfd_create ()
diff --git a/man2/mmap.2 b/man2/mmap.2
index 03f2eeb2c..4ee2f4f96 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -628,6 +628,19 @@ was mounted no-exec.
 The operation was prevented by a file seal; see
 .BR fcntl (2).
 .TP
+.B EPERM
+The
+.B MAP_HUGETLB
+flag was specified, but the caller was not privileged (did not have the
+.B CAP_IPC_LOCK
+capability)
+and is not a member of the
+.I hugetlb_shm_group
+group; see the description of
+.I /proc/sys/vm/hugetlb_shm_group
+in
+.BR proc (5).
+.TP
 .B ETXTBSY
 .B MAP_DENYWRITE
 was set but the object specified by
diff --git a/man2/shmget.2 b/man2/shmget.2
index 757b7b7f1..6e9995e81 100644
--- a/man2/shmget.2
+++ b/man2/shmget.2
@@ -273,7 +273,13 @@ The
 .B SHM_HUGETLB
 flag was specified, but the caller was not privileged (did not have the
 .B CAP_IPC_LOCK
-capability).
+capability)
+and is not a member of the
+.I hugetlb_shm_group
+group; see the description of
+.I /proc/sys/vm/hugetlb_shm_group
+in
+.BR proc (5).
 .SH CONFORMING TO
 POSIX.1-2001, POSIX.1-2008, SVr4.
 .\" SVr4 documents an additional error condition EEXIST.
diff --git a/man5/proc.5 b/man5/proc.5
index a28dbdcc7..888535449 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -5603,6 +5603,19 @@ user should run
 .BR sync (1)
 first.
 .TP
+.IR /proc/sys/vm/hugetlb_shm_group " (since Linux 2.6.7)"
+This writable file contains a group ID that is allowed
+to allocate memory using huge pages.
+If a process has a filesystem group ID or any supplementary group ID that
+matches this group ID,
+then it can make huge-page allocations without holding the
+.BR CAP_IPC_LOCK
+capability; see
+.BR memfd_create (2),
+.BR mmap (2),
+and
+.BR shmget (2).
+.TP
 .IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)"
 .\" The following is from Documentation/filesystems/proc.txt
 If nonzero, this disables the new 32-bit memory-mapping layout;
diff --git a/man7/capabilities.7 b/man7/capabilities.7
index 7e79b2fb6..cf9dc190f 100644
--- a/man7/capabilities.7
+++ b/man7/capabilities.7
@@ -205,11 +205,21 @@ the filesystem or any of the supplementary GIDs of the calling process.
 .B CAP_IPC_LOCK
 .\" FIXME . As at Linux 3.2, there are some strange uses of this capability
 .\" in other places; they probably should be replaced with something else.
+.PD 0
+.RS
+.IP * 2
 Lock memory
 .RB ( mlock (2),
 .BR mlockall (2),
 .BR mmap (2),
+.BR shmctl (2));
+.IP *
+Allocate memory using huge pages
+.RB ( memfd_create (2),
+.BR mmap (2),
 .BR shmctl (2)).
+.PD
+.RE
 .TP
 .B CAP_IPC_OWNER
 Bypass permission checks for operations on System V IPC objects.



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


