[PATCH] prctl.2: Clarify and update PR_SET_MM rules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Most of the rules for calling prctl(PR_SET_MM) have changed across the
years, and this man page contains many descriptions that are long obsolete.
In particular, it is missing any mention that prctl(PR_SET_MM_MAP) can be
called from an unprivileged process. Clarify the rules for prctl(PR_SET_MM)
and the error codes it returns, and update them to match newer kernel
behavior.

Signed-off-by: Matthew House <mattlloydhouse@xxxxxxxxx>
---
I pulled all of the details from kernel/sys.c and kernel/fork.c. There are
several more minutiae that I wasn't sure whether or not to include, given
that I don't know what this project's general policy is on including
historical details:

PR_SET_MM_MAP_SIZE doesn't use arg4, but it also doesn't enforce that it is
equal to zero. Should this be noted here, fixed in the kernel, or neither?

Since Linux 5.15 (commit fe69d560b5bd, "kernel/fork: always deny write
access to current MM exe_file"), /proc/pid/exe is used as the "file being
executed" for the purpose of ETXTBSY; PR_SET_MM_EXE_FILE (and its
PR_SET_MM_MAP equivalent) will deny write access on the new executable and
re-allow it on the old executable, if nothing else is executing it. Before
then, the "file being executed" was fixed to the original file passed to
execve(2). Perhaps this would better belong in proc.5, if it belongs in the
man pages at all?

Before Linux 5.15 (commit e1fbbd073137, "prctl: allow to setup brk for
et_dyn executables"), PR_SET_MM enforced that start_brk and brk were
strictly greater than end_data.

Before Linux 5.9 (commit 227175b2c914, "prctl: exe link permission error
changed from -EINVAL to -EPERM"), PR_SET_MM_EXE_FILE (and its equivalent)
returned -EINVAL rather than -EPERM if the caller didn't have local
CAP_SYS_ADMIN.

Before Linux 5.2 (commit a9e73998f9d7, "kernel/sys.c: prctl: fix false
positive in validate_prctl_map()"), PR_SET_MM enforced that start_data was
strictly less than end_data, rather than less than or equal.

Before Linux 4.14 (commit 4d28df6152aa, "prctl: Allow local CAP_SYS_ADMIN
changing exe_file"), for PR_SET_MM_EXE_FILE (and its equivalent), the
caller had to have the local root uid/gid, rather than just having local
CAP_SYS_ADMIN.

Before Linux 4.2 (commit 4a00e9df293d, "prctl: more prctl(PR_SET_MM_*)
checks"), the individual PR_SET_MM_* options did not enforce that start
addresses were below the corresponding end addresses.

Before Linux 3.10 (commit 52b3694157e3, "kernel/sys.c: make
prctl(PR_SET_MM) generally available"), all PR_SET_MM options required
CONFIG_CHECKPOINT_RESTORE to be enabled.

Before Linux 3.5 (commits bafb282df29c, "c/r: prctl: update
prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal", and
4229fb1dc684, "c/r: prctl: less paranoid prctl_set_mm_exe_file()"),
PR_SET_MM_EXE_FILE enforced that no executable files were mapped, rather
than just the old executable file.

Before Linux 3.5 (commits fe8c7f5cbf91, "c/r: prctl: extend PR_SET_MM to
set up more mm_struct entries", and 736f24d5e59d, "c/r: prctl: drop VMA
flags test on PR_SET_MM_ stack data assignment"), PR_SET_MM enforced all
the memory area flags described in the current version of this man page.

Before Linux 3.5 (commit 1ad75b9e1628, "c/r: prctl: add minimal address
test to PR_SET_MM"), PR_SET_MM did not enforce that addresses were greater
than or equal to mmap_min_addr.

Thank you,
Matthew House

 man2/prctl.2 | 235 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 168 insertions(+), 67 deletions(-)

diff --git a/man2/prctl.2 b/man2/prctl.2
index 09e9072fa..312b24087 100644
--- a/man2/prctl.2
+++ b/man2/prctl.2
@@ -604,7 +604,13 @@ for more information) and a regular application should not use this feature.
 However, there are cases, such as self-modifying programs,
 where a program might find it useful to change its own memory map.
 .IP
-The calling process must have the
+If
+.I arg2
+is neither
+.B PR_SET_MM_MAP
+nor
+.BR PR_SET_MM_MAP_SIZE ,
+then the calling process must have the
 .B CAP_SYS_RESOURCE
 capability.
 The value in
@@ -627,41 +633,31 @@ option enabled.
 .TP
 .B PR_SET_MM_START_CODE
 Set the address above which the program text can run.
-The corresponding memory area must be readable and executable,
-but not writable or shareable (see
-.BR mprotect (2)
-and
-.BR mmap (2)
-for more information).
 .TP
 .B PR_SET_MM_END_CODE
 Set the address below which the program text can run.
-The corresponding memory area must be readable and executable,
-but not writable or shareable.
+This address must be greater than
+the starting address of the current program text.
 .TP
 .B PR_SET_MM_START_DATA
 Set the address above which initialized and
 uninitialized (bss) data are placed.
-The corresponding memory area must be readable and writable,
-but not executable or shareable.
 .TP
 .B PR_SET_MM_END_DATA
 Set the address below which initialized and
 uninitialized (bss) data are placed.
-The corresponding memory area must be readable and writable,
-but not executable or shareable.
+This address must be greater than or equal to
+the starting address of the current data segment.
 .TP
 .B PR_SET_MM_START_STACK
-Set the start address of the stack.
-The corresponding memory area must be readable and writable.
+Set the starting address of the stack.
+The corresponding memory area must already exist.
 .TP
 .B PR_SET_MM_START_BRK
 Set the address above which the program heap can be expanded with
 .BR brk (2)
 call.
-The address must be greater than the ending address of
-the current program data segment.
-In addition, the combined size of the resulting heap and
+The combined size of the resulting heap and
 the size of the data segment can't exceed the
 .B RLIMIT_DATA
 resource limit (see
@@ -674,6 +670,9 @@ value.
 The requirements for the address are the same as for the
 .B PR_SET_MM_START_BRK
 option.
+Also, this address must be greater than or equal to
+the current starting address for
+.BR brk (2).
 .PP
 The following options are available since Linux 3.5.
 .\" commit fe8c7f5cbf91124987106faa3bdf0c8b955c4cf7
@@ -683,12 +682,16 @@ Set the address above which the program command line is placed.
 .TP
 .B PR_SET_MM_ARG_END
 Set the address below which the program command line is placed.
+This address must be greater than or equal to
+the starting address for the current program command line.
 .TP
 .B PR_SET_MM_ENV_START
 Set the address above which the program environment is placed.
 .TP
 .B PR_SET_MM_ENV_END
 Set the address below which the program environment is placed.
+This address must be greater than or equal to
+the starting address for the current program environment.
 .IP
 The address passed with
 .BR PR_SET_MM_ARG_START ,
@@ -697,11 +700,7 @@ The address passed with
 and
 .B PR_SET_MM_ENV_END
 should belong to a process stack area.
-Thus, the corresponding memory area must be readable, writable, and
-(depending on the kernel configuration) have the
-.B MAP_GROWSDOWN
-attribute set (see
-.BR mmap (2)).
+Thus, the corresponding memory area must already exist.
 .TP
 .B PR_SET_MM_AUXV
 Set a new auxiliary vector.
@@ -710,7 +709,7 @@ The
 argument should provide the address of the vector.
 The
 .I arg4
-is the size of the vector.
+argument is the size of the vector in bytes.
 .TP
 .B PR_SET_MM_EXE_FILE
 .\" commit b32dfe377102ce668775f8b6b1461f7ad428f8b6
@@ -724,12 +723,10 @@ The file descriptor should be obtained with a regular
 .BR open (2)
 call.
 .IP
-To change the symbolic link, one needs to unmap all existing
-executable memory areas, including those created by the kernel itself
-(for example the kernel usually creates at least one executable
-memory area for the ELF
-.I .text
-section).
+.\" commit 4229fb1dc6843c49a14bb098719f8a696cdc44f8
+For the symbolic link to be changed,
+the old executable file must be fully unmapped
+from the address space of the calling process.
 .IP
 In Linux 4.9 and earlier, the
 .\" commit 3fb4afd9a504c2386b8435028d43283216bf588e
@@ -753,18 +750,36 @@ The
 .I arg4
 argument should provide the size of the struct.
 .IP
+If
+.I auxv_size
+is 0, then
+.I auxv
+is ignored,
+and the auxiliary vector is left unchanged.
+.IP
+If
+.I exe_file
+is \-1, then the
+.IR /proc/ pid /exe
+symbolic link is left unchanged.
+Otherwise, the calling process must have the
+.B CAP_SYS_ADMIN
+or (since Linux 5.9)
+.\" commit ebd6de6812387a2db9a52842cfbe004da1dd3be8
+.B CAP_CHECKPOINT_RESTORE
+capability in its user namespace.
+.IP
 This feature is available only if the kernel is built with the
 .B CONFIG_CHECKPOINT_RESTORE
 option enabled.
 .TP
 .B PR_SET_MM_MAP_SIZE
-Returns the size of the
+Return the size of the
 .I struct prctl_mm_map
-the kernel expects.
+the kernel expects,
+in the location pointed to by
+.IR "(unsigned int\~*) arg3" .
 This allows user space to find a compatible struct.
-The
-.I arg4
-argument should be a pointer to an unsigned int.
 .IP
 This feature is available only if the kernel is built with the
 .B CONFIG_CHECKPOINT_RESTORE
@@ -2076,11 +2091,23 @@ above).
 .I option
 is
 .BR PR_SET_MM ,
-and
-.I arg3
+.I arg2
+is
+.B PR_SET_MM_EXE_FILE
+or
+.BR PR_SET_MM_MAP ,
+and the file is not executable.
+.TP
+.B EACCES
+.I option
+is
+.BR PR_SET_MM ,
+.I arg2
 is
-.BR PR_SET_MM_EXE_FILE ,
-the file is not executable.
+.B PR_SET_MM_EXE_FILE
+or
+.BR PR_SET_MM_MAP ,
+and the file was open for writing by one or more processes.
 .TP
 .B EBADF
 .I option
@@ -2088,21 +2115,26 @@ is
 .BR PR_SET_MM ,
 .I arg3
 is
-.BR PR_SET_MM_EXE_FILE ,
+.B PR_SET_MM_EXE_FILE
+or
+.BR PR_SET_MM_MAP ,
 and the file descriptor passed in
-.I arg4
+.I arg3
+or
+.I exe_fd
 is not valid.
 .TP
 .B EBUSY
 .I option
 is
 .BR PR_SET_MM ,
-.I arg3
+.I arg2
 is
-.BR PR_SET_MM_EXE_FILE ,
-and this the second attempt to change the
-.IR /proc/ pid /exe
-symbolic link, which is prohibited.
+.B PR_SET_MM_EXE_FILE
+or
+.BR PR_SET_MM_MAP ,
+and the old executable file is mapped
+into the address space of the calling process.
 .TP
 .B EFAULT
 .I arg2
@@ -2124,6 +2156,36 @@ is an invalid address.
 .B EFAULT
 .I option
 is
+.BR PR_SET_MM ,
+.I arg2
+is
+.BR PR_SET_MM_START_STACK ,
+.BR PR_SET_MM_ARG_START ,
+.BR PR_SET_MM_ARG_END ,
+.BR PR_SET_MM_ENV_START ,
+.BR PR_SET_MM_ENV_END ,
+.BR PR_SET_MM_AUXV ,
+.BR PR_SET_MM_MAP ,
+or
+.BR PR_SET_MM_MAP_SIZE ,
+and
+.I arg3
+is an invalid address.
+.TP
+.B EFAULT
+.I option
+is
+.BR PR_SET_MM ,
+.I arg2
+is
+.BR PR_SET_MM_MAP ,
+and
+.I auxv
+is an invalid address.
+.TP
+.B EFAULT
+.I option
+is
 .B PR_SET_SYSCALL_USER_DISPATCH
 and
 .I arg5
@@ -2138,9 +2200,8 @@ or not supported on this system.
 .B EINVAL
 .I option
 is
-.B PR_MCE_KILL
-or
-.B PR_MCE_KILL_GET
+.BR PR_MCE_KILL ,
+.BR PR_MCE_KILL_GET ,
 or
 .BR PR_SET_MM ,
 and unused
@@ -2175,40 +2236,60 @@ and the kernel was not configured with
 .I option
 is
 .BR PR_SET_MM ,
-and one of the following is true
+and one of the following is true:
 .RS
 .IP \[bu] 3
-.I arg4
-or
-.I arg5
-is nonzero;
-.IP \[bu]
 .I arg3
-is greater than
+specifies an value less than
+.I /proc/sys/vm/mmap_min_addr
+or greater than
 .B TASK_SIZE
 (the limit on the size of the user address space for this architecture);
 .IP \[bu]
+.I arg3
+specifies a starting address greater than the corresponding ending address,
+or an ending address less than the corresponding starting address;
+.IP \[bu]
 .I arg2
 is
-.BR PR_SET_MM_START_CODE ,
-.BR PR_SET_MM_END_CODE ,
 .BR PR_SET_MM_START_DATA ,
 .BR PR_SET_MM_END_DATA ,
+.BR PR_SET_MM_START_BRK ,
+.BR PR_SET_MM_BRK ,
 or
-.BR PR_SET_MM_START_STACK ,
-and the permissions of the corresponding memory area are not as required;
+.BR PR_SET_MM_MAP ,
+and
+.I arg3
+specifies a value that would cause the
+.B RLIMIT_DATA
+resource limit to be exceeded;
 .IP \[bu]
 .I arg2
 is
-.B PR_SET_MM_START_BRK
+.B PR_SET_MM_AUXV
 or
-.BR PR_SET_MM_BRK ,
+.BR PR_SET_MM_MAP ,
 and
-.I arg3
-is less than or equal to the end of the data segment
-or specifies a value that would cause the
-.B RLIMIT_DATA
-resource limit to be exceeded.
+.I arg4
+or
+.I auxv_size
+is larger than the space internally reserved for the auxiliary vector;
+.IP \[bu]
+.I arg2
+is
+.BR PR_SET_MM_MAP ,
+and
+.I arg4
+specifies an incorrect size for
+.IR "struct prctl_mm_map" ;
+.IP \[bu]
+.I arg2
+is
+.BR PR_SET_MM_MAP ,
+.I auxv
+is a null pointer, and
+.I auxv_size
+is not 0.
 .RE
 .TP
 .B EINVAL
@@ -2471,6 +2552,11 @@ capability.
 .I option
 is
 .BR PR_SET_MM ,
+.I arg2
+is neither
+.B PR_SET_MM_MAP
+nor
+.BR PR_SET_MM_MAP_SIZE ,
 and the caller does not have the
 .B CAP_SYS_RESOURCE
 capability.
@@ -2478,6 +2564,21 @@ capability.
 .B EPERM
 .I option
 is
+.BR PR_SET_MM ,
+.I arg2
+is
+.BR PR_SET_MM_MAP ,
+.I exe_fd
+is not \-1,
+and the caller does not have the
+.B CAP_SYS_ADMIN
+or
+.B CAP_CHECKPOINT_RESTORE
+capability in its user namespace.
+.TP
+.B EPERM
+.I option
+is
 .B PR_CAP_AMBIENT
 and
 .I arg2
-- 
2.41.0




[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux