Re: [PATCH] prctl.2: Clarify and update PR_SET_MM rules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Matthew!

On 2023-07-08 23:35, Matthew House wrote:
> Most of the rules for calling prctl(PR_SET_MM) have changed across the
> years, and this man page contains many descriptions that are long obsolete.
> In particular, it is missing any mention that prctl(PR_SET_MM_MAP) can be
> called from an unprivileged process. Clarify the rules for prctl(PR_SET_MM)
> and the error codes it returns, and update them to match newer kernel
> behavior.
> 
> Signed-off-by: Matthew House <mattlloydhouse@xxxxxxxxx>
> ---

Thanks for the patch!

> I pulled all of the details from kernel/sys.c and kernel/fork.c. There are
> several more minutiae that I wasn't sure whether or not to include, given
> that I don't know what this project's general policy is on including
> historical details:

We try to include them, unless they are old enough (e.g., Linux 2.4) that
they don't matter.  You could move those obsolete details to the HISTORY
section.

> 
> PR_SET_MM_MAP_SIZE doesn't use arg4, but it also doesn't enforce that it is
> equal to zero. Should this be noted here, fixed in the kernel, or neither?
> 
> Since Linux 5.15 (commit fe69d560b5bd, "kernel/fork: always deny write
> access to current MM exe_file"), /proc/pid/exe is used as the "file being
> executed" for the purpose of ETXTBSY; PR_SET_MM_EXE_FILE (and its
> PR_SET_MM_MAP equivalent) will deny write access on the new executable and
> re-allow it on the old executable, if nothing else is executing it. Before
> then, the "file being executed" was fixed to the original file passed to
> execve(2). Perhaps this would better belong in proc.5, if it belongs in the
> man pages at all?
> 
> Before Linux 5.15 (commit e1fbbd073137, "prctl: allow to setup brk for
> et_dyn executables"), PR_SET_MM enforced that start_brk and brk were
> strictly greater than end_data.
> 
> Before Linux 5.9 (commit 227175b2c914, "prctl: exe link permission error
> changed from -EINVAL to -EPERM"), PR_SET_MM_EXE_FILE (and its equivalent)
> returned -EINVAL rather than -EPERM if the caller didn't have local
> CAP_SYS_ADMIN.
> 
> Before Linux 5.2 (commit a9e73998f9d7, "kernel/sys.c: prctl: fix false
> positive in validate_prctl_map()"), PR_SET_MM enforced that start_data was
> strictly less than end_data, rather than less than or equal.
> 
> Before Linux 4.14 (commit 4d28df6152aa, "prctl: Allow local CAP_SYS_ADMIN
> changing exe_file"), for PR_SET_MM_EXE_FILE (and its equivalent), the
> caller had to have the local root uid/gid, rather than just having local
> CAP_SYS_ADMIN.
> 
> Before Linux 4.2 (commit 4a00e9df293d, "prctl: more prctl(PR_SET_MM_*)
> checks"), the individual PR_SET_MM_* options did not enforce that start
> addresses were below the corresponding end addresses.
> 
> Before Linux 3.10 (commit 52b3694157e3, "kernel/sys.c: make
> prctl(PR_SET_MM) generally available"), all PR_SET_MM options required
> CONFIG_CHECKPOINT_RESTORE to be enabled.
> 
> Before Linux 3.5 (commits bafb282df29c, "c/r: prctl: update
> prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal", and
> 4229fb1dc684, "c/r: prctl: less paranoid prctl_set_mm_exe_file()"),
> PR_SET_MM_EXE_FILE enforced that no executable files were mapped, rather
> than just the old executable file.
> 
> Before Linux 3.5 (commits fe8c7f5cbf91, "c/r: prctl: extend PR_SET_MM to
> set up more mm_struct entries", and 736f24d5e59d, "c/r: prctl: drop VMA
> flags test on PR_SET_MM_ stack data assignment"), PR_SET_MM enforced all
> the memory area flags described in the current version of this man page.
> 
> Before Linux 3.5 (commit 1ad75b9e1628, "c/r: prctl: add minimal address
> test to PR_SET_MM"), PR_SET_MM did not enforce that addresses were greater
> than or equal to mmap_min_addr.

We also try to have patches that are minimal.  Could you please send a
patch set where each of the paragraphs above forms a separate patch?

Cheers,
Alex

> 
> Thank you,
> Matthew House
> 
>  man2/prctl.2 | 235 ++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 168 insertions(+), 67 deletions(-)
> 
> diff --git a/man2/prctl.2 b/man2/prctl.2
> index 09e9072fa..312b24087 100644
> --- a/man2/prctl.2
> +++ b/man2/prctl.2
> @@ -604,7 +604,13 @@ for more information) and a regular application should not use this feature.
>  However, there are cases, such as self-modifying programs,
>  where a program might find it useful to change its own memory map.
>  .IP
> -The calling process must have the
> +If
> +.I arg2
> +is neither
> +.B PR_SET_MM_MAP
> +nor
> +.BR PR_SET_MM_MAP_SIZE ,
> +then the calling process must have the
>  .B CAP_SYS_RESOURCE
>  capability.
>  The value in
> @@ -627,41 +633,31 @@ option enabled.
>  .TP
>  .B PR_SET_MM_START_CODE
>  Set the address above which the program text can run.
> -The corresponding memory area must be readable and executable,
> -but not writable or shareable (see
> -.BR mprotect (2)
> -and
> -.BR mmap (2)
> -for more information).
>  .TP
>  .B PR_SET_MM_END_CODE
>  Set the address below which the program text can run.
> -The corresponding memory area must be readable and executable,
> -but not writable or shareable.
> +This address must be greater than
> +the starting address of the current program text.
>  .TP
>  .B PR_SET_MM_START_DATA
>  Set the address above which initialized and
>  uninitialized (bss) data are placed.
> -The corresponding memory area must be readable and writable,
> -but not executable or shareable.
>  .TP
>  .B PR_SET_MM_END_DATA
>  Set the address below which initialized and
>  uninitialized (bss) data are placed.
> -The corresponding memory area must be readable and writable,
> -but not executable or shareable.
> +This address must be greater than or equal to
> +the starting address of the current data segment.
>  .TP
>  .B PR_SET_MM_START_STACK
> -Set the start address of the stack.
> -The corresponding memory area must be readable and writable.
> +Set the starting address of the stack.
> +The corresponding memory area must already exist.
>  .TP
>  .B PR_SET_MM_START_BRK
>  Set the address above which the program heap can be expanded with
>  .BR brk (2)
>  call.
> -The address must be greater than the ending address of
> -the current program data segment.
> -In addition, the combined size of the resulting heap and
> +The combined size of the resulting heap and
>  the size of the data segment can't exceed the
>  .B RLIMIT_DATA
>  resource limit (see
> @@ -674,6 +670,9 @@ value.
>  The requirements for the address are the same as for the
>  .B PR_SET_MM_START_BRK
>  option.
> +Also, this address must be greater than or equal to
> +the current starting address for
> +.BR brk (2).
>  .PP
>  The following options are available since Linux 3.5.
>  .\" commit fe8c7f5cbf91124987106faa3bdf0c8b955c4cf7
> @@ -683,12 +682,16 @@ Set the address above which the program command line is placed.
>  .TP
>  .B PR_SET_MM_ARG_END
>  Set the address below which the program command line is placed.
> +This address must be greater than or equal to
> +the starting address for the current program command line.
>  .TP
>  .B PR_SET_MM_ENV_START
>  Set the address above which the program environment is placed.
>  .TP
>  .B PR_SET_MM_ENV_END
>  Set the address below which the program environment is placed.
> +This address must be greater than or equal to
> +the starting address for the current program environment.
>  .IP
>  The address passed with
>  .BR PR_SET_MM_ARG_START ,
> @@ -697,11 +700,7 @@ The address passed with
>  and
>  .B PR_SET_MM_ENV_END
>  should belong to a process stack area.
> -Thus, the corresponding memory area must be readable, writable, and
> -(depending on the kernel configuration) have the
> -.B MAP_GROWSDOWN
> -attribute set (see
> -.BR mmap (2)).
> +Thus, the corresponding memory area must already exist.
>  .TP
>  .B PR_SET_MM_AUXV
>  Set a new auxiliary vector.
> @@ -710,7 +709,7 @@ The
>  argument should provide the address of the vector.
>  The
>  .I arg4
> -is the size of the vector.
> +argument is the size of the vector in bytes.
>  .TP
>  .B PR_SET_MM_EXE_FILE
>  .\" commit b32dfe377102ce668775f8b6b1461f7ad428f8b6
> @@ -724,12 +723,10 @@ The file descriptor should be obtained with a regular
>  .BR open (2)
>  call.
>  .IP
> -To change the symbolic link, one needs to unmap all existing
> -executable memory areas, including those created by the kernel itself
> -(for example the kernel usually creates at least one executable
> -memory area for the ELF
> -.I .text
> -section).
> +.\" commit 4229fb1dc6843c49a14bb098719f8a696cdc44f8
> +For the symbolic link to be changed,
> +the old executable file must be fully unmapped
> +from the address space of the calling process.
>  .IP
>  In Linux 4.9 and earlier, the
>  .\" commit 3fb4afd9a504c2386b8435028d43283216bf588e
> @@ -753,18 +750,36 @@ The
>  .I arg4
>  argument should provide the size of the struct.
>  .IP
> +If
> +.I auxv_size
> +is 0, then
> +.I auxv
> +is ignored,
> +and the auxiliary vector is left unchanged.
> +.IP
> +If
> +.I exe_file
> +is \-1, then the
> +.IR /proc/ pid /exe
> +symbolic link is left unchanged.
> +Otherwise, the calling process must have the
> +.B CAP_SYS_ADMIN
> +or (since Linux 5.9)
> +.\" commit ebd6de6812387a2db9a52842cfbe004da1dd3be8
> +.B CAP_CHECKPOINT_RESTORE
> +capability in its user namespace.
> +.IP
>  This feature is available only if the kernel is built with the
>  .B CONFIG_CHECKPOINT_RESTORE
>  option enabled.
>  .TP
>  .B PR_SET_MM_MAP_SIZE
> -Returns the size of the
> +Return the size of the
>  .I struct prctl_mm_map
> -the kernel expects.
> +the kernel expects,
> +in the location pointed to by
> +.IR "(unsigned int\~*) arg3" .
>  This allows user space to find a compatible struct.
> -The
> -.I arg4
> -argument should be a pointer to an unsigned int.
>  .IP
>  This feature is available only if the kernel is built with the
>  .B CONFIG_CHECKPOINT_RESTORE
> @@ -2076,11 +2091,23 @@ above).
>  .I option
>  is
>  .BR PR_SET_MM ,
> -and
> -.I arg3
> +.I arg2
> +is
> +.B PR_SET_MM_EXE_FILE
> +or
> +.BR PR_SET_MM_MAP ,
> +and the file is not executable.
> +.TP
> +.B EACCES
> +.I option
> +is
> +.BR PR_SET_MM ,
> +.I arg2
>  is
> -.BR PR_SET_MM_EXE_FILE ,
> -the file is not executable.
> +.B PR_SET_MM_EXE_FILE
> +or
> +.BR PR_SET_MM_MAP ,
> +and the file was open for writing by one or more processes.
>  .TP
>  .B EBADF
>  .I option
> @@ -2088,21 +2115,26 @@ is
>  .BR PR_SET_MM ,
>  .I arg3
>  is
> -.BR PR_SET_MM_EXE_FILE ,
> +.B PR_SET_MM_EXE_FILE
> +or
> +.BR PR_SET_MM_MAP ,
>  and the file descriptor passed in
> -.I arg4
> +.I arg3
> +or
> +.I exe_fd
>  is not valid.
>  .TP
>  .B EBUSY
>  .I option
>  is
>  .BR PR_SET_MM ,
> -.I arg3
> +.I arg2
>  is
> -.BR PR_SET_MM_EXE_FILE ,
> -and this the second attempt to change the
> -.IR /proc/ pid /exe
> -symbolic link, which is prohibited.
> +.B PR_SET_MM_EXE_FILE
> +or
> +.BR PR_SET_MM_MAP ,
> +and the old executable file is mapped
> +into the address space of the calling process.
>  .TP
>  .B EFAULT
>  .I arg2
> @@ -2124,6 +2156,36 @@ is an invalid address.
>  .B EFAULT
>  .I option
>  is
> +.BR PR_SET_MM ,
> +.I arg2
> +is
> +.BR PR_SET_MM_START_STACK ,
> +.BR PR_SET_MM_ARG_START ,
> +.BR PR_SET_MM_ARG_END ,
> +.BR PR_SET_MM_ENV_START ,
> +.BR PR_SET_MM_ENV_END ,
> +.BR PR_SET_MM_AUXV ,
> +.BR PR_SET_MM_MAP ,
> +or
> +.BR PR_SET_MM_MAP_SIZE ,
> +and
> +.I arg3
> +is an invalid address.
> +.TP
> +.B EFAULT
> +.I option
> +is
> +.BR PR_SET_MM ,
> +.I arg2
> +is
> +.BR PR_SET_MM_MAP ,
> +and
> +.I auxv
> +is an invalid address.
> +.TP
> +.B EFAULT
> +.I option
> +is
>  .B PR_SET_SYSCALL_USER_DISPATCH
>  and
>  .I arg5
> @@ -2138,9 +2200,8 @@ or not supported on this system.
>  .B EINVAL
>  .I option
>  is
> -.B PR_MCE_KILL
> -or
> -.B PR_MCE_KILL_GET
> +.BR PR_MCE_KILL ,
> +.BR PR_MCE_KILL_GET ,
>  or
>  .BR PR_SET_MM ,
>  and unused
> @@ -2175,40 +2236,60 @@ and the kernel was not configured with
>  .I option
>  is
>  .BR PR_SET_MM ,
> -and one of the following is true
> +and one of the following is true:
>  .RS
>  .IP \[bu] 3
> -.I arg4
> -or
> -.I arg5
> -is nonzero;
> -.IP \[bu]
>  .I arg3
> -is greater than
> +specifies an value less than
> +.I /proc/sys/vm/mmap_min_addr
> +or greater than
>  .B TASK_SIZE
>  (the limit on the size of the user address space for this architecture);
>  .IP \[bu]
> +.I arg3
> +specifies a starting address greater than the corresponding ending address,
> +or an ending address less than the corresponding starting address;
> +.IP \[bu]
>  .I arg2
>  is
> -.BR PR_SET_MM_START_CODE ,
> -.BR PR_SET_MM_END_CODE ,
>  .BR PR_SET_MM_START_DATA ,
>  .BR PR_SET_MM_END_DATA ,
> +.BR PR_SET_MM_START_BRK ,
> +.BR PR_SET_MM_BRK ,
>  or
> -.BR PR_SET_MM_START_STACK ,
> -and the permissions of the corresponding memory area are not as required;
> +.BR PR_SET_MM_MAP ,
> +and
> +.I arg3
> +specifies a value that would cause the
> +.B RLIMIT_DATA
> +resource limit to be exceeded;
>  .IP \[bu]
>  .I arg2
>  is
> -.B PR_SET_MM_START_BRK
> +.B PR_SET_MM_AUXV
>  or
> -.BR PR_SET_MM_BRK ,
> +.BR PR_SET_MM_MAP ,
>  and
> -.I arg3
> -is less than or equal to the end of the data segment
> -or specifies a value that would cause the
> -.B RLIMIT_DATA
> -resource limit to be exceeded.
> +.I arg4
> +or
> +.I auxv_size
> +is larger than the space internally reserved for the auxiliary vector;
> +.IP \[bu]
> +.I arg2
> +is
> +.BR PR_SET_MM_MAP ,
> +and
> +.I arg4
> +specifies an incorrect size for
> +.IR "struct prctl_mm_map" ;
> +.IP \[bu]
> +.I arg2
> +is
> +.BR PR_SET_MM_MAP ,
> +.I auxv
> +is a null pointer, and
> +.I auxv_size
> +is not 0.
>  .RE
>  .TP
>  .B EINVAL
> @@ -2471,6 +2552,11 @@ capability.
>  .I option
>  is
>  .BR PR_SET_MM ,
> +.I arg2
> +is neither
> +.B PR_SET_MM_MAP
> +nor
> +.BR PR_SET_MM_MAP_SIZE ,
>  and the caller does not have the
>  .B CAP_SYS_RESOURCE
>  capability.
> @@ -2478,6 +2564,21 @@ capability.
>  .B EPERM
>  .I option
>  is
> +.BR PR_SET_MM ,
> +.I arg2
> +is
> +.BR PR_SET_MM_MAP ,
> +.I exe_fd
> +is not \-1,
> +and the caller does not have the
> +.B CAP_SYS_ADMIN
> +or
> +.B CAP_CHECKPOINT_RESTORE
> +capability in its user namespace.
> +.TP
> +.B EPERM
> +.I option
> +is
>  .B PR_CAP_AMBIENT
>  and
>  .I arg2

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux