On 12/20/20 11:00 PM, Stephen Kitt wrote: > Hi Alex, > > On Sat, 19 Dec 2020 15:00:00 +0100, "Alejandro Colomar (man-pages)" > <alx.manpages@xxxxxxxxx> wrote: >> Please see some comments below. >> It's looking good ;) > > Thanks for your review and patience! > >> On 12/18/20 5:58 PM, Stephen Kitt wrote: >>> This documents close_range(2) based on information in >>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de, >>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and >>> 582f1fb6b721facf04848d2ca57f34468da1813e. >>> >>> Signed-off-by: Stephen Kitt <steve@xxxxxxx> >>> --- >>> V3: fix synopsis overflow >>> copy notes from membarrier.2 re the lack of wrapper >>> semantic newlines >>> drop non-standard "USE CASES" section heading >>> add code example >>> >>> V2: unsigned int to match the kernel declarations >>> groff and grammar tweaks >>> CLOSE_RANGE_UNSHARE unshares *and* closes >>> Explain that EMFILE and ENOMEM can occur with C_R_U >>> "Conforming to" phrasing >>> Detailed explanation of CLOSE_RANGE_UNSHARE >>> Reading /proc isn't common >>> >>> man2/close_range.2 | 266 +++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 266 insertions(+) >>> create mode 100644 man2/close_range.2 >>> >>> diff --git a/man2/close_range.2 b/man2/close_range.2 >>> new file mode 100644 >>> index 000000000..f8f2053ac >>> --- /dev/null >>> +++ b/man2/close_range.2 >>> @@ -0,0 +1,266 @@ >>> +.\" Copyright (c) 2020 Stephen Kitt <steve@xxxxxxx> >>> +.\" >>> +.\" %%%LICENSE_START(VERBATIM) >>> +.\" Permission is granted to make and distribute verbatim copies of this >>> +.\" manual provided the copyright notice and this permission notice are >>> +.\" preserved on all copies. >>> +.\" >>> +.\" Permission is granted to copy and distribute modified versions of >>> this +.\" manual under the conditions for verbatim copying, provided that >>> the +.\" entire resulting derived work is distributed under the terms of a >>> +.\" permission notice identical to this one. >>> +.\" >>> +.\" Since the Linux kernel and libraries are constantly changing, this >>> +.\" manual page may be incorrect or out-of-date. The author(s) assume no >>> +.\" responsibility for errors or omissions, or for damages resulting from >>> +.\" the use of the information contained herein. The author(s) may not >>> +.\" have taken the same level of care in the production of this manual, >>> +.\" which is licensed free of charge, as they might when working >>> +.\" professionally. >>> +.\" >>> +.\" Formatted or processed versions of this manual, if unaccompanied by >>> +.\" the source, must acknowledge the copyright and authors of this work. >>> +.\" %%%LICENSE_END >>> +.\" >>> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual" >>> +.SH NAME >>> +close_range \- close all file descriptors in a given range >>> +.SH SYNOPSIS >>> +.nf >>> +.B #include <linux/close_range.h> >>> +.PP >>> +.BI "int close_range(unsigned int " first ", unsigned int " last , >>> +.BI " unsigned int " flags ); >>> +.fi >>> +.PP >>> +.IR Note : >>> +There is no glibc wrapper for this system call; see NOTES. >>> +.SH DESCRIPTION >>> +The >>> +.BR close_range () >>> +system call closes all open file descriptors from >>> +.I first >>> +to >>> +.I last >>> +(included). >>> +.PP >>> +Errors closing a given file descriptor are currently ignored. >>> +.PP >>> +.I flags >>> +can be 0 or set to one or both of the following: >>> +.TP >>> +.B CLOSE_RANGE_UNSHARE >>> +unshares the range of file descriptors from any other processes, >>> +before closing them, >>> +avoiding races with other threads sharing the file descriptor table. >>> +.TP >>> +.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)" >> >> |sort >> >> I prefer alphabetic order rather than adding new items at the bottom. >> When lists grow, it becomes difficult to find what you're looking for. >> >> CLOEXEC should go before UNSHARE. > > That makes sense. > >>> +sets the close-on-exec bit instead of immediately closing the file >>> +descriptors. >> >> [ >> sets the close-on-exec bit instead of >> immediately closing the file descriptors. >> ] > > Is this for semantic reasons, or to balance the lines and make them easier to > read in the roff source? B is also true, but mostly A. Cheers, Alex > >>> +.SH RETURN VALUE >>> +On success, >>> +.BR close_range () >>> +returns 0. >>> +On error, \-1 is returned and >>> +.I errno >>> +is set to indicate the cause of the error. >>> +.SH ERRORS >>> +.TP >>> +.B EINVAL >>> +.I flags >>> +is not valid, or >>> +.I first >>> +is greater than >>> +.IR last . >>> +.PP >>> +The following can occur with >>> +.B CLOSE_RANGE_UNSHARE >>> +(when constructing the new descriptor table): >>> +.TP >>> +.B EMFILE >>> +The per-process limit on the number of open file descriptors has been >>> reached +(see the description of >>> +.B RLIMIT_NOFILE >>> +in >>> +.BR getrlimit (2)). >>> +.TP >>> +.B ENOMEM >>> +Insufficient kernel memory was available. >>> +.SH VERSIONS >>> +.BR close_range () >>> +first appeared in Linux 5.9. >>> +.SH CONFORMING TO >>> +.BR close_range () >>> +is a nonstandard function that is also present on FreeBSD. >>> +.SH NOTES >>> +Glibc does not provide a wrapper for this system call; call it using >>> +.BR syscall (2). >>> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de >>> +.SS Closing all open file descriptors >> >> The comment with the commit would be better inside the section it refers >> to, so: >> >> [ >> .SS Closing all open file descriptors >> .\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de >> ] > > Indeed! > >>> +To avoid blindly closing file descriptors in the range of possible >>> +file descriptors, >> >> [ >> To avoid blindly closing file descriptors >> in the range of possible file descriptors, >> ] >> >>> +this is sometimes implemented (on Linux) by listing open file >>> +descriptors in >> >> [ >> this is sometimes implemented (on Linux) >> by listing open file descriptors in >> ] >> >>> +.I /proc/self/fd/ >>> +and calling >>> +.BR close (2) >>> +on each one. >>> +.BR close_range () >>> +can take care of this without requiring >>> +.I /proc >>> +and with a single system call, >> >> s/with/within/ >> >>> +which provides significant performance benefits. >>> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8 >>> +.SS Closing file descriptors before exec >> >> [ >> .SS Closing file descriptors before exec >> .\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8 >> ] >> >>> +File descriptors can be closed safely using >>> +.PP >>> +.in +4n >>> +.EX >>> +/* we don't want anything past stderr here */ >>> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE); >>> +execve(....);> +.EE >>> +.in >>> +.PP >>> +.B CLOSE_RANGE_UNSHARE >>> +is conceptually equivalent to >>> +.PP >>> +.in +4n >>> +.EX >>> +unshare(CLONE_FILES); >>> +close_range(first, last, 0); >>> +.EE >>> +.in >>> +.PP >>> +but can be more efficient: >>> +if the unshared range extends past the current maximum number of file >>> +descriptors allocated in the caller's file descriptor table >> >> [ >> if the unshared range extends past >> the current maximum number of file descriptors allocated >> in the caller's file descriptor table >> ] >> >>> +(the common case when >>> +.I last >>> +is >>> +.BR ~0U ), >> >> Literal values are not (usually) formatted. >> >> [ >> .I last >> is ~0U), >> ] >> >>> +the kernel will unshare a new file descriptor table for the caller up >>> +to >> >> [ >> the kernel will unshare a new file descriptor table for the caller up to >> ] >> >>> +.IR first . >>> +This avoids subsequent close calls entirely; >>> +the whole operation is complete once the table is unshared. >>> +.\" 582f1fb6b721facf04848d2ca57f34468da1813e >>> +.SS Closing files on \fBexec\fP >> >> [ >> .SS Closing files on \fBexec\fP >> .\" 582f1fb6b721facf04848d2ca57f34468da1813e >> ] >> >>> +This is particularly useful in cases where multiple >>> +.RB pre- exec >>> +setup steps risk conflicting with each other. >>> +For example, setting up a >>> +.BR seccomp (2) >>> +profile can conflict with a >>> +.B close_range >> >> .BR close_range () >> >>> +call: >>> +if the file descriptors are closed before the seccomp profile is set >> >> .BR seccomp (2) >> >>> +up, >> >> Please, split at a different point. >> >>> +the profile setup can't use them control their closure; >> >> I don't understand what you wanted to say. them? > > Oops, I meant "the profile setup can't use them itself, or control their > closure". > >> >>> +if the file descriptors are closed afterwards, >>> +the seccomp profile can't block the >>> +.B close_range >> >> .BR close_range () >> >>> +call or any fallbacks. >>> +Using >>> +.B CLOSE_RANGE_CLOEXEC >>> +avoids this: >>> +the descriptors can be marked before the seccomp profile is set up, >> >> .BR seccomp (2) >> >>> +and the profile can control access to >>> +.B close_range >> >> .BR close_range () >> >>> +without affecting the calling process. >>> +.SH EXAMPLES >>> +The following program is designed to be execed by the second program >>> +below. >>> +It lists its open file descriptors: >>> +.PP >>> +.in +4n >>> +.EX >>> +/* listopen.c */ >>> + >>> +#include <stdio.h> >>> +#include <sys/stat.h> >>> + >>> +int >>> +main(int argc, char *argv[]) >>> +{ >>> + int i; >> >> We use C99 declarations for loop indices. >> >>> + struct stat buf; >>> + >>> + for (i = 0; i < 100; i++) { >> >> for (int i = 0; i < 100; i++) { >> >>> + if (!fstat(i, &buf)) >>> + printf("FD %d is open.\n", i); >> >> s/\\/\\e/ >> >> see: d1a719857b7eb68f5e5c1c965089038dee683240 >> >> I sometimes forget to fix those after copying the program to the page. >> My solution is to copy the rendered text from the man page to a file >> and then compile, and those errors become obvious ;) > > Ah yes, good catch. I was looking into automating checks for the source code > included in man pages throughout the project, but that throws a spanner in > the works! > >> >>> + } >>> + >>> + exit(EXIT_SUCCESS); >>> +) >>> +.EE >>> +.in >>> +.PP >>> +This program executes the command given on its command-line after >>> +opening the files listed after the command, >>> +and then using >> >> s/using/uses/ >> >>> +.B close_range >> >> .BR close_range () >> >>> +to close them: >>> +.PP >>> +.in +4n >>> +.EX >>> +/* close_range.c */ >>> + >>> +#include <fcntl.h> >>> +#include <linux/close_range.h> >>> +#include <stdio.h> >>> +#include <stdlib.h> >>> +#include <sys/stat.h> >>> +#include <sys/syscall.h> >>> +#include <sys/types.h> >>> +#include <unistd.h> >>> + >>> +int >>> +main(int argc, char *argv[]) >>> +{ >>> + char *newargv[] = { NULL }; >>> + char *newenviron[] = { NULL }; >>> + int i; >> >> dd >> >>> + >>> + if (argc < 3) { >>> + fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\n", >>> argv[0]); >> >> s/\\/\\e/ >> >>> + exit(EXIT_FAILURE); >>> + } >>> + >>> + for (i = 2; i < argc; i++) { >> >> for (int i = 2; i < argc; i++) { >> >>> + if (open(argv[i], O_RDONLY) == -1) { >>> + perror(argv[i]); >>> + exit(EXIT_FAILURE); >>> + } >>> + } >>> + >>> + if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) { >>> + perror("close_range"); >>> + exit(EXIT_FAILURE); >>> + } >>> + >>> + execve(argv[1], newargv, newenviron); >>> + perror("execve"); >>> + exit(EXIT_FAILURE); >>> +} >>> +.EE >>> +.in >>> +.PP >>> +We can use the second program to exec the first as follows: >>> +.PP >>> +.in +4n >>> +.EX >>> +.RB "$" " make listopen close_range" >>> +.RB "$" " ./close_range ./listopen /dev/null /dev/zero" >>> +FD 0 is open. >>> +FD 1 is open. >>> +FD 2 is open. >>> +.EE >>> +.in >>> +.PP >>> +Removing the call to >>> +.B close_range >> >> .BR close_range () >> >>> +will show different output, with the file descriptors for the named >>> +files still open. >> >> [ >> will show different output, >> with the file descriptors for the named files still open. >> ] > > Thanks, I'll send a v4 with all the fixes above. > > Regards, > > Stephen > -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/