On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
> Hi Christian,

Hi Alex,

>
> Thanks for confirming that behavior.  Seems reasonable.
>
> I was wondering...
> If this call is equivalent to unshare(2)+{close(2) in a loop},
> shouldn't it fail for the same reasons those syscalls can fail?
>
> What about the following errors?:
>
> From unshare(2):
>
>        EPERM  The calling process did not have the required
>               privileges for this operation.

unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
i.e.

CLONE_NEWNS
CLONE_NEWUTS
CLONE_NEWIPC
CLONE_NEWNET
CLONE_NEWPID
CLONE_NEWCGROUP
CLONE_NEWTIME

so the permissions are the same.

>
> From close(2):
>        EBADF  fd isn't a valid open file descriptor.
>
> OK, this one can't happen with the current code.
> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
> It's a no-op (although it will still unshare if the flag is set).
> But shouldn't it fail with EBADF?

CLOSE_RANGE_UNSHARE should always give you a private file descriptor
table independent of whether or not any file descriptors need to be
closed. That's also how we documented the flag:

/* Unshare the file descriptor table before closing file descriptors. */
#define CLOSE_RANGE_UNSHARE	(1U << 1)

A caller calling unshare(CLONE_FILES) and then an emulated close_range()
or the proper close_range() syscall wants to make sure that all unwanted
file descriptors are closed (if any) and that no new file descriptors
can be injected afterwards. If you skip the unshare(CLONE_FILES) because
there are no fds to be closed you open up a race window. It would also
be annoying for userspace if they _may_ have received a private file
descriptor table but only if any fds needed to be closed.

If people really were extremely keen on skipping the unshare when no fd
needs to be closed then this could become a new flag.
But I really don't think that's necessary, and it also doesn't make a
lot of sense, imho.

>
>        EINTR  The close() call was interrupted by a signal; see
>               signal(7).
>
>        EIO    An I/O error occurred.
>
>        ENOSPC, EDQUOT
>               On NFS, these errors are not normally reported against
>               the first write which exceeds the available storage
>               space, but instead against a subsequent write(2),
>               fsync(2), or close().

None of these will be seen by userspace because close_range() currently
ignores all errors after it has begun closing files.

Christian
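
To illustrate the EBADF point, here is a minimal C sketch of a caller that asks for a private file descriptor table over a range with nothing open in it. The helper names try_close_range() and unshare_and_close_empty_range() are hypothetical, and so is the fd range, which is simply assumed to be above anything the process has open. The call goes through syscall(2) so the sketch does not depend on a libc wrapper, and the fallback value of CLOSE_RANGE_UNSHARE is the one quoted above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same value as the definition quoted above; only used if the
 * installed headers are too old to provide it. */
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif

/*
 * Hypothetical helper (not part of any library): invoke close_range()
 * through syscall(2). Returns 0 on success or the errno value on
 * failure.
 */
int try_close_range(unsigned int first, unsigned int last,
		    unsigned int flags)
{
#ifdef SYS_close_range
	if (syscall(SYS_close_range, first, last, flags) == 0)
		return 0;
	return errno;
#else
	(void)first;
	(void)last;
	(void)flags;
	return ENOSYS;	/* headers predate the syscall */
#endif
}

/*
 * Request a private fd table and "close" a range that is assumed to
 * contain no open descriptors. The range is an arbitrary assumption.
 */
int unshare_and_close_empty_range(void)
{
	return try_close_range(1000000, 1000010, CLOSE_RANGE_UNSHARE);
}
```

On a kernel that provides close_range() (Linux 5.9 or later), unshare_and_close_empty_range() should return 0 rather than EBADF, in line with the behavior described above. On older kernels the helper reports ENOSYS instead.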