Hi Christian,

Makes sense to me.

Thanks,

Alex

On 12/12/20 1:14 PM, Christian Brauner wrote:
> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>> Hi Christian,
>
> Hi Alex,
>
>>
>> Thanks for confirming that behavior.  Seems reasonable.
>>
>> I was wondering...
>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>> shouldn't it fail for the same reasons those syscalls can fail?
>>
>> What about the following errors?:
>>
>> From unshare(2):
>>
>>        EPERM  The calling process did not have the required
>>               privileges for this operation.
>
> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
> i.e.
> CLONE_NEWNS
> CLONE_NEWUTS
> CLONE_NEWIPC
> CLONE_NEWNET
> CLONE_NEWPID
> CLONE_NEWCGROUP
> CLONE_NEWTIME
> so the permissions are the same.
>
>>
>> From close(2):
>>        EBADF  fd isn't a valid open file descriptor.
>>
>> OK, this one can't happen with the current code.
>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>> It's a no-op (although it will still unshare if the flag is set).
>> But shouldn't it fail with EBADF?
>
> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
> table independent of whether or not any file descriptors need to be
> closed. That's also how we documented the flag:
>
> /* Unshare the file descriptor table before closing file descriptors. */
> #define CLOSE_RANGE_UNSHARE (1U << 1)
>
> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
> or the proper close_range() syscall wants to make sure that all unwanted
> file descriptors are closed (if any) and that no new file descriptors
> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
> there are no fds to be closed you open up a race window. It would also
> be annoying for userspace if they _may_ have received a private file
> descriptor table but only if any fds needed to be closed.
>
> If people really were extremely keen about skipping the unshare when no
> fd needs to be closed then this could become a new flag. But I really
> don't think that's necessary and also doesn't make a lot of sense, imho.
>
>>
>>        EINTR  The close() call was interrupted by a signal; see
>>               signal(7).
>>
>>        EIO    An I/O error occurred.
>>
>>        ENOSPC, EDQUOT
>>               On NFS, these errors are not normally reported against
>>               the first write which exceeds the available storage
>>               space, but instead against a subsequent write(2),
>>               fsync(2), or close().
>
> None of these will be seen by userspace because close_range() currently
> ignores all errors after it has begun closing files.
>
> Christian
>
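
The semantics discussed in this thread can be sketched in userspace as
the "unshare(2) + close(2) in a loop" emulation. This is only a rough
illustration, not the kernel implementation: the name
emulated_close_range() and the cap at sysconf(_SC_OPEN_MAX) are
assumptions made for the example, and the sketch only understands
CLOSE_RANGE_UNSHARE. It unshares first, whether or not any fd in the
range is open, and then ignores every close(2) error, so EBADF, EINTR,
EIO, ENOSPC and EDQUOT are never reported to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/* Value as quoted in the thread; newer libcs may already define it. */
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif

/*
 * Hypothetical userspace emulation of close_range(first, last, flags).
 * Returns 0 on success, -1 with errno set on failure.
 */
int emulated_close_range(unsigned int first, unsigned int last,
                         unsigned int flags)
{
    /* Reject an inverted range and any flag this sketch does not know. */
    if (first > last || (flags & ~CLOSE_RANGE_UNSHARE)) {
        errno = EINVAL;
        return -1;
    }

    /*
     * Unshare unconditionally when asked, even if nothing in the range
     * turns out to be open; skipping it would leave a race window.
     */
    if ((flags & CLOSE_RANGE_UNSHARE) && unshare(CLONE_FILES) != 0)
        return -1;

    /* Assumption: fds at or above the soft limit are not worth probing. */
    long open_max = sysconf(_SC_OPEN_MAX);
    if (open_max <= 0)
        open_max = 1024; /* arbitrary fallback for this sketch */
    if ((unsigned long)last >= (unsigned long)open_max)
        last = (unsigned int)(open_max - 1);

    int saved_errno = errno;
    if (first <= last) {
        for (unsigned int fd = first; ; fd++) {
            (void)close((int)fd); /* errors are deliberately ignored */
            if (fd == last)
                break;
        }
    }
    errno = saved_errno; /* do not leak EBADF from unopened fds */
    return 0;
}
```

With fds 1 to 10 open, emulated_close_range(20, 30, 0) returns 0 and
closes nothing, which matches the no-op case raised above. Passing
CLOSE_RANGE_UNSHARE still gives the caller a private file descriptor
table in that case.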