Re: [patch] close_range.2: new page documenting close_range(2)

"Alejandro Colomar (man-pages)" <alx.manpages@xxxxxxxxx> · Sat, 12 Dec 2020 18:58:29 +0100

Hi Christian,

Makes sense to me.

Thanks,

Alex

On 12/12/20 1:14 PM, Christian Brauner wrote:
> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>> Hi Christian,
> 
> Hi Alex,
> 
>>
>> Thanks for confirming that behavior.  Seems reasonable.
>>
>> I was wondering...
>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>> shouldn't it fail for the same reasons those syscalls can fail?
>>
>> What about the following errors?
>>
>> From unshare(2):
>>
>>        EPERM  The calling process did not have the required privileges
>>               for this operation.
> 
> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
> i.e.
> CLONE_NEWNS
> CLONE_NEWUTS
> CLONE_NEWIPC
> CLONE_NEWNET
> CLONE_NEWPID
> CLONE_NEWCGROUP
> CLONE_NEWTIME
> so the permissions are the same.
> 
>>
>> From close(2):
>>        EBADF  fd isn't a valid open file descriptor.
>>
>> OK, this one can't happen with the current code.
>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>> It's a no-op (although it will still unshare if the flag is set).
>> But shouldn't it fail with EBADF?
> 
> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
> table independent of whether or not any file descriptors need to be
> closed. That's also how we documented the flag:
> 
> /* Unshare the file descriptor table before closing file descriptors. */
> #define CLOSE_RANGE_UNSHARE	(1U << 1)
> 
> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
> or the proper close_range() syscall wants to make sure that all unwanted
> file descriptors are closed (if any) and that no new file descriptors
> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
> there are no fds to be closed you open up a race window. It would also
> be annoying for userspace if they _may_ have received a private file
> descriptor table but only if any fds needed to be closed.
> 
> If people really were extremely keen on skipping the unshare when no
> fd needs to be closed then this could become a new flag. But I really
> don't think that's necessary, and it also doesn't make a lot of sense, imho.
> 
>>
>>        EINTR  The close() call was interrupted by a signal; see
>>               signal(7).
>>
>>        EIO    An I/O error occurred.
>>
>>        ENOSPC, EDQUOT
>>               On NFS, these errors are not normally reported against
>>               the first write which exceeds the available storage
>>               space, but instead against a subsequent write(2),
>>               fsync(2), or close().
> 
> None of these will be seen by userspace because close_range() currently
> ignores all errors after it has begun closing files.
> 
> Christian
>
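
For reference, the semantics discussed above can be sketched as a userspace
emulation. This is only an illustration under stated assumptions, not the
kernel implementation: emulated_close_range(), EMU_CLOSE_RANGE_UNSHARE,
fd_is_open() and demo_empty_range() are hypothetical names made up for this
sketch, and capping the loop at sysconf(_SC_OPEN_MAX) is a simplification
(descriptors above a lowered soft limit would be missed). It shows the two
points from the thread: the unshare happens whether or not anything needs
closing, and errors from the individual close() calls are ignored, so an
empty range is a successful no-op rather than EBADF.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical flag for this sketch; same value as CLOSE_RANGE_UNSHARE. */
#define EMU_CLOSE_RANGE_UNSHARE (1U << 1)

/* Returns nonzero if fd refers to an open file descriptor. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Sketch of unshare(2) + close(2) in a loop, as discussed in the thread. */
static int emulated_close_range(unsigned int first, unsigned int last,
                                unsigned int flags)
{
    long open_max;
    unsigned int limit;

    if ((flags & ~EMU_CLOSE_RANGE_UNSHARE) || first > last) {
        errno = EINVAL;
        return -1;
    }

    /* Unshare first and unconditionally, even if no fd ends up closed. */
    if ((flags & EMU_CLOSE_RANGE_UNSHARE) && unshare(CLONE_FILES) == -1)
        return -1;

    /* Simplification: do not walk past the current descriptor limit. */
    open_max = sysconf(_SC_OPEN_MAX);
    limit = (open_max > 0 && open_max <= INT_MAX)
                ? (unsigned int) open_max - 1
                : (unsigned int) INT_MAX;
    if (last > limit)
        last = limit;

    /* Errors from close() (EBADF, EINTR, EIO, ...) are deliberately ignored. */
    for (unsigned int fd = first; fd <= last; fd++)
        (void) close((int) fd);

    return 0;
}

/* Closes both ends of a pipe, then "closes" an already empty range. */
int demo_empty_range(void)
{
    int p[2];

    if (pipe(p) == -1)
        return -1;

    if (emulated_close_range((unsigned int) p[0], (unsigned int) p[0], 0) != 0)
        return -1;
    if (emulated_close_range((unsigned int) p[1], (unsigned int) p[1], 0) != 0)
        return -1;
    assert(!fd_is_open(p[0]) && !fd_is_open(p[1]));

    /* Nothing is open at p[0] any more: still success, not EBADF. */
    return emulated_close_range((unsigned int) p[0], (unsigned int) p[0], 0);
}
```

With the flag set, the unshare(CLONE_FILES) step can still fail for its own
reasons, and the sketch passes that error through before any descriptor is
touched. Once the loop has started, nothing is reported back.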