On Tue, Jun 28, 2022 at 01:38:07PM +0000, David Laight wrote:
> From: Christian Brauner
> > Sent: 28 June 2022 14:13
> >
> > On Sun, Jun 19, 2022 at 11:42:28AM +0100, Ralph Corderoy wrote:
> > > Hi Matthew, thanks for replying.
> > >
> > > > > The need for O_CLOFORK might be made more clear by looking at a
> > > > > long-standing Go issue, i.e. unrelated to system(3), which was started
> > > > > in 2017 by Russ Cox when he summed up the current race-condition
> > > > > behaviour of trying to execve(2) a newly created file:
> > > > > https://github.com/golang/go/issues/22315.
> > > >
> > > > The problem is that people advocating for O_CLOFORK understand its
> > > > value, but not its cost. Other google employees have a system which
> > > > has literally millions of file descriptors in a single process.
> > > > Having to maintain this extra state per-fd is a cost they don't want
> > > > to pay (and have been quite vocal about earlier in this thread).
> > >
> > > So do you agree the userspace issue is best solved by *_CLOFORK and the
> > > problem is how to implement *_CLOFORK at an acceptable cost?
> > >
> > > OTOH David Laight was making suggestions on moving the load to the
> > > fork/exec path earlier in the thread, but OTOH Al Viro mentioned a
> > > ‘portable solution’, though that could have been to a specific issue
> > > rather than the more general case.
> > >
> > > How would you recommend approaching an acceptable cost is progressed?
> > > Iterate on patch versions? Open a bugzilla.kernel.org for central
> > > tracking and linking from the other projects? ..?
> >
> > Quoting from that go thread
> >
> > "If the OS had a "close all fds above x", we could use that. (I don't know of any that do, but it sure
> > would help.)"
> >
> > So why can't this be solved with:
> > close_range(fd_first, fd_last, CLOSE_RANGE_CLOEXEC | CLOSE_RANGE_UNSHARE)?
> > e.g.
> > close_range(100, ~0U, CLOSE_RANGE_CLOEXEC | CLOSE_RANGE_UNSHARE)?
>
> That is a relatively recent linux system call.
> Although it can be (mostly) emulated by reading /proc/fd
> - but that may not be mounted.
>
> In any case another thread can open an fd between the close_range()
> and fork() calls.

CLOSE_RANGE_UNSHARE gives the calling thread a private file descriptor
table before the fds are marked close-on-exec, so an fd opened by another
thread after the call does not show up in the caller's table.

close_range(100, ~0U, CLOSE_RANGE_CLOEXEC | CLOSE_RANGE_UNSHARE)?
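
For illustration, a minimal sketch of the call above, under a few
assumptions rather than a definitive implementation. The helper name
mark_cloexec_from() is hypothetical. It goes through syscall(2) because the
glibc close_range() wrapper only exists in newer glibc (2.34 and later), and
the fallback flag values are copied from the Linux UAPI header for builds
against older headers. CLOSE_RANGE_CLOEXEC needs Linux 5.11 or later; on
older kernels the call is expected to fail with EINVAL or ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values match <linux/close_range.h>. */
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/*
 * Hypothetical helper: give the calling thread a private fd table and set
 * close-on-exec on every fd >= first in that private table.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int mark_cloexec_from(unsigned int first)
{
#ifdef SYS_close_range
    return (int)syscall(SYS_close_range, first, ~0U,
                        CLOSE_RANGE_CLOEXEC | CLOSE_RANGE_UNSHARE);
#else
    /* Headers too old to know the syscall number; report it as missing. */
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would run mark_cloexec_from(100) in the thread that is about to
fork() or execve(), and fall back to some other scheme if it returns -1.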