On Fri, May 15, 2020 at 8:23 AM Nate Karstens <nate.karstens@xxxxxxxxxx> wrote: > > > Series of 4 patches to implement close-on-fork. Tests have been > published to https://github.com/nkarstens/ltp/tree/close-on-fork > and cover close-on-fork functionality in the following syscalls: > > * accept(4) > * dup3(2) > * fcntl(2) > * open(2) > * socket(2) > * socketpair(2) > * unshare(2) > > Addresses underlying issue in that there is no way to prevent > a fork() from duplicating a file descriptor. The existing > close-on-exec flag partially-addresses this by allowing the > parent process to mark a file descriptor as exclusive to itself, > but there is still a period of time the failure can occur > because the auto-close only occurs during the exec(). > > One manifestation of this is a race conditions in system(), which > (depending on the implementation) is non-atomic in that it first > calls a fork() and then an exec(). > > This functionality was approved by the Austin Common Standards > Revision Group for inclusion in the next revision of the POSIX > standard (see issue 1318 in the Austin Group Defect Tracker). > > --- > > This is v2 of the change. See https://lkml.org/lkml/2020/4/20/113 > for the original work. > > Thanks to everyone who provided comments on the first series of > patches. Here are replies to specific comments: > > > I suggest we group the two bits of a file (close_on_exec, close_on_fork) > > together, so that we do not have to dirty two separate cache lines. > > I could be mistaken, but I don't think this would improve efficiency. > The close-on-fork and close-on-exec flags are read at different > times. If you assume separate syscalls for fork and exec then > there are several switches between when the two flags are read. > In addition, the close-on-fork flags in the new process must be > cleared, which will be much harder if the flags are interleaved. :/ Fast path in big and performance sensitive applications is not fork() and/or exec(). This is open()/close() and others (socket(), accept(), ...) We do not want them to access extra cache lines for this new feature. Sorry, I will say no to these patches in their current form.