On Wed, Dec 11, 2013 at 11:36:35PM +0100, Mateusz Guzik wrote:

> From my reading this will break at least the following:
> 	fd = open(..., .. | O_CLOEXEC);
> 	dup2(whatever, fd);
>
> now fd has O_CLOEXEC even though it should not

Moreover, consider fork() done by a thread that shares descriptor table
with somebody else.  Suppose it happens in the middle of open() with
O_CLOEXEC being done by another thread.  We copy descriptor table after
descriptor had been reserved (and marked close-on-exec), but before a
reference to struct file has actually been inserted there.  This code

	for (i = open_files; i != 0; i--) {
		struct file *f = *old_fds++;
		if (f) {
			get_file(f);
		} else {
			/*
			 * The fd may be claimed in the fd bitmap but not yet
			 * instantiated in the files array if a sibling thread
			 * is partway through open().  So make sure that this
			 * fd is available to the new process.
			 */
			__clear_open_fd(open_files - i, new_fdt);
		}
		rcu_assign_pointer(*new_fds++, f);
	}
	spin_unlock(&oldf->file_lock);

in dup_fd() will clear the corresponding bit in open_fds, leaving
close_on_exec alone.  Currently that's fine (we will override whatever
had been in close_on_exec when we reserve that descriptor again), but
AFAICS with this patch it will break.

Sure, it can be fixed up (ditto with dup2(), etc.), but what's the point?
Result will require more subtle reasoning to prove correctness and will
be more prone to breakage.  Does that really yield visible performance
improvements that would be worth the extra complexity?  After all, you
trade some writes to close_on_exec on descriptor reservation for
unconditional write on descriptor freeing; if anything, I would expect
that you'll get minor _loss_ from that change, assuming they'll be
measurable in the first place...
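
The fork() race described above can be sketched in userspace with two bitmaps. This is a toy model and not kernel code: struct toy_fdt, toy_reserve(), toy_dup_fd() and the `patched` switch are names made up for illustration. The `patched` branch encodes only one reading of the proposal, namely that reservation no longer clears close_on_exec and relies on the freeing path having done so. Locking and RCU are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy descriptor table (hypothetical): one 64-bit word per bitmap. */
struct toy_fdt {
	uint64_t open_fds;	/* bit set: fd is reserved or in use */
	uint64_t close_on_exec;	/* bit set: fd is closed on exec */
	const void *fd[64];	/* NULL until a file is installed */
};

/*
 * Reserve a descriptor.  With patched == false the close_on_exec bit is
 * written either way, as the mail says happens today.  With patched == true
 * (an assumption about the proposal) the bit is only ever set here, and a
 * stale value is expected to have been cleared when the fd was freed.
 */
static void toy_reserve(struct toy_fdt *t, int fd, bool cloexec, bool patched)
{
	assert(fd >= 0 && fd < 64);
	t->open_fds |= 1ULL << fd;
	if (cloexec)
		t->close_on_exec |= 1ULL << fd;
	else if (!patched)
		t->close_on_exec &= ~(1ULL << fd);
}

/*
 * Copy the table the way the quoted dup_fd() loop does: a slot that is
 * reserved but not yet instantiated gets its open_fds bit cleared, while
 * close_on_exec is left alone.
 */
static void toy_dup_fd(struct toy_fdt *new_fdt, const struct toy_fdt *old_fdt)
{
	*new_fdt = *old_fdt;
	for (int i = 0; i < 64; i++) {
		if (((new_fdt->open_fds >> i) & 1) && !new_fdt->fd[i])
			new_fdt->open_fds &= ~(1ULL << i);
	}
}

/* Returns true if the child's plain open() ends up close-on-exec. */
static bool fork_race_leaks_cloexec(bool patched)
{
	static const int some_file;	/* stand-in for a struct file */
	struct toy_fdt parent = {0}, child;

	/* Sibling thread: open(O_CLOEXEC) has reserved fd 3, not installed yet. */
	toy_reserve(&parent, 3, true, patched);
	/* fork() copies the table in that window. */
	toy_dup_fd(&child, &parent);
	/* Child: open() without O_CLOEXEC reuses fd 3. */
	toy_reserve(&child, 3, false, patched);
	child.fd[3] = &some_file;

	return (child.close_on_exec >> 3) & 1;
}
```

Calling fork_race_leaks_cloexec(false) models the current behaviour, where the stale bit is overwritten on reservation; fork_race_leaks_cloexec(true) models the assumed patched behaviour, where the bit survives the copy and sticks to the child's new descriptor.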