On Wed, Nov 04 2015, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:

> On Tue, 2015-11-03 at 10:41 +0100, Rasmus Villemoes wrote:
>
>> @@ -667,7 +667,7 @@ void do_close_on_exec(struct files_struct *files)
>>  		fdt = files_fdtable(files);
>>  		if (fd >= fdt->max_fds)
>>  			break;
>> -		set = fdt->close_on_exec[i];
>> +		set = fdt->close_on_exec[i] & fdt->open_fds[i];
>>  		if (!set)
>>  			continue;
>>  		fdt->close_on_exec[i] = 0;
>
> If you don't bother, why leaving this final fdt->close_on_exec[i] = 0 ?

Thanks, it should go, along with the mentioned memsets. Updated patch
below.

Reading dup_fd() I'm even more convinced that we're not relying on any
particular value for close_on_exec bits for unused fds. After

	/*
	 * The fd may be claimed in the fd bitmap but not yet
	 * instantiated in the files array if a sibling thread
	 * is partway through open().  So make sure that this
	 * fd is available to the new process.
	 */

we only __clear_open_fd(), so the close_on_exec bit may be left set in
the new process.

From: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Date: Tue, 3 Nov 2015 09:43:53 +0100
Subject: [PATCH] vfs: don't bother clearing close_on_exec bit for unused fds

In fc90888d07b8 (vfs: conditionally clear close-on-exec flag) a
conditional was added to __clear_close_on_exec to avoid dirtying a
cache line in the common case where the bit is already clear. However,
AFAICT, we don't rely on the close_on_exec bit being clear for unused
fds, except as an optimization in do_close_on_exec(); if I haven't
missed anything, __{set,clear}_close_on_exec is always called when a
new fd is allocated.

At the expense of also reading through ->open_fds in
do_close_on_exec(), we can avoid accessing the close_on_exec bitmap
altogether in close(), which I think is a reasonable trade-off.

The conditional added in the commit above still makes sense to avoid
the dirtying on the allocation paths, but I also think it might make
sense in __set_close_on_exec: I suppose any given app handling a
non-trivial amount of fds uses O_CLOEXEC for either almost none or
almost all of them, so after a while one would reach a sort of
steady-state where bits in ->close_on_exec are almost never flipped.

Signed-off-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
---
 fs/file.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index c6986dce0334..1bb74923395c 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -79,7 +79,6 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
 	memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
 	memset((char *)(nfdt->open_fds) + cpy, 0, set);
 	memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
-	memset((char *)(nfdt->close_on_exec) + cpy, 0, set);
 
 	cpy = BITBIT_SIZE(ofdt->max_fds);
 	set = BITBIT_SIZE(nfdt->max_fds) - cpy;
@@ -231,7 +230,8 @@ repeat:
 
 static inline void __set_close_on_exec(int fd, struct fdtable *fdt)
 {
-	__set_bit(fd, fdt->close_on_exec);
+	if (!test_bit(fd, fdt->close_on_exec))
+		__set_bit(fd, fdt->close_on_exec);
 }
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
@@ -369,7 +369,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 		int start = open_files / BITS_PER_LONG;
 
 		memset(&new_fdt->open_fds[start], 0, left);
-		memset(&new_fdt->close_on_exec[start], 0, left);
 	}
 
 	rcu_assign_pointer(newf->fdt, new_fdt);
@@ -644,7 +643,6 @@ int __close_fd(struct files_struct *files, unsigned fd)
 	if (!file)
 		goto out_unlock;
 	rcu_assign_pointer(fdt->fd[fd], NULL);
-	__clear_close_on_exec(fd, fdt);
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 	return filp_close(file, files);
@@ -667,10 +665,9 @@ void do_close_on_exec(struct files_struct *files)
 		fdt = files_fdtable(files);
 		if (fd >= fdt->max_fds)
 			break;
-		set = fdt->close_on_exec[i];
+		set = fdt->close_on_exec[i] & fdt->open_fds[i];
 		if (!set)
 			continue;
-		fdt->close_on_exec[i] = 0;
 		for ( ; set ; fd++, set >>= 1) {
 			struct file *file;
 			if (!(set & 1))
-- 
2.6.1
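
For readers who want to poke at the invariant the patch relies on, here
is a minimal user-space sketch, not kernel code. It assumes a toy
four-word bitmap; struct toy_fdtable and every toy_* helper are
hypothetical names invented for illustration. It models three things
from the patch: the allocation path always writes the close_on_exec bit
(conditionally, to avoid dirtying memory), the close path touches only
open_fds, and the exec path masks close_on_exec with open_fds so that
stale bits for unused fds are ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS 4                      /* hypothetical table size */
#define TOY_MAX_FDS (TOY_WORDS * BITS_PER_LONG)

/* Toy stand-in for the two bitmaps in struct fdtable. */
struct toy_fdtable {
	unsigned long open_fds[TOY_WORDS];
	unsigned long close_on_exec[TOY_WORDS];
};

static int toy_test_bit(unsigned fd, const unsigned long *map)
{
	return (int)((map[fd / BITS_PER_LONG] >> (fd % BITS_PER_LONG)) & 1UL);
}

static void toy_set_bit(unsigned fd, unsigned long *map)
{
	map[fd / BITS_PER_LONG] |= 1UL << (fd % BITS_PER_LONG);
}

static void toy_clear_bit(unsigned fd, unsigned long *map)
{
	map[fd / BITS_PER_LONG] &= ~(1UL << (fd % BITS_PER_LONG));
}

/* Allocation path: the cloexec bit is always brought to the wanted
 * value, but only written when it actually has to change. */
static void toy_alloc_fd(struct toy_fdtable *t, unsigned fd, int cloexec)
{
	assert(fd < TOY_MAX_FDS);
	toy_set_bit(fd, t->open_fds);
	if (cloexec) {
		if (!toy_test_bit(fd, t->close_on_exec))
			toy_set_bit(fd, t->close_on_exec);
	} else {
		if (toy_test_bit(fd, t->close_on_exec))
			toy_clear_bit(fd, t->close_on_exec);
	}
}

/* Close path as in the patch: close_on_exec is not touched at all. */
static void toy_close_fd(struct toy_fdtable *t, unsigned fd)
{
	assert(fd < TOY_MAX_FDS);
	toy_clear_bit(fd, t->open_fds);
}

/* Exec path: only bits set in BOTH bitmaps count. Returns the number
 * of fds closed. */
static unsigned toy_do_close_on_exec(struct toy_fdtable *t)
{
	unsigned closed = 0;
	size_t i;

	for (i = 0; i < TOY_WORDS; i++) {
		unsigned long set = t->close_on_exec[i] & t->open_fds[i];
		unsigned fd = (unsigned)(i * BITS_PER_LONG);

		for ( ; set; fd++, set >>= 1) {
			if (!(set & 1))
				continue;
			toy_clear_bit(fd, t->open_fds);
			closed++;
		}
	}
	return closed;
}
```

In this model, closing an fd that had the flag set leaves a stale bit
in close_on_exec, yet toy_do_close_on_exec() skips it because the
matching open_fds bit is clear, and the next toy_alloc_fd() of that fd
overwrites the stale value.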