On Sat, Oct 31, 2015 at 1:45 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>     13.84%  opensock  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>             |
>             --- queued_spin_lock_slowpath
>                |
>                |--99.97%-- _raw_spin_lock
>                |          |
>                |          |--53.03%-- __close_fd
>                |          |
>                |          |--46.83%-- __alloc_fd

Interesting. "__close_fd" actually looks more expensive than
allocation. They presumably get called equally often, so it's probably
some cache effect.

__close_fd() doesn't do anything even remotely interesting as far as I
can tell, but it strikes me that we probably take a *lot* of cache
misses on the stupid "close-on-exec" flags, which are probably always
zero anyway.

Mind testing something really stupid, and making the __clear_bit() in
__clear_close_on_exec() conditional, something like this:

     static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
     {
    -       __clear_bit(fd, fdt->close_on_exec);
    +       if (test_bit(fd, fdt->close_on_exec))
    +               __clear_bit(fd, fdt->close_on_exec);
     }

and see if it makes a difference.

This is the kind of thing that a single-threaded (or even
single-socket) test will never actually show, because it caches well
enough. But for two sockets, I could imagine the unnecessary dirtying
of cachelines and ping-pong being noticeable.

The other stuff we probably can't do all that much about. Unless we
decide to go for some complicated lockless optimistic file descriptor
allocation scheme with retry-on-failure instead of locks. Which I'm
sure is possible, but I'm equally sure is painful.

               Linus
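
For anyone who wants to play with the idea outside the kernel, here is a minimal userspace sketch of the test-before-clear pattern from the patch above. It is not the kernel code: the bitmap_* helpers and BITS_PER_WORD are hypothetical stand-ins for test_bit()/__clear_bit(), and the sketch only shows the store being skipped when the bit is already zero. Whether that actually reduces cacheline ping-pong is exactly what the requested test is meant to find out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one bitmap word is an unsigned long, as in the kernel. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for test_bit(): a pure read of the word. */
static inline bool bitmap_test(const unsigned long *map, size_t bit)
{
	return (map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/* Non-atomic set, only needed to exercise the clear paths. */
static inline void bitmap_set(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/*
 * Unconditional clear, like a bare __clear_bit(): always stores to the
 * word, so the cacheline gets dirtied even if the bit was already zero.
 */
static inline void bitmap_clear_always(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}

/*
 * Conditional clear, like the proposed patch: read first, and store
 * only when the bit is actually set. Returns true if a store happened.
 */
static inline bool bitmap_clear_if_set(unsigned long *map, size_t bit)
{
	if (!bitmap_test(map, bit))
		return false;	/* common case: read-only, no store issued */
	bitmap_clear_always(map, bit);
	return true;
}
```

In the common case described above (close-on-exec flags that are zero anyway), bitmap_clear_if_set() returns false and never writes, so the line can stay shared between CPUs instead of bouncing in exclusive state.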