On Wed, Apr 26, 2023 at 08:46:28PM +0100, Al Viro wrote:
> On Wed, Apr 26, 2023 at 08:13:37PM +0100, Matthew Wilcox wrote:
> > On Wed, Apr 26, 2023 at 05:58:06PM +0000, Kernel.org Bugbot wrote:
> > > When running a threaded program, and opening a file descriptor that
> > > is a power of 2 (starting at 64), the call takes a very long time to
> > > complete. Normally such a call takes less than 2us. However with this
> > > issue, I've seen the call take up to around 50ms. Additionally this only
> > > happens the first time, and not subsequent times that file descriptor is
> > > used. I'm guessing there might be some expansion of some internal data
> > > structures going on. But I cannot see why this process would take so long.
> >
> > Because we allocate a new block of memory and then memcpy() the old
> > block of memory into it. This isn't surprising behaviour to me.
> > I don't think there's much we can do to change it (Allocating a
> > segmented array of file descriptors has previously been vetoed by
> > people who have programs with a million file descriptors). Is it
> > causing you problems?
>
> FWIW, I suspect that this is not so much allocation + memcpy.
> 	/* make sure all fd_install() have seen resize_in_progress
> 	 * or have finished their rcu_read_lock_sched() section.
> 	 */
> 	if (atomic_read(&files->count) > 1)
> 		synchronize_rcu();
>
> in expand_fdtable() is a likelier source of delays.

Perhaps? The delay seemed to be roughly doubling with the test program,
so I assumed it was primarily the memcpy() cost for the reporter's system:

FD=64 duration=12565293
FD=128 duration=24755063
FD=256 duration=7602777

... although now I've pasted it, I see my brain skipped one digit, so
256 was faster than 64, not about twice as slow as 128.
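A minimal reproducer sketch, for anyone wanting to separate the two theories. It assumes a 64-bit Linux kernel whose initial per-process fd table holds 64 descriptors, and an RLIMIT_NOFILE above 256. The reporter's own test program isn't quoted here, so idle_thread(), time_dup2_ns() and measure_fd_growth() are hypothetical names. dup2() onto a descriptor past the current table size should force the table to grow, and the idle thread shares the file table, so files->count should be above 1 and the synchronize_rcu() path quoted above should be taken.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Idle thread: created with pthread_create(), so it shares this process's
 * file table and keeps files->count > 1 while the table is expanded. */
static void *idle_thread(void *arg)
{
	(void)arg;
	pause();	/* sleep until the process exits */
	return NULL;
}

/* Time one dup2(oldfd, target) call with CLOCK_MONOTONIC.
 * Returns elapsed nanoseconds, or -1 if dup2() did not return target. */
static long long time_dup2_ns(int oldfd, int target)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	int fd = dup2(oldfd, target);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	if (fd != target)
		return -1;
	return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}

/* Hypothetical reproducer: start one sharing thread, then time the first
 * dup2() onto fds 64, 128 and 256 (assumes each lands past the current
 * table size), printing in the FD=/duration= format used above.
 * Returns 0 on success, -1 on any failure. */
static int measure_fd_growth(void)
{
	pthread_t tid;
	int src = open("/dev/null", O_RDONLY);

	if (src < 0)
		return -1;
	if (pthread_create(&tid, NULL, idle_thread, NULL) != 0) {
		close(src);
		return -1;
	}
	pthread_detach(tid);

	for (int target = 64; target <= 256; target *= 2) {
		long long ns = time_dup2_ns(src, target);

		if (ns < 0) {
			close(src);
			return -1;
		}
		printf("FD=%d duration=%lld\n", target, ns);
		close(target);	/* closing does not shrink the table */
	}
	close(src);
	return 0;
}
```

Calling measure_fd_growth() from main() should print lines in the same FD=/duration= format as the numbers above. If synchronize_rcu() dominates, the durations should not scale with table size the way a memcpy() would; dropping the pthread_create() call (so files->count stays at 1) should give a useful comparison run.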