On Sun, May 19, 2024 at 12:31:20AM +0100, Matthew Wilcox wrote:
> On Sat, May 18, 2024 at 08:20:05AM +0200, Mateusz Guzik wrote:
> > Execs of dynamically linked binaries at 20-ish cores are bottlenecked on
> > the i_mmap_rwsem semaphore, while the biggest singular contributor is
> > free_pgd_range inducing the lock acquire back-to-back for all
> > consecutive mappings of a given file.
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index b6bdaa18b9e9..443d0c55df80 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
>
> I do object to this going into mm.h.  mm/internal.h would be better.
>

Noted.

> I haven't reviewed the patch in depth, but I don't have a problem with
> the idea.  I think it's only a stopgap and we really do need a better
> data structure than this.
>

I'll send a v2 after some more reviews pour in.

The above indeed is just a low hanging fruit fixup in an unpleasant
situation.

I think the real fix in the long run would provide the loader with means
to be more efficient about it.

strace /bin/echo shows:
[snip]
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\243\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
fstat(3, {st_mode=S_IFREG|0755, st_size=2125328, ...}) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2170256, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7dbda8a00000
mmap(0x7dbda8a28000, 1605632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7dbda8a28000
mmap(0x7dbda8bb0000, 323584, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b0000) = 0x7dbda8bb0000
mmap(0x7dbda8bff000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1fe000) = 0x7dbda8bff000
mmap(0x7dbda8c05000, 52624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7dbda8c05000
[/snip]

Hence the 5 mappings.

Should there be a mechanism to issue all these mmaps at the same time
there would definitely be savings in total work done, not only in terms
of one i_mmap_rwsem lock trip. The mechanism should be versatile enough
to replace other back-to-back mmap uses. It would be great if on top of
it it did not require the size argument, instead it could return a pair
address + size. Then the typical combo of open + fstat + mmap could be
shortened.

As in that was just a quick note, I have no intention of pursuing
anything of the sort.

I'll probably submit some other patches to damage-control the state
without altering any design choices.
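
For illustration only, the open + fstat + mmap combo mentioned above
looks roughly like this in userspace today. This is a sketch under
assumptions: map_whole_file() and struct file_map are made-up names, not
an existing kernel or libc interface. The address + size pair it fills in
merely mimics what a combined mechanism could hand back, and it still
costs three syscalls plus a close underneath.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result type: the "address + size" pair from the note. */
struct file_map {
	void *addr;
	size_t size;
};

/*
 * Hypothetical userspace helper (not a real kernel or libc API).
 * Maps the whole file read-only and private, using the same
 * open + fstat + mmap sequence the loader issues today.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int map_whole_file(const char *path, struct file_map *out)
{
	struct stat st;
	void *addr;
	int fd, saved;

	assert(path != NULL && out != NULL);

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* The size has to be fetched separately before mmap can be called. */
	if (fstat(fd, &st) < 0)
		goto fail;
	if (st.st_size <= 0) {
		errno = EINVAL;	/* mmap() rejects zero-length mappings */
		goto fail;
	}

	addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (addr == MAP_FAILED)
		goto fail;

	close(fd);	/* the mapping stays valid after the fd is closed */
	out->addr = addr;
	out->size = (size_t)st.st_size;
	return 0;

fail:
	saved = errno;	/* close() may clobber errno */
	close(fd);
	errno = saved;
	return -1;
}
```

A caller would pass a path and later release the mapping with
munmap(map.addr, map.size). Note this only covers the single whole-file
case, it does nothing about the five back-to-back mmap calls seen in the
strace output.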