On Wed, 2024-11-20 at 12:20 +0100, Mateusz Guzik wrote:
> When utilized it dodges strlen() in vfs_readlink(), giving about a 1.5%
> speedup when issuing readlink on /initrd.img on ext4.
> 
> The size is stored in a union with i_devices, which is never looked at
> unless the inode is for a device.
> 
> Benchmark code at the bottom.
> 
> ext4 and tmpfs are patched; other filesystems can also get there with
> some more work.
> 
> Arguably the current get_link API should be patched to let the fs return
> the size, but that's not churn I'm interested in diving into.
> 
> On my v1 Jan remarked that 1.5% is not a particularly high win, questioning
> whether doing this makes sense. I noted the value is only this small
> because of other slowdowns.
> 
> To elaborate, here are highlights while benching on Sapphire Rapids:
> 
> 1. putname using atomics (over 3.5% on my profile)
> 
> It sounds like Al has plans to do something here(?); I'm not touching it if
> it can be helped. The atomic definitely does not have to be there in the
> common case.
> 
> 2. kmem_cache_alloc_noprof/kmem_cache_free (over 7% combined)
> 
> They are both dog slow due to cmpxchg16b. A patchset was posted which
> adds a different allocation/free fast path and should whack the majority
> of the problem, see:
> https://lore.kernel.org/linux-mm/20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@xxxxxxx/
> 
> If this lands I'll definitely try to make the pathname allocs use it, which
> should drop about 5-6 percentage points on this sucker.
> 
> 3. __legitimize_mnt issues a full fence (again over 3%)
> 
> As far as avoiding the fence is concerned, waiting on an RCU grace period on
> unmount should do the trick. However, I found there is a bunch of complexity
> there to sort out before doing this will be feasible (notably there are
> multiple mounts freed in one go; this needs to be batched). There may be
> other issues which I missed and which make this not worth it, but the fence
> is definitely avoidable in principle and I would be surprised if there was
> no way to sensibly get there. No ETA, for now I'm just pointing this out.
> 
> There is also the idea to speculatively elide lockref, but when I tried
> that last time I ran into significant LSM-related trouble.
> 
> All that aside, there is also quite a bit of branching and func calling
> which does not need to be there (example: make_vfsuid/make_vfsgid could be
> combined into one routine, etc.).
> 
> Ultimately there is single-threaded perf left on the table in various
> spots.
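
To make the approach above concrete for anyone skimming the thread, here is a
rough userspace model of the idea, not the actual patch; the struct layout and
the names (INODE_CACHED_LINK, cached_len, toy_readlink) are made up for
illustration. The filesystem records the link length next to the body when it
instantiates the inode, and the readlink path copies that many bytes instead
of calling strlen():

/*
 * Userspace model only (not kernel code): the length shares storage with
 * the device list, which a symlink inode never uses, so caching it costs
 * no extra space. All identifiers here are illustrative.
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INODE_CACHED_LINK 0x1	/* illustrative flag, not the kernel's */

struct toy_inode {
	unsigned int flags;
	const char *link;		/* symlink body, NUL-terminated */
	union {
		void *devices;		/* stand-in for i_devices */
		int cached_len;		/* link length, valid if flag is set */
	};
};

/* What a filesystem would do when it instantiates the symlink. */
static void set_cached_link(struct toy_inode *inode, const char *link, int len)
{
	inode->link = link;
	inode->cached_len = len;
	inode->flags |= INODE_CACHED_LINK;
}

/* Roughly what the readlink path gains: no strlen() when the length is cached. */
static int toy_readlink(const struct toy_inode *inode, char *buf, size_t bufsiz)
{
	size_t len;

	if (inode->flags & INODE_CACHED_LINK)
		len = (size_t)inode->cached_len;
	else
		len = strlen(inode->link);

	if (len > bufsiz)
		len = bufsiz;
	memcpy(buf, inode->link, len);
	return (int)len;
}

int main(void)
{
	struct toy_inode inode = { 0 };
	char buf[64];
	int n;

	set_cached_link(&inode, "boot/initrd.img-6.12.0", 22);
	n = toy_readlink(&inode, buf, sizeof(buf));
	assert(n == 22 && memcmp(buf, "boot/initrd.img-6.12.0", 22) == 0);
	printf("readlink -> %.*s (%d bytes, no strlen)\n", n, buf, n);
	return 0;
}

The union mirrors the trick described above: the cached length only occupies
space that a symlink inode would never use for its device list, so the fast
path comes for free in terms of inode size.
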
> 
> v3:
> - use a union instead of a dedicated field, sharing space with i_devices
> 
> v2:
> - add a dedicated field, a flag and a helper instead of using i_size
> https://lore.kernel.org/linux-fsdevel/20241119094555.660666-1-mjguzik@xxxxxxxxx/
> 
> v1 can be found here:
> https://lore.kernel.org/linux-fsdevel/20241118085357.494178-1-mjguzik@xxxxxxxxx/
> 
> benchmark:
> plug into will-it-scale as tests/readlink1.c:
> 
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <assert.h>
> #include <string.h>
> 
> char *testcase_description = "readlink /initrd.img";
> 
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
> 	char *tmplink = "/initrd.img";
> 	char buf[1024];
> 
> 	while (1) {
> 		int error = readlink(tmplink, buf, sizeof(buf));
> 		assert(error > 0);
> 
> 		(*iterations)++;
> 	}
> }
> 
> Mateusz Guzik (3):
>   vfs: support caching symlink lengths in inodes
>   ext4: use inode_set_cached_link()
>   tmpfs: use inode_set_cached_link()
> 
>  fs/ext4/inode.c                |  3 ++-
>  fs/ext4/namei.c                |  4 +++-
>  fs/namei.c                     | 34 +++++++++++++++++++---------------
>  fs/proc/namespaces.c           |  2 +-
>  include/linux/fs.h             | 15 +++++++++++++--
>  mm/shmem.c                     |  6 ++++--
>  security/apparmor/apparmorfs.c |  2 +-
>  7 files changed, 43 insertions(+), 23 deletions(-)
> 

Nice work, Mateusz!

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>