On Tue, Aug 27, 2024 at 12:00 PM Jan Kara <jack@xxxxxxx> wrote: > > On Thu 15-08-24 10:33:10, Mateusz Guzik wrote: > > According to bpftrace on these routines most calls result in cmpxchg, > > which already provides the same guarantee. > > > > In inode_maybe_inc_iversion elision is possible because even if the > > wrong value was read due to now missing smp_mb fence, the issue is going > > to correct itself after cmpxchg. If it appears cmpxchg wont be issued, > > the fence + reload are there bringing back previous behavior. > > > > Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx> > > --- > > > > chances are this entire barrier guarantee is of no significance, but i'm > > not signing up to review it > > Jeff might have a ready answer here - added to CC. I think the barrier is > needed in principle so that you can guarantee that after a data change you > will be able to observe an i_version change. > That is the description, I am saying it is unclear if anyone needs it and that I am not interested in spending time on finding out. > > I verified the force flag is not *always* set (but it is set in the most > > common case). > > Well, I'm not convinced the more complicated code is really worth it. > 'force' will be set when we update timestamps which happens once per tick > (usually 1-4 ms). So that is common case on lightly / moderately loaded > system. On heavily write(2)-loaded system, 'force' should be mostly false > and unless you also heavily stat(2) the modified files, the common path is > exactly the "if (!force && !(cur & I_VERSION_QUERIED))" branch. So saving > one smp_mb() on moderately loaded system per couple of ms (per inode) > doesn't seem like a noticeable win... > inode_maybe_inc_iversion is used a lot and most commonly *with* the force flag. You can try it out yourself: bpftrace -e 'kprobe:inode_maybe_inc_iversion { @[kstack(), arg1] = count(); }' and run your favourite fs-using workload, for example this is a top backtrace form few seconds of building the kernel: @[ inode_maybe_inc_iversion+5 inode_update_timestamps+238 generic_update_time+19 file_update_time+125 shmem_file_write_iter+118 vfs_write+599 ksys_write+103 do_syscall_64+82 entry_SYSCALL_64_after_hwframe+118 , 1]: 1670 the '1' at the end indicates 'force' flag set to 1. This also shows up on unlink et al. The context here is that vfs is dog slow single-threaded in part due to spurious barriers sneaked in all over the place, here is another example: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.misc&id=30eb7cc03875b508ce6683c8b3cf6e88442a2f87 -- Mateusz Guzik <mjguzik gmail.com>