Re: [PATCH] vfs: elide smp_mb in iversion handling in the common case

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 27, 2024 at 12:00 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 15-08-24 10:33:10, Mateusz Guzik wrote:
> > According to bpftrace on these routines most calls result in cmpxchg,
> > which already provides the same guarantee.
> >
> > In inode_maybe_inc_iversion elision is possible because even if the
> > wrong value was read due to now missing smp_mb fence, the issue is going
> > to correct itself after cmpxchg. If it appears cmpxchg wont be issued,
> > the fence + reload are there bringing back previous behavior.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
> > ---
> >
> > chances are this entire barrier guarantee is of no significance, but i'm
> > not signing up to review it
>
> Jeff might have a ready answer here - added to CC. I think the barrier is
> needed in principle so that you can guarantee that after a data change you
> will be able to observe an i_version change.
>

That is the description, I am saying it is unclear if anyone needs it
and that I am not interested in spending time on finding out.

> > I verified the force flag is not *always* set (but it is set in the most
> > common case).
>
> Well, I'm not convinced the more complicated code is really worth it.
> 'force' will be set when we update timestamps which happens once per tick
> (usually 1-4 ms). So that is common case on lightly / moderately loaded
> system. On heavily write(2)-loaded system, 'force' should be mostly false
> and unless you also heavily stat(2) the modified files, the common path is
> exactly the "if (!force && !(cur & I_VERSION_QUERIED))" branch. So saving
> one smp_mb() on moderately loaded system per couple of ms (per inode)
> doesn't seem like a noticeable win...
>

inode_maybe_inc_iversion is used a lot and most commonly *with* the force flag.

You can try it out yourself: bpftrace -e
'kprobe:inode_maybe_inc_iversion { @[kstack(), arg1] = count(); }'

and run your favourite fs-using workload, for example this is a top
backtrace form few seconds of building the kernel:

@[
    inode_maybe_inc_iversion+5
    inode_update_timestamps+238
    generic_update_time+19
    file_update_time+125
    shmem_file_write_iter+118
    vfs_write+599
    ksys_write+103
    do_syscall_64+82
    entry_SYSCALL_64_after_hwframe+118
, 1]: 1670

the '1' at the end indicates 'force' flag set to 1.

This also shows up on unlink et al.

The context here is that vfs is dog slow single-threaded in part due
to spurious barriers sneaked in all over the place, here is another
example:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.misc&id=30eb7cc03875b508ce6683c8b3cf6e88442a2f87

-- 
Mateusz Guzik <mjguzik gmail.com>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux