Re: [PATCH] fs: don't needlessly acquire f_lock

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 10 Feb 2025 13:01:38 +0100

On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > Before 2011 there was no meaningful synchronization between
> > > read/readdir/write/seek. Only in commit
> > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > synchronization was added for SEEK_CUR by taking f_lock around
> > > vfs_setpos().
> > >
> > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > for directories. At that point taking f_lock became unnecessary for such
> > > files.
> > >
> > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > acquired f_pos_lock if necessary.
> > >
> > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> >
> > ...
> >
> > >       if (whence == SEEK_CUR) {
> > > +             bool locked;
> > > +
> > >               /*
> > > -              * f_lock protects against read/modify/write race with
> > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > -              * behave like SEEK_SET.
> > > +              * If the file requires locking via f_pos_lock we know
> > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > +              * is guaranteed. If the file isn't locked, we take
> > > +              * f_lock to protect against f_pos races with other
> > > +              * SEEK_CURs.
> > >                */
> > > -             guard(spinlock)(&file->f_lock);
> > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > +                      file->f_op->iterate_shared;
> >
> > As far as I understand the rationale this should match to
> > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > to me that's the case. After thinking about possibilities, I could convince
> > myself that what you suggest is indeed safe but the condition being in two
> > completely independent places and leading to subtle bugs if it gets out of
> > sync seems a bit fragile to me.
> >
> 
> A debug-only assert that the lock is held when expected should sort it out?

Good idea. Let me reuse the newly added infra:
VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));

> 
> > > +             if (!locked)
> > > +                     spin_lock(&file->f_lock);
> > > +             offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> > > +             if (!locked)
> > > +                     spin_unlock(&file->f_lock);
> > > +             return offset;
> > >       }
> > >
> > >       return vfs_setpos(file, offset, maxsize);
> 
> btw I ran this sucker over a kernel build:
> 
> bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file
> *)arg0)->f_mode & (1 << 15), ((struct file
> *)arg0)->f_op->iterate_shared, arg2] = count(); }
> '
> Attaching 1 probe...
> ^C
> 
> @[32768, 0x0, 2]: 9
> @[32768, 0x0, 1]: 171797
> @[32768, 0x0, 0]: 660866
> 
> SEEK_CUR is 1, so this *does* show up.

Thanks for the perf.