Re: [PATCH] fs: don't needlessly acquire f_lock

Mateusz Guzik <mjguzik@xxxxxxxxx> · Fri, 7 Feb 2025 17:42:17 +0100

On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > Before 2011 there was no meaningful synchronization between
> > read/readdir/write/seek. Only in commit
> > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > synchronization was added for SEEK_CUR by taking f_lock around
> > vfs_setpos().
> >
> > Then in 2014 full synchronization between read/readdir/write/seek was
> > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > for directories. At that point taking f_lock became unnecessary for such
> > files.
> >
> > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > acquired f_pos_lock if necessary.
> >
> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
>
> ...
>
> >       if (whence == SEEK_CUR) {
> > +             bool locked;
> > +
> >               /*
> > -              * f_lock protects against read/modify/write race with
> > -              * other SEEK_CURs. Note that parallel writes and reads
> > -              * behave like SEEK_SET.
> > +              * If the file requires locking via f_pos_lock we know
> > +              * that mutual exclusion for SEEK_CUR on the same file
> > +              * is guaranteed. If the file isn't locked, we take
> > +              * f_lock to protect against f_pos races with other
> > +              * SEEK_CURs.
> >                */
> > -             guard(spinlock)(&file->f_lock);
> > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > +                      file->f_op->iterate_shared;
>
> As far as I understand the rationale this should match to
> file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> to me that's the case. After thinking about possibilities, I could convince
> myself that what you suggest is indeed safe but the condition being in two
> completely independent places and leading to subtle bugs if it gets out of
> sync seems a bit fragile to me.
>

A debug-only assert that the lock is held when expected should sort it out?

> > +             if (!locked)
> > +                     spin_lock(&file->f_lock);
> > +             offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> > +             if (!locked)
> > +                     spin_unlock(&file->f_lock);
> > +             return offset;
> >       }
> >
> >       return vfs_setpos(file, offset, maxsize);

btw I ran this sucker over a kernel build:

bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file
*)arg0)->f_mode & (1 << 15), ((struct file
*)arg0)->f_op->iterate_shared, arg2] = count(); }
'
Attaching 1 probe...
^C

@[32768, 0x0, 2]: 9
@[32768, 0x0, 1]: 171797
@[32768, 0x0, 0]: 660866

SEEK_CUR is 1, so this *does* show up.

-- 
Mateusz Guzik <mjguzik gmail.com>