Re: [syzbot] [kernfs?] possible deadlock in kernfs_fop_llseek

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 4, 2024 at 11:21 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> > >
> > > In the lockdep dependency chain, overlayfs inode lock is taken
> > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > > layer of overlayfs, which is sane.
> > >
> > > With /sys/power/resume (and probably other files), sysfs also
> > > behaves as a stacking filesystem, calling vfs helpers, such as
> > > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > > filesystem, without all the precautions that comes with behaving
> > > as a stacked filesystem.
> >
> > No.  This is far worse than anything stacked filesystems do - it's
> > an arbitrary pathname resolution while holding a lock.
> > It's not local.  Just about anything (including automounts, etc.)
> > can be happening there and it pushes the lock in question outside
> > of *ALL* pathwalk-related locks.  Pathname doesn't have to
> > resolve to anything on overlayfs - it can just go through
> > a symlink on it, or walk into it and traverse a bunch of ..
> > afterwards, etc.
> >
> > Don't confuse that with stacking - it's not even close.
> > You can't use that anywhere near overlayfs layers.
> >
> > Maybe isolate it into a separate filesystem, to be automounted
> > on /sys/power.  And make anyone playing with overlayfs with
> > sysfs as a layer mount the damn thing on top of power/ in your
> > overlayfs.  But using that thing as a part of layer is
> > a non-starter.

I don't follow what you are saying.
Which code is in non-starter violation?
kernfs for calling lookup_bdev() with internal of->mutex held?
Overlayfs for allowing sysfs as a lower layer and calling
vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held
for legit reasons (e.g. from ovl_rename())?

>
> Incidentally, why do you need to lock overlayfs inode to call
> vfs_llseek() on the underlying file?  It might (or might not)
> need to lock the underlying file (for things like ->i_size,
> etc.), but that will be done by ->llseek() instance and it
> would deal with the inode in the layer, not overlayfs one.

We do not (anymore) lock ovl inode in ovl_llseek(), see:
b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek()
but ovl inode is held in operations (e.g. ovl_rename)
which trigger copy up and call vfs_llseek() on the lower file.

>
> Similar question applies to ovl_write_iter() - why do you
> need to hold the overlayfs inode locked during the call of
> backing_file_write_iter()?
>

Not sure. This question I need to defer to Miklos.
I see in several places the pattern:
        inode_lock(inode);
        /* Update mode */
        ovl_copyattr(inode);
        ret = file_remove_privs(file);
...
        /* Update size */
        ovl_file_modified(file);
...
        inode_unlock(inode);

so it could be related to atomic remove privs and update mtime,
but possibly we could convert all of those inode_lock() to
ovl_inode_lock() (i.e. internal lock below vfs inode lock).

[...]
> Consider the scenario when unlink() is called on that sucker
> during the write() that triggers that pathwalk.  We have
>
> unlink: blocked on overlayfs inode of file, while holding the parent directory.
> write: holding the overlayfs inode of file and trying to resolve a pathname
> that contains .../power/suspend_stats/../../...; blocked on attempt to lock
> parent so we could do a lookup in it.

This specifically cannot happen because sysfs is not allowed as an
upper layer only as a lower layer, so overlayfs itself will not be writing to
/sys/power/resume.

>
> No llseek involved anywhere, kernfs of->mutex held, but not contended,
> deadlock purely on ->i_rwsem of overlayfs inodes.
>
> Holding overlayfs inode locked during the call of lookup_bdev() is really
> no-go.

Yes. I see that, but how can this be resolved?

If the specific file /sys/power/resume will not have FMODE_LSEEK,
then ovl_copy_up_file() will not try to skip_hole on it at all and the
llseek lock dependency will be averted.

The question is whether opt-out of FMODE_LSEEK for /sys/power/resume
will break anything or if we should mark seq files in another way so that
overlayfs would not try to seek_hole on any of them categorically.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux