Re: [PATCH] fs: don't scan the inode cache before SB_ACTIVE is set

On Mon, Mar 26, 2018 at 06:31:51AM +0100, Al Viro wrote:
> On Mon, Mar 26, 2018 at 03:35:03PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > We recently had an oops reported on a 4.14 kernel in
> > xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
> > and so the m_perag_tree lookup walked into lala land.
> > 
> > We found a mount in a failed state, blocked on the shrinker rwsem
> > here:
> > 
> > mount_bdev()
> >   deactivate_locked_super()
> >     unregister_shrinker()
> > 
> > Essentially, the machine was under memory pressure when the mount
> > was being run, xfs_fs_fill_super() failed after allocating the
> > xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and
> > freed the xfs_mount, but the sb->s_fs_info field still pointed to
> > the freed memory. Hence when the superblock shrinker then ran
> > it fell off the bad pointer.
> > 
> > This is reproduced by using the mount_delay sysfs control as added
> > in the previous patch. It produces an oops down this path during the
> > stalled mount:
> 
> > The problem is that the superblock shrinker is running before the
> > filesystem structures it depends on have been fully set up. i.e.
> > the shrinker is registered in sget(), before ->fill_super() has been
> > called, and the shrinker can call into the filesystem before
> > fill_super() does its setup work.
> 
> Wait a sec...  How the hell does it get through trylock_super() before
> ->s_root is set and ->s_umount is unlocked?

I see...  So basically the story is

* super_cache_count() lacks the trylock_super() check (sketched below), making
it possible for it to be called too early, on a half-set-up superblock.
* it can't be called too late (during fs shutdown), since the shrinker is
unregistered before the call of ->kill_sb()
* making sure it won't get called too early can be done by checking SB_ACTIVE.
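
For context, this is roughly the gate super_cache_scan() goes through via
trylock_super(), which super_cache_count() skips - a simplified sketch based
on fs/super.c, exact details may differ between kernel versions:

	bool trylock_super(struct super_block *sb)
	{
		if (down_read_trylock(&sb->s_umount)) {
			/* only a fully born, still-hashed superblock qualifies */
			if (!hlist_unhashed(&sb->s_instances) &&
			    sb->s_root && (sb->s_flags & SB_BORN))
				return true;
			up_read(&sb->s_umount);
		}
		return false;
	}

With no equivalent check on the count side, super_cache_count() can run
against a superblock whose ->fill_super() hasn't finished (or has failed and
torn things down).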

It's potentially racy, though - don't we need a barrier between setting
things up and setting SB_ACTIVE?

And that, BTW, means that we want SB_BORN instead of SB_ACTIVE - unlike the
latter, the former is set in only one place.  So I'd suggest switching to
checking that, with a barrier pair added (one in mount_fs() before setting the
sucker, another in super_cache_count() before doing the scan).
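
To make that concrete, one shape the barrier pair could take, following the
placement described above (a sketch, not a tested patch; the SB_BORN update
in mount_fs() already exists, the barriers are the addition):

	/* in mount_fs(), once ->mount()/->fill_super() has finished: */

	/*
	 * Pairs with the smp_rmb() in super_cache_count(): everything
	 * the filesystem set up must be visible before SB_BORN is.
	 */
	smp_wmb();
	sb->s_flags |= SB_BORN;

	/* in super_cache_count(), before touching anything fs-owned: */

	if (!(sb->s_flags & SB_BORN))
		return 0;
	/*
	 * Pairs with the smp_wmb() in mount_fs(): don't dereference
	 * sb->s_fs_info and friends until they are known to be set up.
	 */
	smp_rmb();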


