Re: [RFC v3 1/1] fs/namespace: remove RCU sync for MNT_DETACH umount

On Mon 01-07-24 10:41:40, Alexander Larsson wrote:
> On Mon, Jul 1, 2024 at 7:50 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > > I always thought the rcu delay was to ensure concurrent path walks "see" the
> > > umount, not to ensure correct operation of the following mntput()(s).
> > >
> > > Isn't the sequence of operations roughly: resolve path, lock, detach,
> > > release lock, rcu wait, mntput() subordinate mounts, put path?
> >
> > The crucial bit is really that synchronize_rcu_expedited() ensures that
> > the final mntput() won't happen until path walk leaves RCU mode.
> >
> > This allows callers like legitimize_mnt(), which are called with only
> > the RCU read-lock during lazy path walk, to simply check for
> > MNT_SYNC_UMOUNT and see that the mnt is about to be killed. If they see
> > that this mount is MNT_SYNC_UMOUNT then they know that the mount won't
> > be freed until an RCU grace period is up and so they know that they can
> > simply put the reference count they took _without having to actually
> > call mntput()_.
> >
> > Because if they did have to call mntput() they might end up shutting the
> > filesystem down instead of umount() and that will cause said EBUSY
> > errors I mentioned in my earlier mails.
> 
> But such behaviour could be kept even without an expedited RCU sync,
> as in my alternative patch for this:
> https://www.spinics.net/lists/linux-fsdevel/msg270117.html
> 
> I.e. we would still guarantee the final mntput() is called, but not block
> the return of the unmount call.

So FWIW the approach of handing off the remainder of namespace_unlock()
to an RCU callback for lazy unmount looks workable to me. Just as Al Viro
pointed out, you cannot do everything directly in the RCU callback because
that context doesn't allow all of the work to happen there, so you need to
queue work from the RCU callback and do the real work from the work item
(OTOH you can then avoid the task work in mntput_no_expire(), though that
will need a bit of refactoring).
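
To make the hand-off concrete, here is a rough user-space model of the
two-stage deferral. It is only a sketch: every name in it (struct
deferred_umount, grace_period_callback(), run_pending_work(), ...) is
made up for illustration and is not a kernel API, and the "atomic
context" flag merely stands in for the restrictions of a real RCU
callback. The callback only queues the item; the part that may sleep
runs later from the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct deferred_umount {
	struct deferred_umount *next;	/* link in the pending-work list */
	int mounts_to_put;		/* mounts still awaiting their final put */
	bool done;			/* set once the worker has finished */
};

static struct deferred_umount *work_head;	/* models a work queue */
static struct deferred_umount **work_tail = &work_head;
static bool in_atomic_context;		/* models "inside an RCU callback" */

/* Models the part that may sleep: dropping the final mount references. */
static void cleanup_work(struct deferred_umount *du)
{
	assert(!in_atomic_context);	/* must never run from the callback */
	while (du->mounts_to_put > 0)
		du->mounts_to_put--;	/* stands in for one final mntput() */
	du->done = true;
}

/* Models the RCU callback: runs after the grace period, only queues work. */
static void grace_period_callback(struct deferred_umount *du)
{
	in_atomic_context = true;
	du->next = NULL;
	*work_tail = du;
	work_tail = &du->next;
	in_atomic_context = false;
}

/* Models the worker draining the queue in a context that may sleep. */
static int run_pending_work(void)
{
	int handled = 0;

	while (work_head) {
		struct deferred_umount *du = work_head;

		work_head = du->next;
		if (!work_head)
			work_tail = &work_head;
		cleanup_work(du);
		handled++;
	}
	return handled;
}
```

In this model the umount path would return as soon as the item has been
handed to the grace-period machinery; nothing is put until
run_pending_work() runs, which is the property the lazy umount case
needs.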

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



