Re: SRCU use from different contexts

On Thu 24-11-22 08:21:13, Amir Goldstein wrote:
> [+fsdevel]
> 
> On Thu, Nov 24, 2022 at 2:21 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 23, 2022 at 12:46:45PM +0100, Jan Kara wrote:
> > > Hello!
> > >
> > > We were pondering with Amir some issues with the fsnotify subsystem, and
> > > as a building block we would need a mechanism to make sure write(2) has
> > > completed. For simplicity we could imagine it like a sequence:
> > >
> > > write(2)
> > >   START
> > >   do stuff to perform write
> > >   END
> > >
> > > and we need a mechanism to wait for all processes that already passed START
> > > to reach END. Ideally without blocking new writes while we wait for the
> > > pending ones. Now this seems like a good task for SRCU. We could do:
> > >
> > > write(2)
> > >   idx = srcu_read_lock(&sb->s_write_rcu);
> > >   do stuff to perform write
> > >   srcu_read_unlock(&sb->s_write_rcu, idx);
> > >
> > > and use synchronize_srcu(&sb->s_write_rcu) for waiting.
> > >
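> > > As a fuller sketch of the waiter side (DEFINE_SRCU is used here only
> > > for illustration; in reality s_write_rcu would be a field of struct
> > > super_block initialized with init_srcu_struct()):
> > >
> > > #include <linux/srcu.h>
> > >
> > > DEFINE_SRCU(s_write_rcu);
> > >
> > > /*
> > >  * Returns once every write that already passed srcu_read_lock()
> > >  * has reached its matching srcu_read_unlock().  Writes entering
> > >  * the read-side section in the meantime are not blocked.
> > >  */
> > > static void wait_for_in_flight_writes(void)
> > > {
> > > 	synchronize_srcu(&s_write_rcu);
> > > }
> > >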
> > > But the trouble with writes is there are things like aio or io_uring where
> > > the part with srcu_read_lock() happens from one task (the submitter) while
> > > the part with srcu_read_unlock() happens from another context (usually a
> > > worker thread triggered by an IRQ reporting that the HW has finished the
> > > IO).
> > >
> > > Is there any chance to make SRCU work in a situation like this? It seems
> > > to me that in principle it should be possible, but maybe there are some
> > > implementation constraints I'm missing...
> >
> > The srcu_read_lock_notrace() and srcu_read_unlock_notrace() functions
> > will work for this, though that is not their intended purpose.  Plus you
> > might want to trace these functions, which, as their names indicate, is
> > not permitted.  I assume that you do not intend to use these functions
> > from NMI handlers, though that really could be accommodated.  (But why
> > would you need that?)
> >
> > So how about srcu_down_read() and srcu_up_read(), as shown in the
> > (untested) patch below?
> >
> > Note that you do still need to pass the return value from srcu_down_read()
> > into srcu_up_read().  I am guessing that io_uring has a convenient place
> > that this value can be placed.  No idea about aio.
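> >
> > For example (a sketch only; the request structure and field name are
> > made up for illustration, assuming the srcu_down_read()/srcu_up_read()
> > API from the patch):
> >
> > #include <linux/srcu.h>
> >
> > struct write_req {
> > 	int s_write_idx;	/* cookie returned by srcu_down_read() */
> > 	/* ... */
> > };
> >
> > /* Submission context, e.g. the io_uring submitter task. */
> > static void submit_write(struct srcu_struct *ssp, struct write_req *req)
> > {
> > 	req->s_write_idx = srcu_down_read(ssp);
> > 	/* ... queue the I/O ... */
> > }
> >
> > /* Completion context, e.g. a worker thread kicked by the IRQ. */
> > static void complete_write(struct srcu_struct *ssp, struct write_req *req)
> > {
> > 	/* ... finish the I/O ... */
> > 	srcu_up_read(ssp, req->s_write_idx);
> > }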
> >
> 
> Sure, aio completion has context.
> 
> > Thoughts?
> 
> That looks great! Thank you.
> 
> Followup question:
> Both fs/aio.c:aio_write() and io_uring/rw.c:io_write() do this ugly
> thing:
> 
> /*
>  * Open-code file_start_write here to grab freeze protection,
>  * which will be released by another thread in
>  * aio_complete_rw().  Fool lockdep by telling it the lock got
>  * released so that it doesn't complain about the held lock when
>  * we return to userspace.
>  */
> if (S_ISREG(file_inode(file)->i_mode)) {
>     sb_start_write(file_inode(file)->i_sb);
>     __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
> }
> 
> And in write completion:
> 
> /*
>  * Tell lockdep we inherited freeze protection from submission
>  * thread.
>  */
> if (S_ISREG(inode->i_mode))
>     __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
> file_end_write(kiocb->ki_filp);
> 
> I suppose we also need to "fool lockdep" w.r.t. returning to userspace
> with an acquired SRCU reader?

So AFAICT the whole point of Paul's new helpers is that they don't use
lockdep at all, and thus we don't have to play these "fool lockdep" games.
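
To illustrate (a sketch; DEFINE_SRCU and the helper names are only for
illustration): srcu_down_read() just hands back an index and leaves no
lockdep state behind, so the reader can be released by another task after
the submitter has returned to userspace:

#include <linux/srcu.h>

DEFINE_SRCU(sb_write_srcu);

/*
 * Unlike sb_start_write(), which acquires a lockdep-tracked percpu
 * rwsem and hence needs the __sb_writers_release() dance before
 * returning to userspace, srcu_down_read() registers nothing with
 * lockdep: the returned index is plain data for the completion side.
 */
static int start_tracked_write(void)
{
	return srcu_down_read(&sb_write_srcu);
}

/* May run in a different task than start_tracked_write(). */
static void end_tracked_write(int idx)
{
	srcu_up_read(&sb_write_srcu, idx);
}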

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


