Re: SRCU use from different contexts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 24-11-22 08:11:47, Paul E. McKenney wrote:
> On Thu, Nov 24, 2022 at 10:58:40AM +0100, Jan Kara wrote:
> > On Thu 24-11-22 08:21:13, Amir Goldstein wrote:
> > > [+fsdevel]
> > > 
> > > On Thu, Nov 24, 2022 at 2:21 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Nov 23, 2022 at 12:46:45PM +0100, Jan Kara wrote:
> > > > > Hello!
> > > > >
> > > > > We were pondering with Amir about some issues with fsnotify subsystem and
> > > > > as a building block we would need a mechanism to make sure write(2) has
> > > > > completed. For simplicity we could imagine it like a sequence
> > > > >
> > > > > write(2)
> > > > >   START
> > > > >   do stuff to perform write
> > > > >   END
> > > > >
> > > > > and we need a mechanism to wait for all processes that already passed START
> > > > > to reach END. Ideally without blocking new writes while we wait for the
> > > > > pending ones. Now this seems like a good task for SRCU. We could do:
> > > > >
> > > > > write(2)
> > > > >   srcu_read_lock(&sb->s_write_rcu);
> > > > >   do stuff to perform write
> > > > >   srcu_read_unlock(&sb->s_write_rcu);
> > > > >
> > > > > and use synchronize_srcu(&sb->s_write_rcu) for waiting.
> > > > >
> > > > > But the trouble with writes is there are things like aio or io_uring where
> > > > > the part with srcu_read_lock() happens from one task (the submitter) while
> > > > > the part with srcu_read_unlock() happens from another context (usually worker
> > > > > thread triggered by IRQ reporting that the HW has finished the IO).
> > > > >
> > > > > Is there any chance to make SRCU work in a situation like this? It seems to
> > > > > me in principle it should be possible to make this work but maybe there are
> > > > > some implementation constraints I'm missing...
> > > >
> > > > The srcu_read_lock_notrace() and srcu_read_unlock_notrace() functions
> > > > will work for this, though that is not their intended purpose.  Plus you
> > > > might want to trace these functions, which, as their names indicate, is
> > > > not permitted.  I assume that you do not intend to use these functions
> > > > from NMI handlers, though that really could be accommodated.  (But why
> > > > would you need that?)
> > > >
> > > > So how about srcu_down_read() and srcu_up_read(), as shown in the
> > > > (untested) patch below?
> > > >
> > > > Note that you do still need to pass the return value from srcu_down_read()
> > > > into srcu_up_read().  I am guessing that io_uring has a convenient place
> > > > that this value can be placed.  No idea about aio.
> > > >
> > > 
> > > Sure, aio completion has context.
> > > 
> > > > Thoughts?
> > > 
> > > That looks great! Thank you.
> > > 
> > > Followup question:
> > > Both fs/aio.c:aio_write() and io_uring/rw.c:io_write() do this ugly
> > > thing:
> > > 
> > > /*
> > >  * Open-code file_start_write here to grab freeze protection,
> > >  * which will be released by another thread in
> > >  * aio_complete_rw().  Fool lockdep by telling it the lock got
> > >  * released so that it doesn't complain about the held lock when
> > >  * we return to userspace.
> > >  */
> > > if (S_ISREG(file_inode(file)->i_mode)) {
> > >     sb_start_write(file_inode(file)->i_sb);
> > >     __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
> > > }
> > > 
> > > And in write completion:
> > > 
> > > /*
> > >  * Tell lockdep we inherited freeze protection from submission
> > >  * thread.
> > >  */
> > > if (S_ISREG(inode->i_mode))
> > >     __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
> > > file_end_write(kiocb->ki_filp);
> > > 
> > > I suppose we also need to "fool lockdep" w.r.t returning to userspace
> > > with an acquired srcu?
> > 
> > So AFAICT the whole point of Paul's new helpers is to not use lockdep and
> > thus not have to play the "fool lockdep" games.
> 
> Exactly!  ;-)
> 
> But if you do return to userspace after invoking srcu_down_read(), it
> is your responsibility to make sure that -something- eventually invokes
> srcu_up_read().  Which might or might not be able to rely on userspace
> doing something sensible.
> 
> I would guess that you have a timeout or rely on close() for that purpose,
> just as you presumably do for sb_start_write(), but figured I should
> mention it.

Yes. We actually do not rely on userspace but rather on HW to eventually
signal IO completion. For misbehaving HW there are timeouts but the details
depend very much on the protocol etc.. But as you say it is the same
business as with sb_start_write() so nothing new here.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux