Re: SRCU use from different contexts

[+fsdevel]

On Thu, Nov 24, 2022 at 2:21 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Wed, Nov 23, 2022 at 12:46:45PM +0100, Jan Kara wrote:
> > Hello!
> >
> > Amir and I have been pondering some issues with the fsnotify subsystem,
> > and as a building block we need a mechanism to make sure a write(2) has
> > completed. For simplicity, we can picture it as a sequence
> >
> > write(2)
> >   START
> >   do stuff to perform write
> >   END
> >
> > and we need a mechanism to wait for all processes that already passed START
> > to reach END. Ideally without blocking new writes while we wait for the
> > pending ones. Now this seems like a good task for SRCU. We could do:
> >
> > write(2)
> >   srcu_read_lock(&sb->s_write_rcu);
> >   do stuff to perform write
> >   srcu_read_unlock(&sb->s_write_rcu);
> >
> > and use synchronize_srcu(&sb->s_write_rcu) for waiting.
> >
> > But the trouble with writes is there are things like aio or io_uring where
> > the part with srcu_read_lock() happens from one task (the submitter) while
> > the part with srcu_read_unlock() happens from another context (usually worker
> > thread triggered by IRQ reporting that the HW has finished the IO).
> >
> > Is there any chance to make SRCU work in a situation like this? It seems to
> > me in principle it should be possible to make this work but maybe there are
> > some implementation constraints I'm missing...
>
> The srcu_read_lock_notrace() and srcu_read_unlock_notrace() functions
> will work for this, though that is not their intended purpose.  Plus you
> might want to trace these functions, which, as their names indicate, is
> not permitted.  I assume that you do not intend to use these functions
> from NMI handlers, though that really could be accommodated.  (But why
> would you need that?)
>
> So how about srcu_down_read() and srcu_up_read(), as shown in the
> (untested) patch below?
>
> Note that you do still need to pass the return value from srcu_down_read()
> into srcu_up_read().  I am guessing that io_uring has a convenient place
> that this value can be placed.  No idea about aio.
>

Sure, aio completion has context.

> Thoughts?

That looks great! Thank you.
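
To make sure I am reading the API right, the write path would then look
something like this (untested sketch; s_write_srcu and ki_srcu_idx are
made-up names, and error paths are omitted):

```c
/* Submission side, e.g. in the aio/io_uring write path (sketch). */
static void sketch_submit_write(struct kiocb *iocb)
{
	struct super_block *sb = file_inode(iocb->ki_filp)->i_sb;

	/* Enter the reader; stash the index in per-request context. */
	iocb->ki_srcu_idx = srcu_down_read(&sb->s_write_srcu);
	/* ... queue the write; completion may run in another task ... */
}

/* Completion side, possibly a worker thread kicked by an IRQ (sketch). */
static void sketch_complete_write(struct kiocb *iocb)
{
	struct super_block *sb = file_inode(iocb->ki_filp)->i_sb;

	/* Exit the reader from a different context than submission. */
	srcu_up_read(&sb->s_write_srcu, iocb->ki_srcu_idx);
}

/* Waiter: every write that passed srcu_down_read() has finished after this,
 * and new writes are not blocked while we wait. */
static void sketch_wait_for_writes(struct super_block *sb)
{
	synchronize_srcu(&sb->s_write_srcu);
}
```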

Followup question:
Both fs/aio.c:aio_write() and io_uring/rw.c:io_write() do this ugly
thing:

/*
 * Open-code file_start_write here to grab freeze protection,
 * which will be released by another thread in
 * aio_complete_rw().  Fool lockdep by telling it the lock got
 * released so that it doesn't complain about the held lock when
 * we return to userspace.
 */
if (S_ISREG(file_inode(file)->i_mode)) {
    sb_start_write(file_inode(file)->i_sb);
    __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}

And in write completion:

/*
 * Tell lockdep we inherited freeze protection from submission
 * thread.
 */
if (S_ISREG(inode->i_mode))
    __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
file_end_write(kiocb->ki_filp);

I suppose we would also need to "fool lockdep" the same way w.r.t. returning
to userspace with an SRCU reader still held?
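
Or maybe not: if I read the patch below right, no trick is needed, because
the existing srcu_read_lock() is roughly (from include/linux/srcu.h):

```c
static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
{
	int retval;

	srcu_check_nmi_safety(ssp, false);
	retval = __srcu_read_lock(ssp);
	srcu_lock_acquire(&ssp->dep_map);	/* this is what lockdep tracks */
	return retval;
}
```

whereas srcu_down_read() never touches ssp->dep_map, so lockdep would not
know about the reader in the first place and there would be nothing to
release on the way back to userspace. Is that the intent?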

Thanks,
Amir.

>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 0efa1e7b5a862e9c2f1bf8c19db6bd142ad35355
> Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Date:   Wed Nov 23 15:49:55 2022 -0800
>
>     rcu: Add srcu_down_read() and srcu_up_read()
>
>     A pair of matching srcu_read_lock() and srcu_read_unlock() invocations
>     must take place within the same context, for example, within the same
>     task.  Otherwise, lockdep complains, as is the right thing to do for
>     most use cases.
>
>     However, there are use cases involving asynchronous I/O where the
>     SRCU reader needs to begin on one task and end on another.  This commit
>     therefore supplies the semaphore-like srcu_down_read() and srcu_up_read(),
>     which act like srcu_read_lock() and srcu_read_unlock(), but permit
>     srcu_up_read() to be invoked from a different context than the matching
>     srcu_down_read().
>
>     Neither srcu_down_read() nor srcu_up_read() may be invoked from an
>     NMI handler.
>
>     Reported-by: Jan Kara <jack@xxxxxxx>
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 9b9d0bbf1d3cf..74796cd7e7a9d 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -214,6 +214,34 @@ srcu_read_lock_notrace(struct srcu_struct *ssp) __acquires(ssp)
>         return retval;
>  }
>
> +/**
> + * srcu_down_read - register a new reader for an SRCU-protected structure.
> + * @ssp: srcu_struct in which to register the new reader.
> + *
> + * Enter a semaphore-like SRCU read-side critical section.  Note that
> + * SRCU read-side critical sections may be nested.  However, it is
> + * illegal to call anything that waits on an SRCU grace period for the
> + * same srcu_struct, whether directly or indirectly.  Please note that
> + * one way to indirectly wait on an SRCU grace period is to acquire
> + * a mutex that is held elsewhere while calling synchronize_srcu() or
> + * synchronize_srcu_expedited().  But if you want lockdep to help you
> + * keep this stuff straight, you should instead use srcu_read_lock().
> + *
> + * The semaphore-like nature of srcu_down_read() means that the matching
> + * srcu_up_read() can be invoked from some other context, for example,
> + * from some other task or from an irq handler.  However, neither
> + * srcu_down_read() nor srcu_up_read() may be invoked from an NMI handler.
> + *
> + * Calls to srcu_down_read() may be nested, similar to the manner in
> + * which calls to down_read() may be nested.
> + */
> +static inline int srcu_down_read(struct srcu_struct *ssp) __acquires(ssp)
> +{
> +       WARN_ON_ONCE(in_nmi());
> +       srcu_check_nmi_safety(ssp, false);
> +       return __srcu_read_lock(ssp);
> +}
> +
>  /**
>   * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
>   * @ssp: srcu_struct in which to unregister the old reader.
> @@ -254,6 +282,23 @@ srcu_read_unlock_notrace(struct srcu_struct *ssp, int idx) __releases(ssp)
>         __srcu_read_unlock(ssp, idx);
>  }
>
> +/**
> + * srcu_up_read - unregister an old reader from an SRCU-protected structure.
> + * @ssp: srcu_struct in which to unregister the old reader.
> + * @idx: return value from corresponding srcu_down_read().
> + *
> + * Exit an SRCU read-side critical section, but not necessarily from
> + * the same context as the matching srcu_down_read().
> + */
> +static inline void srcu_up_read(struct srcu_struct *ssp, int idx)
> +       __releases(ssp)
> +{
> +       WARN_ON_ONCE(idx & ~0x1);
> +       WARN_ON_ONCE(in_nmi());
> +       srcu_check_nmi_safety(ssp, false);
> +       __srcu_read_unlock(ssp, idx);
> +}
> +
>  /**
>   * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
>   *


