Re: rcu_sync_dtor() warning question

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 29 Jul 2024 10:26:58 +0200

On Fri, Jul 26, 2024 at 11:02:35AM GMT, Paul E. McKenney wrote:
> On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > Hey,
> > 
> > I could use some help with understanding a bug related to rcu that was
> > reported today. It first seems to have shown up on the 25th of July:
> > 
> > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> > 
> > We seem to be hitting the WARN_ON_ONCE() in:
> > 
> > void rcu_sync_dtor(struct rcu_sync *rsp)
> > {
> >         int gp_state;
> > 
> >         WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> > 
> > from destroy_super_work() which gets called when a superblock is really freed.
> > 
> > If the superblock has been visible in userspace we do it via call_rcu():
> > 
> > static void destroy_super_work(struct work_struct *work)
> > {
> >         struct super_block *s = container_of(work, struct super_block,
> >                                                         destroy_work);
> >         fsnotify_sb_free(s);
> >         security_sb_free(s);
> >         put_user_ns(s->s_user_ns);
> >         kfree(s->s_subtype);
> >         for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> >                 percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> >         kfree(s);
> > }
> > 
> > static void destroy_super_rcu(struct rcu_head *head)
> > {
> >         struct super_block *s = container_of(head, struct super_block, rcu);
> >         INIT_WORK(&s->destroy_work, destroy_super_work);
> >         schedule_work(&s->destroy_work);
> > }
> > 
> > And I'm really confused because I don't understand the details for sync
> > rcu enough to come up with a clear problem statement even. Could someone
> > please explain what the WARN_ON_ONCE() is about?
> 
> If I am not too confused (and Oleg will correct me if I am), this is
> checking a use-after-free error.  A given rcu_sync structure normally
> transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> possible side-trips from the GP_EXIT state through GP_REPLAY and back
> to GP_EXIT in special cases such as during early boot.
> 
> One way that this can happen is calling percpu_free_rwsem() without
> having previously done the percpu_up_write() needed to match an earlier
> percpu_down_write().  I don't see any other way that this can happen
> right off-hand, but someone might have added something, or I might be
> suffering a failure of imagination.

And you were spot on of course!

> 
> > (I had been wondering if this is related to commit 7180f8d91fcb ("vfs:
> > add rcu-based find_inode variants for iget ops") and the fix in commit
> > 5bc9ad78c2f8 ("vfs: handle __wait_on_freeing_inode() and evict() race")
> > but I don't see how.)
> 
> I don't see how either, unless it introduced some timing change
> that made the scenario above more probable.
> 
> > Thanks for taking a look!
> 
> Hope this helps!  If not, you know where to find me.  ;-)

Thank you for taking the time, Paul. That was indeed helpful!