On Fri, Jul 26, 2024 at 8:02 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Fri, Jul 26, 2024 at 03:54:28PM +0200, Christian Brauner wrote:
> > Hey,
> >
> > I could use some help with understanding a bug related to RCU that was
> > reported today. It first seems to have shown up on the 25th of July:
> >
> > https://syzkaller.appspot.com/bug?extid=20d7e439f76bbbd863a7
> >
> > We seem to be hitting the WARN_ON_ONCE() in:
> >
> > void rcu_sync_dtor(struct rcu_sync *rsp)
> > {
> > 	int gp_state;
> >
> > 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_PASSED);
> >
> > from destroy_super_work(), which gets called when a superblock is really freed.
> >
> > If the superblock has been visible in userspace we do it via call_rcu():
> >
> > static void destroy_super_work(struct work_struct *work)
> > {
> > 	struct super_block *s = container_of(work, struct super_block,
> > 						destroy_work);
> > 	fsnotify_sb_free(s);
> > 	security_sb_free(s);
> > 	put_user_ns(s->s_user_ns);
> > 	kfree(s->s_subtype);
> > 	for (int i = 0; i < SB_FREEZE_LEVELS; i++)
> > 		percpu_free_rwsem(&s->s_writers.rw_sem[i]);
> > 	kfree(s);
> > }
> >
> > static void destroy_super_rcu(struct rcu_head *head)
> > {
> > 	struct super_block *s = container_of(head, struct super_block, rcu);
> > 	INIT_WORK(&s->destroy_work, destroy_super_work);
> > 	schedule_work(&s->destroy_work);
> > }
> >
> > And I'm really confused because I don't understand the details of sync
> > RCU well enough to even come up with a clear problem statement. Could
> > someone please explain what the WARN_ON_ONCE() is about?
>
> If I am not too confused (and Oleg will correct me if I am), this is
> checking for a use-after-free error.  A given rcu_sync structure normally
> transitions from GP_IDLE->GP_ENTER->GP_PASSED->GP_EXIT->GP_IDLE, with
> possible side-trips from the GP_EXIT state through GP_REPLAY and back
> to GP_EXIT in special cases such as during early boot.
>

Use-after-free? In that case I have a candidate for a culprit.

Code prior to any of my changes was doing the following in iget_locked():

	spin_lock(&inode_hash_lock);
	inode = find_inode_fast(sb, head, ino);
	spin_unlock(&inode_hash_lock);
	if (inode) {
		if (IS_ERR(inode))
			return NULL;
		wait_on_inode(inode);
		if (unlikely(inode_unhashed(inode))) {
			iput(inode);
			goto again;
		}
		return inode;
	}

My patch removed the spinlock acquire and made it significantly more
likely for the code to end up doing the wait_on_inode() + inode_unhashed()
combo when racing against inode teardown.

Now that you bring up use-after-free, I'm not particularly confident the
stock code is correct either. For example, evict_inodes() -> evict() can
mess with the inode and result in the iput() call in iget_locked(), which
will then invoke evict() again. And I'm not particularly confident that
the routine and everything it calls is idempotent. That's from a quick
poke around; maybe I missed something.

syzkaller claims to have a reproducer. Trivial usage in my debug VM does
not trigger anything, so I may need to grab their entire setup to
reproduce. I'm going to look into it.

-- 
Mateusz Guzik <mjguzik gmail.com>
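
For reference, here is a minimal sketch of the rcu_sync lifecycle Paul
describes, assuming the API declared in include/linux/rcu_sync.h
(rcu_sync_init(), rcu_sync_enter(), rcu_sync_exit(), rcu_sync_dtor()) and
the state machine in kernel/rcu/sync.c; it is an illustration, not code
from the thread. In the superblock case above, rcu_sync_dtor() is reached
through percpu_free_rwsem() on the s_writers rwsems, whose rcu_sync is
embedded in the percpu_rw_semaphore.

	#include <linux/rcu_sync.h>

	struct rcu_sync rsp;

	rcu_sync_init(&rsp);	/* gp_state == GP_IDLE */

	/*
	 * Writer side enters: GP_IDLE -> GP_ENTER, then GP_PASSED once a
	 * grace period has elapsed.  Readers take the slow path for as
	 * long as the structure stays in GP_PASSED.
	 */
	rcu_sync_enter(&rsp);

	/* ... writer-side critical section ... */

	/*
	 * Writer side leaves: GP_PASSED -> GP_EXIT, then back to GP_IDLE
	 * after another grace period (with a possible GP_EXIT -> GP_REPLAY
	 * -> GP_EXIT side-trip in some cases).
	 */
	rcu_sync_exit(&rsp);

	/*
	 * Destruction is only expected once the writer side has exited.
	 * Seeing gp_state == GP_PASSED here means rcu_sync_exit() was
	 * never called -- or the memory was freed and reused, i.e. the
	 * use-after-free Paul suspects.
	 */
	rcu_sync_dtor(&rsp);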