On Thu, 2024-08-22 at 19:41 +0800, Julian Sun wrote: > Hi, Josef. > Thanks for your review and comments. > Yeah, you are correct, sorry for the mistake. > > I'm thinking if is there any misunderstanding in the following > scenario > > umount thread: btrfs-cleaner thread: > btrfs_run_delayed_iputs() > ->run_delayed_iput_locked() > ->iput(inode) > ->spin_lock(inode) > // inode->i_count dec to 0 > ->iput_final() > ->__inode_add_lru() > // mapping_shrinkable() > returned false > // so iput_final() > returned with i_count=0 > // note here: the inode > still > btrfs_kill_super() // in the sb list and > I_FREEING > ->generic_shutdown_super() // was not set > ->evict_inodes() ->spin_unlock(inode) > ->spin_lock(inode) > // i_count is 0 and // continue running... > // I_FREEING was not set ->__btrfs_run_defrag_inode() > // so the inode was added to ->btrfs_iget() > // dispost list // note the inode is > still > // in the sb list > // and > I_WILL_FREE|I_FREEING > // was not set > ->spin_unlock() ->find_inode() > ->spin_lock(inode) > ->__iget(); > ->spin_unlock(); > ->dispose_list() > ->evict() ->iput() > // note: i_lock was not ->spin_lock(inode) > // hold here. ->iput_final() > ->spin_unlock() > ->evict() > The understanding here is not correct, I will try to analyze it more carefully. > Now, we have two threads simultaneously evicting > the same inode, which led to a bug. > > The key points here are: > 1. btrfs-cleaner: iput_final() called __inode_add_lru() and then > return because some reason with i_count=0, and I_FREEING was not set, > and most important, i_lock was unlocked, > which gives evict_inodes() a perfect chance to put the inode into > dispose list. > 2. umount thread: The inode (which i_count is 0 and I_FREEING was not > set) > was added to dispose list. > 3. btrfs-cleaner: Before umount thread set I_FREEING flag, > btrfs_iget() > finded the inode and inc i_count > 4. umount thread called evict() and btrfs-cleaner called iput(), > which leads > to bug. > > The above behavior can be confirmed by the debugging log > based on commit d30d0e49da71: > > diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c > index 407ccec3e57e..cc50dd1e8beb 100644 > --- a/fs/btrfs/defrag.c > +++ b/fs/btrfs/defrag.c > @@ -278,6 +278,8 @@ static int __btrfs_run_defrag_inode(struct > btrfs_fs_info *fs_info, > ret = btrfs_defrag_file(inode, NULL, &range, defrag->transid, > BTRFS_DEFRAG_BATCH); > sb_end_write(fs_info->sb); > + if (inode->i_state & (1<<19)) > + pr_info("bug! double evict! crash will happen\n"); > iput(inode); > > if (ret < 0) > diff --git a/fs/inode.c b/fs/inode.c > index 3a41f83a4ba5..641a828bc2b7 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -461,8 +461,14 @@ static void __inode_add_lru(struct inode *inode, > bool rotate) > return; > if (!(inode->i_sb->s_flags & SB_ACTIVE)) > return; > - if (!mapping_shrinkable(&inode->i_data)) > + if (!mapping_shrinkable(&inode->i_data)) { > + if (inode->i_ino == 261) { > + pr_info("set inode 0x%px dbg_flag casue > mapping_shrinkable\n", inode); > + dump_stack(); > + WRITE_ONCE(inode->i_state, inode- > > i_state|(1<<19)); > + } > return; > + } > > if (list_lru_add_obj(&inode->i_sb->s_inode_lru, &inode- > > i_lru)) > this_cpu_inc(nr_unused); > @@ -623,7 +629,10 @@ void clear_inode(struct inode *inode) > xa_unlock_irq(&inode->i_data.i_pages); > BUG_ON(!list_empty(&inode->i_data.i_private_list)); > BUG_ON(!(inode->i_state & I_FREEING)); > - BUG_ON(inode->i_state & I_CLEAR); > + if (inode->i_state & I_CLEAR) { > + pr_info("inode 0x%px state 0x%lx triggered bug\n", > inode, inode->i_state); > + BUG(); > + } > BUG_ON(!list_empty(&inode->i_wb_list)); > /* don't need i_lock here, no concurrent mods to i_state */ > inode->i_state = I_FREEING | I_CLEAR; > @@ -1754,7 +1763,10 @@ void iput(struct inode *inode) > { > if (!inode) > return; > - BUG_ON(inode->i_state & I_CLEAR); > + if (inode->i_state & I_CLEAR) { > + pr_info("inode 0x%px state 0x%lx triggered a bug\n", > inode, inode->i_state); > + BUG(); > + } > retry: > if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) { > if (inode->i_nlink && (inode->i_state & > I_DIRTY_TIME)) > { > @@ -1764,6 +1776,8 @@ void iput(struct inode *inode) > mark_inode_dirty_sync(inode); > goto retry; > } > + if (inode->i_ino == 261 && (inode->i_state &(1<<19))) > + pr_info("state is 0x%lx, bug may be > triggered\n", inode->i_state); > iput_final(inode); > } > } > > The (1<<19) bit was used to debugging. > > Here is the log when bug was triggered: > > [ 33.028024][ T2451] set inode 0xffff88801fb87738 dbg_flag casue > mapping_shrinkable > [ 33.028478][ T2451] CPU: 0 PID: 2451 Comm: btrfs-cleaner Not > tainted > 6.10.0-rc2-00225-gf0deb01d52b3-dirty #132 > [ 33.029072][ T2451] Hardware name: QEMU Standard PC (i440FX + > PIIX, > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 33.029675][ T2451] Call Trace: > [ 33.029873][ T2451] <TASK> > [ 33.030046][ T2451] dump_stack_lvl+0x16c/0x1f0 > [ 33.030330][ T2451] __inode_add_lru.part.0+0x1ea/0x280 > [ 33.030647][ T2451] iput+0x7c7/0x920 > [ 33.030876][ T2451] run_delayed_iput_locked+0x136/0x1e0 > [ 33.031199][ T2451] btrfs_run_delayed_iputs+0x8e/0x120 > [ 33.031516][ T2451] cleaner_kthread+0x2d3/0x480 > [ 33.031801][ T2451] ? __pfx_cleaner_kthread+0x10/0x10 > [ 33.032113][ T2451] kthread+0x2c1/0x3a0 > [ 33.032354][ T2451] ? _raw_spin_unlock_irq+0x23/0x50 > [ 33.032659][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.032929][ T2451] ret_from_fork+0x45/0x80 > [ 33.033198][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.033469][ T2451] ret_from_fork_asm+0x1a/0x30 > [ 33.033756][ T2451] </TASK> > [ 33.033978][ T2451] bug! double evict! crash will happen > [ 33.034168][ T2332] BTRFS info (device loop0): last unmount of > filesystem c9fe44da-de57-406a-8241-57ec7d4412cf > [ 33.034295][ T2451] inode 0xffff88801fb87738 state 0x60 triggered > a > bug > [ 33.034332][ T2451] ------------[ cut here ]------------ > [ 33.035578][ T2451] kernel BUG at fs/inode.c:1768! > [ 33.035873][ T2451] Oops: invalid opcode: 0000 [#1] PREEMPT SMP > KASAN NOPTI > [ 33.036286][ T2451] CPU: 0 PID: 2451 Comm: btrfs-cleaner Not > tainted > 6.10.0-rc2-00225-gf0deb01d52b3-dirty #132 > [ 33.036870][ T2451] Hardware name: QEMU Standard PC (i440FX + > PIIX, > 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 33.037464][ T2451] RIP: 0010:iput+0x896/0x920 > // logs ... > [ 33.042882][ T2451] PKRU: 55555554 > [ 33.043092][ T2451] Call Trace: > [ 33.043288][ T2451] <TASK> > [ 33.044190][ T2451] ? iput+0x896/0x920 > [ 33.047321][ T2451] ? btrfs_run_defrag_inodes+0xb49/0xe60 > [ 33.047653][ T2451] btrfs_run_defrag_inodes+0xa40/0xe60 > [ 33.047975][ T2451] ? __pfx_btrfs_run_defrag_inodes+0x10/0x10 > [ 33.048327][ T2451] ? __pfx___mutex_unlock_slowpath+0x10/0x10 > [ 33.048678][ T2451] ? __pfx_do_raw_spin_lock+0x10/0x10 > [ 33.048996][ T2451] ? _raw_spin_unlock+0x28/0x50 > [ 33.049285][ T2451] ? > btrfs_clean_one_deleted_snapshot+0x2b2/0x420 > [ 33.049660][ T2451] cleaner_kthread+0x2ee/0x480 > [ 33.049941][ T2451] ? __pfx_cleaner_kthread+0x10/0x10 > [ 33.050252][ T2451] kthread+0x2c1/0x3a0 > [ 33.050492][ T2451] ? _raw_spin_unlock_irq+0x23/0x50 > [ 33.050796][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.051067][ T2451] ret_from_fork+0x45/0x80 > [ 33.051333][ T2451] ? __pfx_kthread+0x10/0x10 > [ 33.051603][ T2451] ret_from_fork_asm+0x1a/0x30 > [ 33.051887][ T2451] </TASK> > [ 33.052066][ T2451] Modules linked in: > [ 33.052316][ T2451] ---[ end trace 0000000000000000 ]--- > [ 33.052634][ T2451] RIP: 0010:iput+0x896/0x920 > > Please note inode 0xffff88801fb87738, which was set debug flag by > __inode_add_lru() in the run_delayed_iput_locked() code path. This > indicates that the inode was not actually evicted. > Then, please note the double evict log, which shows that this > inode which was set debug flag and was not be evicted is > still on the sb list and was successfully acquired by btrfs_iget(). > Next, take note of the unmount log, indicating that there was > a concurrent unmount thread which has already called evict_inodes() > at that time. > Finally, observe where the BUG was triggered: the state of inode > 0xffff88801fb87738 had been set to 0x60 by the unmount thread calling > inode_clear(), and at the same time, btrfs-cleaner called iput(), > ultimately leading to the BUG at the beginning check of iput(). > Although the point of trigger differs from the one reported by > syzbot, there is no doubt that they are the same bug. > > You could apply the above diff and try to reproduce the bug using the > reproducer[1]. > > If there is anything I'm missing or more info was needed, please let > me know. > > [1]: https://syzkaller.appspot.com/x/repro.c?x=14c57f16980000 > > Thanks.