> On Thu, May 5, 2022 at 5:01 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Thu 05-05-22 12:45:56, Jchao sun wrote:
> > > On Thu, May 5, 2022 at 3:38 AM Jan Kara <jack@xxxxxxx> wrote:
> > > >
> > > > On Wed 04-05-22 11:25:14, Jchao Sun wrote:
> > > > > Commit b35250c0816c ("writeback: Protect inode->i_io_list with
> > > > > inode->i_lock") made inode->i_io_list not only protected by
> > > > > wb->list_lock but also inode->i_lock, but inode_io_list_move_locked()
> > > > > was missed. Add lock there and also update comment describing things
> > > > > protected by inode->i_lock.
> > > > >
> > > > > Fixes: b35250c0816c ("writeback: Protect inode->i_io_list with inode->i_lock")
> > > > > Signed-off-by: Jchao Sun <sunjunchao2870@xxxxxxxxx>
> > > >
> > > > Almost there :). A few comments below:
> > > >
> > > > > @@ -2402,6 +2404,9 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> > > > >  		inode->i_state &= ~I_DIRTY_TIME;
> > > > >  		inode->i_state |= flags;
> > > > >
> > > > > +		wb = locked_inode_to_wb_and_lock_list(inode);
> > > > > +		spin_lock(&inode->i_lock);
> > > > > +
> > > >
> > > > We don't want to lock wb->list_lock if the inode was already dirty
> > > > (which is a common path). So you want something like:
> > > >
> > > > 	if (was_dirty)
> > > > 		wb = locked_inode_to_wb_and_lock_list(inode);
> > >
> > > I'm a little confused here. The logic in the current source tree is
> > > like this:
> > >
> > > 	if (!was_dirty) {
> > > 		struct bdi_writeback *wb;
> > >
> > > 		wb = locked_inode_to_wb_and_lock_list(inode);
> > > 		...
> > > 		dirty_list = &wb->b_dirty_time;
> > > 		assert_spin_locked(&wb->list_lock);
> > > 	}
> > >
> > > The logic is the opposite of the logic in the comments, and it seems
> > > that wb will definitely not be NULL.
> > > Why is this? What is the difference between them?
> >
> > Sorry, that was a typo in my suggestion. It should have been
> >
> > 	if (!was_dirty)
> > 		wb = locked_inode_to_wb_and_lock_list(inode);
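For my own understanding, here is how I read the suggested flow inside
__mark_inode_dirty() after the fix. This is only a sketch of the locking
order as I understand it, not the exact patch, and the dirty_list handling
here is simplified from what the real function does:

	/*
	 * Sketch of my understanding, not the final patch:
	 * locked_inode_to_wb_and_lock_list() is entered with
	 * inode->i_lock held, drops it, and returns with
	 * wb->list_lock held; so inode->i_lock has to be re-taken
	 * before inode_io_list_move_locked() touches i_io_list,
	 * and wb->list_lock is only taken for newly-dirtied inodes.
	 */
	if (!was_dirty) {
		struct bdi_writeback *wb;
		struct list_head *dirty_list;

		wb = locked_inode_to_wb_and_lock_list(inode);
		spin_lock(&inode->i_lock);
		dirty_list = &wb->b_dirty;
		if (dirtytime)
			dirty_list = &wb->b_dirty_time;
		...
		wakeup_bdi = inode_io_list_move_locked(inode, wb, dirty_list);
		spin_unlock(&inode->i_lock);
		spin_unlock(&wb->list_lock);
	}

Please correct me if I have the lock ordering wrong here.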
1. I have noticed that move_expired_inodes() has logic as follows:

	list_move(&inode->i_io_list, &tmp);
	spin_lock(&inode->i_lock);
	inode->i_state |= I_SYNC_QUEUED;
	spin_unlock(&inode->i_lock);
	...
	list_move(&inode->i_io_list, dispatch_queue);

Neither of the two operations on i_io_list is protected by inode->i_lock.
It looks deliberate, but I'm a little confused about it. Is this a
mistake, or was it done on purpose and there is something I have missed?
If the latter, why is that?

2. I also have some doubts about the results of testing xfs with xfstests.
I'll describe my test steps later. The kernel version used for testing is
5.18.0-rc5-00016-g107c948d1d3e-dirty, and the latest commit is
107c948d1d3e ("Merge tag 'seccomp-v5.18-rc6' of
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux").

<a> First I tested without this patch. There are always a few fixed cases
that fail (maybe I got something wrong). Here is the test result:

	Failures: xfs/078 xfs/191-input-validation xfs/252 xfs/289 xfs/293
	xfs/514 xfs/515 xfs/544
	Failed 8 of 304 tests

<b> Then I tested with the patch with your suggestions applied. The
result is unstable. There is a high probability of more failures
(reported as below), and a small probability that the result is the same
as the test above without the patch:

	xfs/206
	...
	umount: /mnt/test: target is busy.
	_check_xfs_filesystem: filesystem on /dev/loop0 has dirty log (see
	/root/xfstests-dev/results/xfs/206.full for details)
	_check_xfs_filesystem: filesystem on /dev/loop0 is inconsistent(r) (see
	/root/xfstests-dev/results/xfs/206.full for details)
	...
	Failures: xfs/078 xfs/149 xfs/164 xfs/165 xfs/191-input-validation
	xfs/206 xfs/222 xfs/242 xfs/250 xfs/252 xfs/259 xfs/260 xfs/289 xfs/290
	xfs/292 xfs/293 xfs/514 xfs/515 xfs/544
	Failed 19 of 304 tests

I also saw a "fatal error: couldn't initialize XFS library", which means
xfs_repair failed.
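To be concrete about the extra modification mentioned in <c> below, what I
changed in move_expired_inodes() was roughly the following. Again just a
sketch of my change (which may well be wrong), taking inode->i_lock around
both moves of i_io_list, with wb->list_lock already held by the caller:

	/*
	 * Sketch of the modification I tried, not necessarily correct:
	 * wb->list_lock is held here, and inode->i_lock nests inside it,
	 * so take i_lock around each list_move() of i_io_list as well.
	 */
	spin_lock(&inode->i_lock);
	list_move(&inode->i_io_list, &tmp);
	inode->i_state |= I_SYNC_QUEUED;
	spin_unlock(&inode->i_lock);
	...
	spin_lock(&inode->i_lock);
	list_move(&inode->i_io_list, dispatch_queue);
	spin_unlock(&inode->i_lock);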
<c> Lastly I tested with the patch with your suggestions applied, plus
some modifications so that the operations on i_io_list in
move_expired_inodes() are protected by inode->i_lock. There is a high
probability that the result is the same as test <a>, and a small
probability that it is the same as <b>.

I think I must be missing something I don't understand yet, am I?

3. Here are my test steps:

	xfs_io -f -c "falloc 0 10g" test.img
	mkfs.xfs test.img
	losetup /dev/loop0 ./test.img
	mount /dev/loop0 /mnt/test
	export TEST_DEV=/dev/loop0
	export TEST_DIR=/mnt/test
	./check -g xfs/quick

Then reboot after the tests (if I don't reboot, the following test
results become more unstable, with more failures), and repeat the above
steps.

I'm sorry to bother you, but the results I got are so weird and I really
want to figure them out. Do you have any suggestions? Looking forward to
your reply, and I'm happy to provide more info if needed.

>								Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR