Re: [PATCH v3] writeback: Fix inode->i_io_list not be protected by inode->i_lock error

On Sat 07-05-22 00:04:45, Jchao sun wrote:
> > On Thu, May 5, 2022 at 5:01 PM Jan Kara <jack@xxxxxxx> wrote:
> > >
> > > On Thu 05-05-22 12:45:56, Jchao sun wrote:
> > > > On Thu, May 5, 2022 at 3:38 AM Jan Kara <jack@xxxxxxx> wrote:
> > > > >
> > > > > On Wed 04-05-22 11:25:14, Jchao Sun wrote:
> > > > > > Commit b35250c0816c ("writeback: Protect inode->i_io_list with
> > > > > > inode->i_lock") made inode->i_io_list not only protected by
> > > > > > wb->list_lock but also inode->i_lock, but inode_io_list_move_locked()
> > > > > > was missed. Add lock there and also update comment describing things
> > > > > > protected by inode->i_lock.
> > > > > >
> > > > > > Fixes: b35250c0816c ("writeback: Protect inode->i_io_list with inode->i_lock")
> > > > > > Signed-off-by: Jchao Sun <sunjunchao2870@xxxxxxxxx>
> > > > >
> > > > > Almost there :). A few comments below:
> > > > >
> > > > > > @@ -2402,6 +2404,9 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> > > > > >                       inode->i_state &= ~I_DIRTY_TIME;
> > > > > >               inode->i_state |= flags;
> > > > > >
> > > > > > +             wb = locked_inode_to_wb_and_lock_list(inode);
> > > > > > +             spin_lock(&inode->i_lock);
> > > > > > +
> > > > >
> > > > > We don't want to lock wb->list_lock if the inode was already dirty (which
> > > > > is a common path). So you want something like:
> > > > >
> > > > >                 if (was_dirty)
> > > > >                         wb = locked_inode_to_wb_and_lock_list(inode);
> > > >
> > > > I'm a little confused here. The logic in the current source tree is
> > > > like this:
> > > >                        if (!was_dirty) {
> > > >                                struct bdi_writeback *wb;
> > > >                                wb = locked_inode_to_wb_and_lock_list(inode);
> > > >                                ...
> > > >                                dirty_list = &wb->b_dirty_time;
> > > >                                assert_spin_locked(&wb->list_lock);
> > > >                        }
> > > > The logic is the opposite of the one in your comment, and it seems
> > > > that wb will definitely not be NULL.
> > > > Why is this? What is the difference between them?
> > >
> > > Sorry, that was a typo in my suggestion. It should have been
> > >
> > >                  if (!was_dirty)
> > >                          wb = locked_inode_to_wb_and_lock_list(inode);
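
Just to spell the intended ordering out, here is a rough, untested sketch of
what I mean (using the existing locals of __mark_inode_dirty() and the helpers
from fs/fs-writeback.c; the rest of the function is omitted):

		/* inode->i_lock is held at this point */
		inode->i_state |= flags;

		/*
		 * Take wb->list_lock only when the inode was not dirty before,
		 * i.e. when we are actually going to move it onto a wb dirty
		 * list below. locked_inode_to_wb_and_lock_list() drops
		 * inode->i_lock, so re-take it to keep the i_state checks and
		 * the i_io_list move under i_lock.
		 */
		if (!was_dirty) {
			wb = locked_inode_to_wb_and_lock_list(inode);
			spin_lock(&inode->i_lock);
		}

		/* ... later, still under inode->i_lock ... */
		if (!was_dirty) {
			/* dirty_list is &wb->b_dirty or &wb->b_dirty_time */
			wakeup_bdi = inode_io_list_move_locked(inode, wb,
							       dirty_list);
			spin_unlock(&wb->list_lock);
		}
		spin_unlock(&inode->i_lock);
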
> 
> 1. I have noticed that move_expired_inodes() has the following logic:
> 
>                       list_move(&inode->i_io_list, &tmp);
>                       spin_lock(&inode->i_lock);
>                       inode->i_state |= I_SYNC_QUEUED;
>                       spin_unlock(&inode->i_lock);
>                       ...
>                       list_move(&inode->i_io_list, dispatch_queue);
> 
>    Neither of the two operations on i_io_list is protected by
>    inode->i_lock. It looks like this was done on purpose, and I'm a
>    little confused by it. Is this a mistake, or was it done on purpose
>    and there is something I have missed? If the latter, why is that?

Yes, that looks like a bug but a harmless one. Actually looking into the
code I'm not sure we still need the protection of inode->i_lock for
inode->i_io_list handling but that would be a separate cleanup anyway.
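
If we did decide to keep the rule from b35250c0816c there, the fix would just
be to extend the i_lock critical section over the list moves. Roughly (an
untested sketch, essentially what you describe doing in <c> below):

			spin_lock(&inode->i_lock);
			list_move(&inode->i_io_list, &tmp);
			inode->i_state |= I_SYNC_QUEUED;
			spin_unlock(&inode->i_lock);
			...
			/* and likewise around the later move to dispatch_queue */
			spin_lock(&inode->i_lock);
			list_move(&inode->i_io_list, dispatch_queue);
			spin_unlock(&inode->i_lock);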

> 2. I also have some doubts about the results of testing xfs with
>    xfstests. I'll describe my test steps later. The kernel version used
>    for testing is 5.18.0-rc5-00016-g107c948d1d3e-dirty, and the latest
>    commit is 107c948d1d3e ("Merge tag 'seccomp-v5.18-rc6' of
>    git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux")
> 
>    <a> First I tested without this patch. There are always a few fixed
>        cases that fail, so maybe I got something wrong. Here is the
>        test result:
>        Failures: xfs/078 xfs/191-input-validation xfs/252 xfs/289
>                  xfs/293 xfs/514 xfs/515 xfs/544
>        Failed 8 of 304 tests

This is OK. Usually failures like these are due to an older version of
xfsprogs, an unexpected kernel configuration, some missing tool or
something like that. Anyway, you should mostly be interested in
not introducing new test failures :).

>    <b> Then I tested with the patch after applying your suggestions.
>        The result is unstable. There is a high probability of more
>        failures (reported as below), and a small probability that the
>        result is the same as the test above without this patch.
> 
>        xfs/206 ... umount: /mnt/test: target is busy.
>        _check_xfs_filesystem: filesystem on /dev/loop0 has dirty log
>        (see /root/xfstests-dev/results/xfs/206.full for details)
>        _check_xfs_filesystem: filesystem on /dev/loop0 is inconsistent(r)
>        (see /root/xfstests-dev/results/xfs/206.full for details)
>        ...
>        Failures: xfs/078 xfs/149 xfs/164 xfs/165 xfs/191-input-validation
>                  xfs/206 xfs/222 xfs/242 xfs/250 xfs/252 xfs/259 xfs/260
>                  xfs/289 xfs/290 xfs/292 xfs/293 xfs/514 xfs/515 xfs/544
>        Failed 19 of 304 tests.

So this is definitely suspicious. Likely the patch introduced a problem.
You need to have a look at why e.g. test xfs/149 fails (you can run an
individual test like "./check xfs/149"). You can inspect the output the
test generates, and also the kernel logs, for anything suspicious, etc.

>        I saw that there is a "fatal error: couldn't initialize XFS
>        library" message, which means xfs_repair has failed.

Yeah, you can maybe strace xfs_repair to see why it fails...

>    <c> Lastly, I tested with the patch after applying your suggestions,
>        plus some modifications so that the operations on i_io_list in
>        move_expired_inodes() are protected by inode->i_lock. There is a
>        high probability that the result is the same as the test in <a>,
>        and a small probability that it is the same as <b>.
> 
>    I think I must be missing something I don't understand yet, right?

Well, it may be that there is just some race somewhere so the problem does
not always manifest.

> 3. Here are my test steps:
>             xfs_io -f -c "falloc 0 10g" test.img
>             mkfs.xfs test.img
>             losetup /dev/loop0 ./test.img
>             mount /dev/loop0 /mnt/test
>             export TEST_DEV=/dev/loop0
>             export TEST_DIR=/mnt/test
>             ./check -g xfs/quick
>             reboot after the tests (if I don't reboot, the following
>             test results become more unstable and have more failures)
> 
>             Repeat the above steps.

This looks OK.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR


