Re: [PATCH] nilfs2: fix NULL pointer dereference in nilfs_segctor_prepare_write()

Liu Shixin <liushixin2@xxxxxxxxxx> · Wed, 9 Nov 2022 11:26:30 +0800

On 2022/11/9 1:50, Ryusuke Konishi wrote:
> Hi Liu Shixin,
>
> I'm sorry for my repeated emails.
>
> On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote:
>>> I think the most likely cause is metadata corruption.  If so, we
>>> should fix it by adding a proper sanity check, yes.
>>> However, there is still the possibility that the error retry logic
>>> after detecting errors has a flaw. (I believe at least this is not a
>>> problem with normal paths.)
>>> The sufile state inconsistency above is hypothetical for now.
>>> Either way, I'd like to make sure what's actually happening (and where
>>> the anomaly is coming from) so we can decide how to fix it.
>> I noticed that this syzbot report has a disk image "mount_0.gz", so I
>> tried to mount it read-only.
>> The result was as follows:
>>
>> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test
>> $ lssu
>>               SEGNUM        DATE     TIME STAT     NBLOCKS
>>                    0  26760730-10-29 19:40:00  -de   527958016
>>                    1  26760730-11-01 20:29:04  -de   527958016
>>                    2  1176433641-11-01 02:01:52  -de  2983038235
>>                    3  -1158249729-11-01 04:57:19  a--       25375
>>                    4  1970-01-02 21:24:32  a--           0
>>                    5  -1415215929-01-02 00:19:15  --e  1631451365
>>                    6  841067190-01-02 13:02:59  -d-  3101461260
>>                    7  1495233207-01-02 01:31:11  -de  1697748441
>>                    8  1875753393-01-02 21:54:14  -de   337757600
>>                    9  648878284-01-02 21:06:08  ---  2133388152
>>                   10  2122778986-01-02 17:49:43  --e   874605800
>>                   11  55846197-01-02 09:50:53  -de  4262365368
>>                   12  1872396026-01-02 06:45:18  ---  1051768258
>>                   13  820920473-01-02 19:28:35  --e  2932336675
>>                   14  2128306755-01-02 10:17:45  ---  3568501216
>>                   15  1457063063-01-02 01:24:17  --e  2330511560
>>                   16  586864775-01-02 16:08:15  ---   355283425
>>                   17  -824870041-01-02 22:47:26  -d-  4177999057
>>                   18  1562176264-01-02 08:06:45  ---  1312597355
>>                   19  -392925420-01-02 17:08:27  -d-  3811075948
>>                   20  1927575458-01-02 21:26:51  -de  1384562525
>>                   21  2139923505-01-02 08:16:04  -d-    41861305
>>
>> Here,  we can see that neither segment #3 (= ns_segnum) nor #4 (=
>> ns_nextnum) has the dirty flag
>> ("d" in STAT).   So, as expected, this seems to be the root cause of
>> the duplicate assignments and oops.
>> The proper correction would be, therefore, to check and reject (or
>> fix) this anomaly while outputting an error message
>> (or warning if fix it at the same time).
>> It should be inserted in mount time logic because the segments that
>> nilfs2 itself allocates are always marked dirty with
>> nilfs_sufile_alloc().
> I will change my opinion.
>
> Considering the possibility of sufile corruption at runtime, it seems
> that the sanity check should be done on every nilfs_sufile_alloc()
> call.
>
> I now think nilfs_sufile_alloc() should call nilfs_error() and return
> -EIO if the number of a newly found vacant segment matches
> nilfs->ns_segnum or nilfs->ns_nextnum.
Before we add the first segbuf into sci->sc_segbufs in nilfs_segctor_begin_construction(),
there is no possibility existing duplicate segnum. And the subsequent process should
not be affected by sufile corruption. So I think it's enough to only check for case alloc==0.
I don't find any other possible wrong scenarios.

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 7be632c15f91..7b91c790b798 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1326,7 +1326,13 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci,
                err = nilfs_sufile_alloc(nilfs->ns_sufile, &nextnum);
                if (err)
                        goto failed;
+       } else {
+               /* Check the next segment has alreadly been allocated */
+               err = nilfs_sufile_test_allocated(nilfs->ns_sufile, nextnum);
+               if (err)
+                       goto failed;
        }
+
        nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs);
 
        BUG_ON(!list_empty(&sci->sc_segbufs));
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 853a8212114f..8dff12c56891 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -399,6 +399,36 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
        return ret;
 }
 
+int nilfs_sufile_test_allocated(struct inode *sufile, __u64 *segnump)
+{
+       struct buffer_head *su_bh;
+       struct nilfs_segment_usage *su;
+       void *kaddr;
+       int ret;
+
+       down_write(&NILFS_MDT(sufile)->mi_sem);
+
+       ret = nilfs_sufile_get_segment_usage_block(sufile, segnump, 1,
+                                                  &su_bh);
+       if (ret < 0)
+               goto out_sem;
+
+       kaddr = kmap_atomic(su_bh->b_page);
+       su = nilfs_sufile_block_get_segment_usage(
+               sufile, segnump, su_bh, kaddr);
+
+       if (nilfs_segment_usage_clean(su))
+               ret = -EIO;
+
+       kunmap_atomic(kaddr);
+
+       brelse(su_bh);
+
+out_sem:
+       up_write(&NILFS_MDT(sufile)->mi_sem);
+       return ret;
+}
+
 void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum,
                                 struct buffer_head *header_bh,
                                 struct buffer_head *su_bh)
diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
index 8e8a1a5a0402..02b61ca6f318 100644
--- a/fs/nilfs2/sufile.h
+++ b/fs/nilfs2/sufile.h
@@ -24,6 +24,7 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode *sufile);
 
 int nilfs_sufile_set_alloc_range(struct inode *sufile, __u64 start, __u64 end);
 int nilfs_sufile_alloc(struct inode *, __u64 *);
+int nilfs_sufile_test_allocated(struct inode *, __u64 *);
 int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum);
 int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum,
                                   unsigned long nblocks, time64_t modtime);

>
> This test should not need to be done for every segbuf's sb_segnum
> because metadata blocks that contain the information about allocated
> segments are kept in memory because of its dirty state and will not be
> destroyed until they are finally written out.
>
> One correction then.  Just checking the output of the lssu command
> wasn't enough because the following test is done on all flags except
> the active flag "a".  (this flag is not persistent and visible only
> when seeing via ioctl.)
>
>>                         if (!nilfs_segment_usage_clean(su))
>>                                 continue;
>>                         /* found a clean segment */
> We also had to see the invisible flags as well to confirm if that is
> the root cause, but lssu doens't show them all.  So I checked the dump
> data of the mount_0 file.
> As a result, the segment at ns_segnum was not clean, but that of
> ns_nextnum was clean, which means it's the root cause as expected.
Thanks for your confirmation.

Regards,
Liu Shixin
.
>
> Regards,
> Ryusuke Konishi
> .
>