On 2022/11/9 1:50, Ryusuke Konishi wrote: > Hi Liu Shixin, > > I'm sorry for my repeated emails. > > On Wed, Nov 9, 2022 at 12:13 AM Ryusuke Konishi wrote: >>> I think the most likely cause is metadata corruption. If so, we >>> should fix it by adding a proper sanity check, yes. >>> However, there is still the possibility that the error retry logic >>> after detecting errors has a flaw. (I believe at least this is not a >>> problem with normal paths.) >>> The sufile state inconsistency above is hypothetical for now. >>> Either way, I'd like to make sure what's actually happening (and where >>> the anomaly is coming from) so we can decide how to fix it. >> I noticed that this syzbot report has a disk image "mount_0.gz", so I >> tried to mount it read-only. >> The result was as follows: >> >> $ sudo mount -t nilfs2 -r ./mount_0 /mnt/test >> $ lssu >> SEGNUM DATE TIME STAT NBLOCKS >> 0 26760730-10-29 19:40:00 -de 527958016 >> 1 26760730-11-01 20:29:04 -de 527958016 >> 2 1176433641-11-01 02:01:52 -de 2983038235 >> 3 -1158249729-11-01 04:57:19 a-- 25375 >> 4 1970-01-02 21:24:32 a-- 0 >> 5 -1415215929-01-02 00:19:15 --e 1631451365 >> 6 841067190-01-02 13:02:59 -d- 3101461260 >> 7 1495233207-01-02 01:31:11 -de 1697748441 >> 8 1875753393-01-02 21:54:14 -de 337757600 >> 9 648878284-01-02 21:06:08 --- 2133388152 >> 10 2122778986-01-02 17:49:43 --e 874605800 >> 11 55846197-01-02 09:50:53 -de 4262365368 >> 12 1872396026-01-02 06:45:18 --- 1051768258 >> 13 820920473-01-02 19:28:35 --e 2932336675 >> 14 2128306755-01-02 10:17:45 --- 3568501216 >> 15 1457063063-01-02 01:24:17 --e 2330511560 >> 16 586864775-01-02 16:08:15 --- 355283425 >> 17 -824870041-01-02 22:47:26 -d- 4177999057 >> 18 1562176264-01-02 08:06:45 --- 1312597355 >> 19 -392925420-01-02 17:08:27 -d- 3811075948 >> 20 1927575458-01-02 21:26:51 -de 1384562525 >> 21 2139923505-01-02 08:16:04 -d- 41861305 >> >> Here, we can see that neither segment #3 (= ns_segnum) nor #4 (= >> ns_nextnum) has the dirty flag >> ("d" in STAT). So, as expected, this seems to be the root cause of >> the duplicate assignments and oops. >> The proper correction would be, therefore, to check and reject (or >> fix) this anomaly while outputting an error message >> (or warning if fix it at the same time). >> It should be inserted in mount time logic because the segments that >> nilfs2 itself allocates are always marked dirty with >> nilfs_sufile_alloc(). > I will change my opinion. > > Considering the possibility of sufile corruption at runtime, it seems > that the sanity check should be done on every nilfs_sufile_alloc() > call. > > I now think nilfs_sufile_alloc() should call nilfs_error() and return > -EIO if the number of a newly found vacant segment matches > nilfs->ns_segnum or nilfs->ns_nextnum. Before we add the first segbuf into sci->sc_segbufs in nilfs_segctor_begin_construction(), there is no possibility existing duplicate segnum. And the subsequent process should not be affected by sufile corruption. So I think it's enough to only check for case alloc==0. I don't find any other possible wrong scenarios. diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 7be632c15f91..7b91c790b798 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1326,7 +1326,13 @@ static int nilfs_segctor_begin_construction(struct nilfs_sc_info *sci, err = nilfs_sufile_alloc(nilfs->ns_sufile, &nextnum); if (err) goto failed; + } else { + /* Check the next segment has alreadly been allocated */ + err = nilfs_sufile_test_allocated(nilfs->ns_sufile, nextnum); + if (err) + goto failed; } + nilfs_segbuf_set_next_segnum(segbuf, nextnum, nilfs); BUG_ON(!list_empty(&sci->sc_segbufs)); diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c index 853a8212114f..8dff12c56891 100644 --- a/fs/nilfs2/sufile.c +++ b/fs/nilfs2/sufile.c @@ -399,6 +399,36 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump) return ret; } +int nilfs_sufile_test_allocated(struct inode *sufile, __u64 *segnump) +{ + struct buffer_head *su_bh; + struct nilfs_segment_usage *su; + void *kaddr; + int ret; + + down_write(&NILFS_MDT(sufile)->mi_sem); + + ret = nilfs_sufile_get_segment_usage_block(sufile, segnump, 1, + &su_bh); + if (ret < 0) + goto out_sem; + + kaddr = kmap_atomic(su_bh->b_page); + su = nilfs_sufile_block_get_segment_usage( + sufile, segnump, su_bh, kaddr); + + if (nilfs_segment_usage_clean(su)) + ret = -EIO; + + kunmap_atomic(kaddr); + + brelse(su_bh); + +out_sem: + up_write(&NILFS_MDT(sufile)->mi_sem); + return ret; +} + void nilfs_sufile_do_cancel_free(struct inode *sufile, __u64 segnum, struct buffer_head *header_bh, struct buffer_head *su_bh) diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h index 8e8a1a5a0402..02b61ca6f318 100644 --- a/fs/nilfs2/sufile.h +++ b/fs/nilfs2/sufile.h @@ -24,6 +24,7 @@ unsigned long nilfs_sufile_get_ncleansegs(struct inode *sufile); int nilfs_sufile_set_alloc_range(struct inode *sufile, __u64 start, __u64 end); int nilfs_sufile_alloc(struct inode *, __u64 *); +int nilfs_sufile_test_allocated(struct inode *, __u64 *); int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum); int nilfs_sufile_set_segment_usage(struct inode *sufile, __u64 segnum, unsigned long nblocks, time64_t modtime); > > This test should not need to be done for every segbuf's sb_segnum > because metadata blocks that contain the information about allocated > segments are kept in memory because of its dirty state and will not be > destroyed until they are finally written out. > > One correction then. Just checking the output of the lssu command > wasn't enough because the following test is done on all flags except > the active flag "a". (this flag is not persistent and visible only > when seeing via ioctl.) > >> if (!nilfs_segment_usage_clean(su)) >> continue; >> /* found a clean segment */ > We also had to see the invisible flags as well to confirm if that is > the root cause, but lssu doens't show them all. So I checked the dump > data of the mount_0 file. > As a result, the segment at ns_segnum was not clean, but that of > ns_nextnum was clean, which means it's the root cause as expected. Thanks for your confirmation. Regards, Liu Shixin . > > Regards, > Ryusuke Konishi > . >