Hi, Sorry for my late reponse. I could reproduce a corruption issue with the bug simulator. ----- [88805.039545] NILFS (sda1): segctord starting. Construction interval = 5 second s, CP frequency < 30 seconds [92817.941644] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buff er: ino=10846, cno=92, offset=2, blocknr=11425709, vblocknr=11227714 [92817.941936] NILFS (sda1): error -17 preparing GC: cannot read source blocks ----- Now I'm testing the patch. I will send it to upstream after it turned out to suppress the issue. The patch itself looks good to me though I guess there may be similar bugs. What makes things complicated seems that inode block is not marked as dirty just when "propagate" function carries a dirty state from a dirty node/data block. Anyway, I hasten to apply this patch. Thanks, Ryusuke Konishi 2017-07-13 15:44 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>: > Hi, > > Am Wed, 12 Jul 2017 09:08:57 +0900 > schrieb Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>: > >> Hi, >> >> 2017-07-11 15:19 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>: >> > >> > There is a race condition between the function nilfs_dirty_inode() >> > and nilfs_set_file_dirty(). >> > >> > When a file is opened, nilfs_dirty_inode() is called to update the >> > access timestamp in the inode. It calls __nilfs_mark_inode_dirty() >> > in a separate transaction. __nilfs_mark_inode_dirty() caches the >> > ifile buffer_head in the i_bh field of the inode info structure and >> > marks it as dirty. >> > >> > After some data was written to the file in another transaction, the >> > function nilfs_set_file_dirty() is called, which adds the inode to >> > the ns_dirty_files list. >> > >> > Then the segment construction calls >> > nilfs_segctor_collect_dirty_files(), which goes through the >> > ns_dirty_files list and checks the i_bh field. If there is a cached >> > buffer_head in i_bh it is not marked as dirty again. >> > >> > Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate >> > transactions, it is possible that a segment construction that >> > writes out the ifile occurs in-between the two. If this happens the >> > inode is not on the ns_dirty_files list, but its ifile block is >> > still marked as dirty and written out. >> > >> > In the next segment construction, the data for the file is written >> > out and nilfs_bmap_propagate() updates the b-tree. Eventually the >> > bmap root is written into the i_bh block, which is not dirty, >> > because it was written out in another segment construction. >> > >> > As a result the bmap update can be lost, which leads to file system >> > corruption. Either the virtual block address points to an >> > unallocated DAT block, or the DAT entry will be reused for >> > something different. >> > >> > The error can remain undetected for a long time. A typical error >> > message would be one of the "bad btree" errors or a warning that a >> > DAT entry could not be found. >> > >> > This bug can be reproduced reliably by a simple benchmark that >> > creates and overwrites millions of 4k files. >> > >> > Signed-off-by: Andreas Rohner <andreas.rohner@xxxxxxx> >> >> Will review, please wait for a while. > > Thank you, I have uploaded my benchmark on GitHub: > > https://github.com/zeitgeist87/bugsimulator > > The benchmark is very simple. It writes out 13107200 4k files in a two > layer directory tree, so that every directory contains about 1000 files. > Then it overwrites these files in an infinite loop. > > You need quite aggressive cleaner settings for it to keep up with the > bugsimulator: > > mc_nsegments_per_clean 16 > nsegments_per_clean 16 > max_clean_segments 30% > cleaning_interval 0.1 > mc_cleaning_interval 0.1 > > If you run this benchmark on a fresh 100GB NILFS2 partition with the > above cleaner settings, the file system will be corrupt in about half an > hour. At least on my machine. > > With the patch for the race condition, it can run indefinetly. > > Regards, > Andreas > >> Regards, >> Ryusuke Konishi >> >> > >> > --- >> > fs/nilfs2/segment.c | 6 ++++-- >> > 1 file changed, 4 insertions(+), 2 deletions(-) >> > >> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c >> > index 70ded52dc1dd..50e12956c737 100644 >> > --- a/fs/nilfs2/segment.c >> > +++ b/fs/nilfs2/segment.c >> > @@ -1958,8 +1958,6 @@ static int >> > nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci, err, >> > ii->vfs_inode.i_ino); return err; >> > } >> > - mark_buffer_dirty(ibh); >> > - nilfs_mdt_mark_dirty(ifile); >> > spin_lock(&nilfs->ns_inode_lock); >> > if (likely(!ii->i_bh)) >> > ii->i_bh = ibh; >> > @@ -1968,6 +1966,10 @@ static int >> > nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci, goto >> > retry; } >> > >> > + // Always redirty the buffer to avoid race condition >> > + mark_buffer_dirty(ii->i_bh); >> > + nilfs_mdt_mark_dirty(ifile); >> > + >> > clear_bit(NILFS_I_QUEUED, &ii->i_state); >> > set_bit(NILFS_I_BUSY, &ii->i_state); >> > list_move_tail(&ii->i_dirty, &sci->sc_dirty_files); >> > -- >> > 2.13.2 >> > >> > -- >> > To unsubscribe from this list: send the line "unsubscribe >> > linux-nilfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe >> linux-nilfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html