Unfortunately the patch didn't fix this issue:

[  612.614615] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
[ 4298.841376] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buffer: ino=1155, cno=11, offset=1, blocknr=1216036, vblocknr=1198436
[ 4298.841877] NILFS (sda1): error -17 preparing GC: cannot read source blocks

It looks like the bugsimulator hit another bug. When this happened, the disk
usage was 100%:

$ df /dev/sda1
Filesystem     1K-blocks     Used  Available Use% Mounted on
/dev/sda1      104849404 99655676          0 100% /test

The error also killed cleanerd. However, I could cleanly unmount the
partition, and I didn't see any errors after remounting it.

As you suggested, I have changed some parameters in /etc/nilfs_cleanerd.conf:

-------------------------------------------
mc_nsegments_per_clean  16
nsegments_per_clean     16
max_clean_segments      30%
cleaning_interval       0.1
mc_cleaning_interval    0.1
-------------------------------------------

I will look into this issue with the bugsimulator. Can anyone reproduce the
original corruption issue in another way?

Regards,
Ryusuke Konishi

2017-10-29 2:16 GMT+09:00 Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>:
> Hi,
>
> Sorry for my late response.
> I could reproduce a corruption issue with the bug simulator.
>
> -----
> [88805.039545] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> [92817.941644] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buffer: ino=10846, cno=92, offset=2, blocknr=11425709, vblocknr=11227714
> [92817.941936] NILFS (sda1): error -17 preparing GC: cannot read source blocks
> -----
>
> Now I'm testing the patch. I will send it upstream once it has turned
> out to suppress the issue.
>
> The patch itself looks good to me, though I guess there may be similar bugs.
> What makes things complicated is that the inode block is not marked as dirty
> at the moment the "propagate" function carries a dirty state up from a dirty
> node/data block.
>
> Anyway, I will apply this patch right away.
>
> Thanks,
> Ryusuke Konishi
>
>
> 2017-07-13 15:44 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
>> Hi,
>>
>> On Wed, 12 Jul 2017 09:08:57 +0900,
>> Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> 2017-07-11 15:19 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
>>> >
>>> > There is a race condition between the functions nilfs_dirty_inode()
>>> > and nilfs_set_file_dirty().
>>> >
>>> > When a file is opened, nilfs_dirty_inode() is called to update the
>>> > access timestamp in the inode. It calls __nilfs_mark_inode_dirty()
>>> > in a separate transaction. __nilfs_mark_inode_dirty() caches the
>>> > ifile buffer_head in the i_bh field of the inode info structure and
>>> > marks it as dirty.
>>> >
>>> > After some data is written to the file in another transaction,
>>> > nilfs_set_file_dirty() is called, which adds the inode to the
>>> > ns_dirty_files list.
>>> >
>>> > Segment construction then calls nilfs_segctor_collect_dirty_files(),
>>> > which walks the ns_dirty_files list and checks the i_bh field. If
>>> > there is a cached buffer_head in i_bh, it is not marked as dirty
>>> > again.
>>> >
>>> > Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
>>> > transactions, a segment construction that writes out the ifile can
>>> > occur in between the two. If this happens, the inode is not on the
>>> > ns_dirty_files list, but its ifile block is still marked as dirty
>>> > and written out.
>>> >
>>> > In the next segment construction, the data for the file is written
>>> > out and nilfs_bmap_propagate() updates the b-tree. Eventually the
>>> > bmap root is written into the i_bh block, which is not dirty,
>>> > because it was already written out in a previous segment
>>> > construction.
>>> >
>>> > As a result the bmap update can be lost, which leads to file system
>>> > corruption. Either the virtual block address points to an
>>> > unallocated DAT block, or the DAT entry will be reused for
>>> > something different.
>>> >
>>> > The error can remain undetected for a long time. A typical error
>>> > message would be one of the "bad btree" errors or a warning that a
>>> > DAT entry could not be found.
>>> >
>>> > This bug can be reproduced reliably by a simple benchmark that
>>> > creates and overwrites millions of 4k files.
>>> >
>>> > Signed-off-by: Andreas Rohner <andreas.rohner@xxxxxxx>
>>>
>>> Will review, please wait for a while.
>>
>> Thank you. I have uploaded my benchmark to GitHub:
>>
>> https://github.com/zeitgeist87/bugsimulator
>>
>> The benchmark is very simple. It writes out 13107200 4k files in a
>> two-layer directory tree, so that every directory contains about 1000
>> files. Then it overwrites these files in an infinite loop.
>>
>> You need quite aggressive cleaner settings for the cleaner to keep up
>> with the bugsimulator:
>>
>> mc_nsegments_per_clean  16
>> nsegments_per_clean     16
>> max_clean_segments      30%
>> cleaning_interval       0.1
>> mc_cleaning_interval    0.1
>>
>> If you run this benchmark on a fresh 100GB NILFS2 partition with the
>> above cleaner settings, the file system will be corrupt in about half
>> an hour, at least on my machine.
>>
>> With the patch for the race condition, it can run indefinitely.
>>
>> Regards,
>> Andreas
>>
>>> Regards,
>>> Ryusuke Konishi
>>>
>>> >
>>> > ---
>>> >  fs/nilfs2/segment.c | 6 ++++--
>>> >  1 file changed, 4 insertions(+), 2 deletions(-)
>>> >
>>> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>>> > index 70ded52dc1dd..50e12956c737 100644
>>> > --- a/fs/nilfs2/segment.c
>>> > +++ b/fs/nilfs2/segment.c
>>> > @@ -1958,8 +1958,6 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
>>> >  				   err, ii->vfs_inode.i_ino);
>>> >  			return err;
>>> >  		}
>>> > -		mark_buffer_dirty(ibh);
>>> > -		nilfs_mdt_mark_dirty(ifile);
>>> >  		spin_lock(&nilfs->ns_inode_lock);
>>> >  		if (likely(!ii->i_bh))
>>> >  			ii->i_bh = ibh;
>>> > @@ -1968,6 +1966,10 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
>>> >  			goto retry;
>>> >  		}
>>> >
>>> > +		// Always redirty the buffer to avoid race condition
>>> > +		mark_buffer_dirty(ii->i_bh);
>>> > +		nilfs_mdt_mark_dirty(ifile);
>>> > +
>>> >  		clear_bit(NILFS_I_QUEUED, &ii->i_state);
>>> >  		set_bit(NILFS_I_BUSY, &ii->i_state);
>>> >  		list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
>>> > --
>>> > 2.13.2
>>> >
>>> > --
>>> > To unsubscribe from this list: send the line "unsubscribe linux-nilfs"
>>> > in the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> > More majordomo info at http://vger.kernel.org/majordomo-info.html