Hi Ryusuke,

Thank you for reviewing my patch. I will rerun my tests to make sure my
conclusions were correct.

The log output suggests to me that you are hitting a totally unrelated
bug. I have seen that message before on one of my experimental branches,
but I didn't think it could happen with stock NILFS2. My solution was to
add the following piece of code to nilfs_vdesc_is_live() in nilfs-utils:

if (vdesc->vd_period.p_end == vdesc->vd_cno) {
        /*
         * This block was overwritten in the same logical segment,
         * but in a different partial segment, probably because of
         * fdatasync() or a flush to disk.
         * Without this check, GC will cause a buffer conflict error
         * if both partial segments are cleaned at the same time.
         * In that case there will be two vdescs with the same ino,
         * cno and offset.
         */
        return 0;
}
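
For reference, that check only looks at two fields of the vdesc
descriptors the kernel hands to the cleaner. Here is a stripped-down
sketch of the structures involved, with the check restated as a
standalone helper (field names as in the kernel's nilfs2 API header,
if I recall correctly; the comments and the helper itself are my
paraphrase, not the actual nilfs-utils code):

#include <linux/types.h>        /* __u64, __u32 */

/* Lifetime of a block, expressed in checkpoint numbers. */
struct nilfs_period {
        __u64 p_start;  /* checkpoint in which the block appeared */
        __u64 p_end;    /* checkpoint in which it was overwritten */
};

/* Virtual block descriptor exchanged with the kernel via GC ioctls. */
struct nilfs_vdesc {
        __u64 vd_ino;                   /* inode number */
        __u64 vd_cno;                   /* checkpoint number of the block */
        __u64 vd_vblocknr;              /* virtual (DAT) block number */
        struct nilfs_period vd_period;  /* lifetime of the block */
        __u64 vd_blocknr;               /* on-disk block number */
        __u64 vd_offset;                /* logical offset in the file */
        __u32 vd_flags;                 /* data/node flag */
        __u32 vd_pad;
};

/*
 * The added check as a predicate: a block whose lifetime ended in
 * the very checkpoint it belongs to was superseded within the same
 * logical segment and must not be treated as live.
 */
static inline int nilfs_vdesc_superseded(const struct nilfs_vdesc *v)
{
        return v->vd_period.p_end == v->vd_cno;
}

The real function takes more context than this, so treat the helper as
illustrative only.
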
I can't tell for sure if that will fix your problem; I am just guessing.
It isn't necessary with my setup to reproduce the race condition bug. I
am rerunning my tests now to make sure I didn't make a mistake.

Best regards,
Andreas

On Sun, 29 Oct 2017 14:09:40 +0900, Ryusuke Konishi
<konishi.ryusuke@xxxxxxxxx> wrote:

> Unfortunately the patch didn't fix this issue:
>
> [  612.614615] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> [ 4298.841376] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buffer: ino=1155, cno=11, offset=1, blocknr=1216036, vblocknr=1198436
> [ 4298.841877] NILFS (sda1): error -17 preparing GC: cannot read source blocks
>
> It looks like the bugsimulator hit another bug.
>
> When this happened, the disk usage was 100%:
>
> $ df /dev/sda1
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/sda1      104849404 99655676         0 100% /test
>
> And the error killed cleanerd. However, I could cleanly unmount the
> partition, and I didn't see any errors after remounting it.
>
> As you commented, I have changed some parameters in
> /etc/nilfs_cleanerd.conf:
> -------------------------------------------
> mc_nsegments_per_clean  16
> nsegments_per_clean     16
> max_clean_segments      30%
> cleaning_interval       0.1
> mc_cleaning_interval    0.1
> -------------------------------------------
>
> I will look into this issue with the bugsimulator.
>
> Can anyone reproduce the original corruption issue in another way?
>
> Regards,
> Ryusuke Konishi
>
>
> 2017-10-29 2:16 GMT+09:00 Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>:
> > Hi,
> >
> > Sorry for my late response.
> > I could reproduce a corruption issue with the bug simulator.
> >
> > -----
> > [88805.039545] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> > [92817.941644] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buffer: ino=10846, cno=92, offset=2, blocknr=11425709, vblocknr=11227714
> > [92817.941936] NILFS (sda1): error -17 preparing GC: cannot read source blocks
> > -----
> >
> > Now I'm testing the patch. I will send it upstream once it turns
> > out to suppress the issue.
> >
> > The patch itself looks good to me, though I guess there may be
> > similar bugs. What makes things complicated is that the inode block
> > is not marked as dirty at the moment the "propagate" function
> > carries a dirty state over from a dirty node/data block.
> >
> > Anyway, I will hasten to apply this patch.
> >
> > Thanks,
> > Ryusuke Konishi
> >
> >
> > 2017-07-13 15:44 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
> >> Hi,
> >>
> >> On Wed, 12 Jul 2017 09:08:57 +0900, Ryusuke Konishi
> >> <konishi.ryusuke@xxxxxxxxx> wrote:
> >>
> >>> Hi,
> >>>
> >>> 2017-07-11 15:19 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
> >>> >
> >>> > There is a race condition between the functions
> >>> > nilfs_dirty_inode() and nilfs_set_file_dirty().
> >>> >
> >>> > When a file is opened, nilfs_dirty_inode() is called to update
> >>> > the access timestamp in the inode. It calls
> >>> > __nilfs_mark_inode_dirty() in a separate transaction.
> >>> > __nilfs_mark_inode_dirty() caches the ifile buffer_head in the
> >>> > i_bh field of the inode info structure and marks it as dirty.
> >>> >
> >>> > After some data was written to the file in another transaction,
> >>> > the function nilfs_set_file_dirty() is called, which adds the
> >>> > inode to the ns_dirty_files list.
> >>> >
> >>> > Then the segment construction calls
> >>> > nilfs_segctor_collect_dirty_files(), which goes through the
> >>> > ns_dirty_files list and checks the i_bh field. If there is a
> >>> > cached buffer_head in i_bh, it is not marked as dirty again.
> >>> >
> >>> > Since nilfs_dirty_inode() and nilfs_set_file_dirty() use
> >>> > separate transactions, it is possible that a segment
> >>> > construction that writes out the ifile occurs in between the
> >>> > two. If this happens, the inode is not on the ns_dirty_files
> >>> > list, but its ifile block is still marked as dirty and written
> >>> > out.
> >>> >
> >>> > In the next segment construction, the data for the file is
> >>> > written out and nilfs_bmap_propagate() updates the b-tree.
> >>> > Eventually the bmap root is written into the i_bh block, which
> >>> > is not dirty, because it was written out in another segment
> >>> > construction.
> >>> >
> >>> > As a result the bmap update can be lost, which leads to file
> >>> > system corruption. Either the virtual block address points to
> >>> > an unallocated DAT block, or the DAT entry will be reused for
> >>> > something different.
> >>> >
> >>> > The error can remain undetected for a long time. A typical
> >>> > error message would be one of the "bad btree" errors or a
> >>> > warning that a DAT entry could not be found.
> >>> >
> >>> > This bug can be reproduced reliably by a simple benchmark that
> >>> > creates and overwrites millions of 4k files.
> >>> >
> >>> > Signed-off-by: Andreas Rohner <andreas.rohner@xxxxxxx>
> >>>
> >>> Will review, please wait for a while.
> >>
> >> Thank you. I have uploaded my benchmark to GitHub:
> >>
> >> https://github.com/zeitgeist87/bugsimulator
> >>
> >> The benchmark is very simple. It writes out 13107200 4k files in a
> >> two-layer directory tree, so that every directory contains about
> >> 1000 files. Then it overwrites these files in an infinite loop.
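> >>
> >> In outline it does something like the following (a much simplified,
> >> scaled-down sketch, not the actual code -- see the repository above;
> >> the tree sizes here are placeholders):
> >>
> >> #include <errno.h>
> >> #include <fcntl.h>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include <string.h>
> >> #include <sys/stat.h>
> >> #include <unistd.h>
> >>
> >> #define NDIRS   128     /* scaled down; the full run needs ~13100 dirs */
> >> #define NFILES  1000    /* roughly 1000 files per directory */
> >>
> >> static char buf[4096];  /* every file is exactly 4k */
> >>
> >> static void write_file(const char *path)
> >> {
> >>         int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
> >>
> >>         if (fd < 0 || write(fd, buf, sizeof(buf)) != sizeof(buf)) {
> >>                 perror(path);
> >>                 exit(1);
> >>         }
> >>         close(fd);
> >> }
> >>
> >> int main(void)
> >> {
> >>         char path[64];
> >>         int i, j;
> >>
> >>         memset(buf, 'x', sizeof(buf));
> >>
> >>         /* first pass: build the two-layer tree full of 4k files */
> >>         for (i = 0; i < NDIRS; i++) {
> >>                 snprintf(path, sizeof(path), "dir-%d", i);
> >>                 if (mkdir(path, 0755) && errno != EEXIST) {
> >>                         perror(path);
> >>                         return 1;
> >>                 }
> >>                 for (j = 0; j < NFILES; j++) {
> >>                         snprintf(path, sizeof(path),
> >>                                  "dir-%d/file-%d", i, j);
> >>                         write_file(path);
> >>                 }
> >>         }
> >>
> >>         /* then overwrite every file in an infinite loop */
> >>         for (;;)
> >>                 for (i = 0; i < NDIRS; i++)
> >>                         for (j = 0; j < NFILES; j++) {
> >>                                 snprintf(path, sizeof(path),
> >>                                          "dir-%d/file-%d", i, j);
> >>                                 write_file(path);
> >>                         }
> >> }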
> >>
> >> You need quite aggressive cleaner settings for it to keep up with
> >> the bugsimulator:
> >>
> >> mc_nsegments_per_clean  16
> >> nsegments_per_clean     16
> >> max_clean_segments      30%
> >> cleaning_interval       0.1
> >> mc_cleaning_interval    0.1
> >>
> >> If you run this benchmark on a fresh 100GB NILFS2 partition with
> >> the above cleaner settings, the file system will be corrupt in
> >> about half an hour, at least on my machine.
> >>
> >> With the patch for the race condition, it can run indefinitely.
> >>
> >> Regards,
> >> Andreas
> >>
> >>> Regards,
> >>> Ryusuke Konishi
> >>>
> >>> >
> >>> > ---
> >>> >  fs/nilfs2/segment.c | 6 ++++--
> >>> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >>> >
> >>> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >>> > index 70ded52dc1dd..50e12956c737 100644
> >>> > --- a/fs/nilfs2/segment.c
> >>> > +++ b/fs/nilfs2/segment.c
> >>> > @@ -1958,8 +1958,6 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
> >>> >                                            err, ii->vfs_inode.i_ino);
> >>> >                                  return err;
> >>> >                          }
> >>> > -                        mark_buffer_dirty(ibh);
> >>> > -                        nilfs_mdt_mark_dirty(ifile);
> >>> >                          spin_lock(&nilfs->ns_inode_lock);
> >>> >                          if (likely(!ii->i_bh))
> >>> >                                  ii->i_bh = ibh;
> >>> > @@ -1968,6 +1966,10 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
> >>> >                                  goto retry;
> >>> >                          }
> >>> >
> >>> > +                        // Always redirty the buffer to avoid race condition
> >>> > +                        mark_buffer_dirty(ii->i_bh);
> >>> > +                        nilfs_mdt_mark_dirty(ifile);
> >>> > +
> >>> >                          clear_bit(NILFS_I_QUEUED, &ii->i_state);
> >>> >                          set_bit(NILFS_I_BUSY, &ii->i_state);
> >>> >                          list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
> >>> > --
> >>> > 2.13.2
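
P.S.: To make the interleaving in the changelog above easier to follow,
here is a toy user-space model of the race (not kernel code; the
functions only loosely mirror their kernel namesakes, and the state is
reduced to a few flags):

#include <stdbool.h>
#include <stdio.h>

/* Toy flags standing in for the kernel state. */
static bool ifile_block_dirty;  /* dirty bit of the cached ifile buffer */
static bool i_bh_cached;        /* ii->i_bh already set */
static bool on_dirty_files;     /* inode queued on ns_dirty_files */
static bool bmap_update_persisted;

/* nilfs_dirty_inode(): atime update in its own transaction. */
static void dirty_inode(void)
{
        i_bh_cached = true;
        ifile_block_dirty = true;
}

/* nilfs_set_file_dirty(): data written in another transaction. */
static void set_file_dirty(void)
{
        on_dirty_files = true;
}

/* One segment construction pass. */
static void segment_construction(bool patched)
{
        /*
         * nilfs_segctor_collect_dirty_files(): before the patch the
         * buffer is only redirtied when i_bh was not cached yet.
         */
        if (on_dirty_files && (!i_bh_cached || patched))
                ifile_block_dirty = true;

        /*
         * nilfs_bmap_propagate() puts the bmap root into the ifile
         * block; the update only reaches disk if the block is dirty.
         */
        if (on_dirty_files)
                bmap_update_persisted = ifile_block_dirty;

        ifile_block_dirty = false;      /* block written out (or skipped) */
}

static void run(bool patched)
{
        ifile_block_dirty = i_bh_cached = on_dirty_files = false;
        bmap_update_persisted = false;

        dirty_inode();                  /* open() updates the atime */
        segment_construction(patched);  /* writes ifile block, clears dirty */
        set_file_dirty();               /* data write, new transaction */
        segment_construction(patched);  /* bmap update must get to disk */

        printf("%s: bmap update %s\n", patched ? "patched  " : "unpatched",
               bmap_update_persisted ? "persisted" : "LOST");
}

int main(void)
{
        run(false);     /* prints: unpatched: bmap update LOST */
        run(true);      /* prints: patched  : bmap update persisted */
        return 0;
}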