On Sun, 29 Oct 2017 21:53:38 +0900, Ryusuke Konishi
<konishi.ryusuke@xxxxxxxxx> wrote:

> 2017-10-29 17:07 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
> > Hi Ryusuke,
> >
> > Thank you for reviewing my patch. I will rerun my tests to make sure
> > my conclusions were correct.
> >
> > The log output suggests to me that you are hitting a totally
> > unrelated bug. I have seen that message before on one of my
> > experimental branches, but I didn't think it could happen with
> > stock NILFS2.
> >
> > My solution was to add the following piece of code to
> > nilfs_vdesc_is_live() in nilfs-utils:
> >
> > if (vdesc->vd_period.p_end == vdesc->vd_cno) {
> >         /*
> >          * This block was overwritten in the same logical segment,
> >          * but in a different partial segment, probably because of
> >          * fdatasync() or a flush to disk.
> >          * Without this check, GC will cause a buffer conflict error
> >          * if both partial segments are cleaned at the same time.
> >          * In that case there will be two vdescs with the same ino,
> >          * cno and offset.
> >          */
> >         return 0;
> > }
> >
> > I can't tell for sure whether that will fix your problem; I am just
> > guessing. It isn't necessary with my setup to reproduce the race
> > condition bug.
>
> Thank you, Andreas. This snippet seems to have fixed the error so
> far.
>
> Could you shape it into a patch (preferably a separate one) for the
> nilfs-utils repo?

Of course, but I am not sure how exactly this error can occur. I tried
to reproduce it with a small shell script using fdatasync(), but I
wasn't able to. However, I am pretty sure that the patch does no harm
and is safe.

> I haven't yet succeeded in reproducing the original fs corruption
> error. I will continue trying to reproduce the bug while waiting for
> a report from you and/or Clemens.

I think I forgot to mention that I use a protection period of 30
seconds. Here is my full nilfs_cleanerd.conf:

protection_period           30
min_clean_segments          10%
max_clean_segments          30%
clean_check_interval        10
nsegments_per_clean         16
mc_nsegments_per_clean      16
cleaning_interval           0.1
mc_cleaning_interval        0.1
retry_interval              60
min_reclaimable_blocks      10%
mc_min_reclaimable_blocks   1%
use_set_suinfo
use_mmap
log_priority                info

I can reproduce the corruption error with the latest kernel, 4.14.0,
and I can confirm that it is fixed after applying the patch. I used an
older six-core AMD Phenom(tm) II X6 1090T processor and a Samsung 850
EVO SSD.

Best regards,
Andreas

> Thanks,
> Ryusuke Konishi
>
> >
> > I am rerunning my tests now to make sure I didn't make a mistake.
> >
> > Best regards,
> >
> > Andreas
> >
> > On Sun, 29 Oct 2017 14:09:40 +0900, Ryusuke Konishi
> > <konishi.ryusuke@xxxxxxxxx> wrote:
> >
> >> Unfortunately the patch didn't fix this issue:
> >>
> >> [ 612.614615] NILFS (sda1): segctord starting. Construction
> >> interval = 5 seconds, CP frequency < 30 seconds
> >> [ 4298.841376] NILFS (sda1): nilfs_ioctl_move_inode_block:
> >> conflicting data buffer: ino=1155, cno=11, offset=1,
> >> blocknr=1216036, vblocknr=1198436
> >> [ 4298.841877] NILFS (sda1): error -17 preparing GC: cannot read
> >> source blocks
> >>
> >> Looks like the bugsimulator hit another bug.
> >>
> >> When this happened, the disk usage was 100%:
> >>
> >> $ df /dev/sda1
> >> Filesystem     1K-blocks     Used Available Use% Mounted on
> >> /dev/sda1      104849404 99655676         0 100% /test
> >>
> >> And the error killed cleanerd. However, I could cleanly unmount
> >> the partition, and I didn't see any errors after remounting the
> >> partition.
> >>
> >> As you commented, I have changed some parameters
> >> in /etc/nilfs_cleanerd.conf:
> >> -------------------------------------------
> >> mc_nsegments_per_clean  16
> >> nsegments_per_clean     16
> >> max_clean_segments      30%
> >> cleaning_interval       0.1
> >> mc_cleaning_interval    0.1
> >> -------------------------------------------
> >>
> >> I will look into this issue with the bugsimulator.
> >>
> >> Can anyone reproduce the original corruption issue in another way?
> >>
> >> Regards,
> >> Ryusuke Konishi
> >>
> >>
> >> 2017-10-29 2:16 GMT+09:00 Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>:
> >> > Hi,
> >> >
> >> > Sorry for my late response.
> >> > I could reproduce a corruption issue with the bug simulator.
> >> >
> >> > -----
> >> > [88805.039545] NILFS (sda1): segctord starting. Construction
> >> > interval = 5 seconds, CP frequency < 30 seconds
> >> > [92817.941644] NILFS (sda1): nilfs_ioctl_move_inode_block:
> >> > conflicting data buffer: ino=10846, cno=92, offset=2,
> >> > blocknr=11425709, vblocknr=11227714
> >> > [92817.941936] NILFS (sda1): error -17 preparing GC: cannot read
> >> > source blocks
> >> > -----
> >> >
> >> > Now I'm testing the patch. I will send it upstream after it has
> >> > turned out to suppress the issue.
> >> >
> >> > The patch itself looks good to me, though I guess there may be
> >> > similar bugs. What makes things complicated seems to be that the
> >> > inode block is not marked as dirty at the moment the "propagate"
> >> > function carries a dirty state over from a dirty node/data block.
> >> >
> >> > Anyway, I will hasten to apply this patch.
> >> >
> >> > Thanks,
> >> > Ryusuke Konishi
> >> >
> >> >
> >> > 2017-07-13 15:44 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
> >> >> Hi,
> >> >>
> >> >> On Wed, 12 Jul 2017 09:08:57 +0900, Ryusuke Konishi
> >> >> <konishi.ryusuke@xxxxxxxxx> wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> 2017-07-11 15:19 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
> >> >>> >
> >> >>> > There is a race condition between the functions
> >> >>> > nilfs_dirty_inode() and nilfs_set_file_dirty().
> >> >>> >
> >> >>> > When a file is opened, nilfs_dirty_inode() is called to
> >> >>> > update the access timestamp in the inode. It calls
> >> >>> > __nilfs_mark_inode_dirty() in a separate transaction.
> >> >>> > __nilfs_mark_inode_dirty() caches the ifile buffer_head in
> >> >>> > the i_bh field of the inode info structure and marks it as
> >> >>> > dirty.
> >> >>> >
> >> >>> > After some data was written to the file in another
> >> >>> > transaction, the function nilfs_set_file_dirty() is called,
> >> >>> > which adds the inode to the ns_dirty_files list.
> >> >>> >
> >> >>> > Then the segment construction calls
> >> >>> > nilfs_segctor_collect_dirty_files(), which goes through the
> >> >>> > ns_dirty_files list and checks the i_bh field. If there is a
> >> >>> > cached buffer_head in i_bh, it is not marked as dirty again.
> >> >>> >
> >> >>> > Since nilfs_dirty_inode() and nilfs_set_file_dirty() use
> >> >>> > separate transactions, it is possible that a segment
> >> >>> > construction that writes out the ifile occurs in between the
> >> >>> > two. If this happens, the inode is not on the ns_dirty_files
> >> >>> > list, but its ifile block is still marked as dirty and
> >> >>> > written out.
> >> >>> >
> >> >>> > In the next segment construction, the data for the file is
> >> >>> > written out and nilfs_bmap_propagate() updates the b-tree.
> >> >>> > Eventually the bmap root is written into the i_bh block,
> >> >>> > which is not dirty, because it was already written out in
> >> >>> > another segment construction.
> >> >>> >
> >> >>> > As a result the bmap update can be lost, which leads to file
> >> >>> > system corruption. Either the virtual block address points
> >> >>> > to an unallocated DAT block, or the DAT entry will be reused
> >> >>> > for something different.
> >> >>> >
> >> >>> > The error can remain undetected for a long time. A typical
> >> >>> > error message would be one of the "bad btree" errors or a
> >> >>> > warning that a DAT entry could not be found.
> >> >>> >
> >> >>> > This bug can be reproduced reliably by a simple benchmark
> >> >>> > that creates and overwrites millions of 4k files.
> >> >>> >
> >> >>> > Signed-off-by: Andreas Rohner <andreas.rohner@xxxxxxx>
> >> >>>
> >> >>> Will review; please wait for a while.
> >> >>
> >> >> Thank you. I have uploaded my benchmark to GitHub:
> >> >>
> >> >> https://github.com/zeitgeist87/bugsimulator
> >> >>
> >> >> The benchmark is very simple. It writes out 13107200 4k files
> >> >> in a two-layer directory tree, so that every directory contains
> >> >> about 1000 files. Then it overwrites these files in an infinite
> >> >> loop.
> >> >>
> >> >> You need quite aggressive cleaner settings for it to keep up
> >> >> with the bugsimulator:
> >> >>
> >> >> mc_nsegments_per_clean  16
> >> >> nsegments_per_clean     16
> >> >> max_clean_segments      30%
> >> >> cleaning_interval       0.1
> >> >> mc_cleaning_interval    0.1
> >> >>
> >> >> If you run this benchmark on a fresh 100GB NILFS2 partition with
> >> >> the above cleaner settings, the file system will be corrupt in
> >> >> about half an hour, at least on my machine.
> >> >>
> >> >> With the patch for the race condition, it can run indefinitely.
> >> >>
> >> >> Regards,
> >> >> Andreas
> >> >>
> >> >>> Regards,
> >> >>> Ryusuke Konishi
> >> >>>
> >> >>> >
> >> >>> > ---
> >> >>> >  fs/nilfs2/segment.c | 6 ++++--
> >> >>> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >> >>> >
> >> >>> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> >> >>> > index 70ded52dc1dd..50e12956c737 100644
> >> >>> > --- a/fs/nilfs2/segment.c
> >> >>> > +++ b/fs/nilfs2/segment.c
> >> >>> > @@ -1958,8 +1958,6 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
> >> >>> >  					  err, ii->vfs_inode.i_ino);
> >> >>> >  				return err;
> >> >>> >  			}
> >> >>> > -			mark_buffer_dirty(ibh);
> >> >>> > -			nilfs_mdt_mark_dirty(ifile);
> >> >>> >  			spin_lock(&nilfs->ns_inode_lock);
> >> >>> >  			if (likely(!ii->i_bh))
> >> >>> >  				ii->i_bh = ibh;
> >> >>> > @@ -1968,6 +1966,10 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
> >> >>> >  			goto retry;
> >> >>> >  		}
> >> >>> >
> >> >>> > +		// Always redirty the buffer to avoid race condition
> >> >>> > +		mark_buffer_dirty(ii->i_bh);
> >> >>> > +		nilfs_mdt_mark_dirty(ifile);
> >> >>> > +
> >> >>> >  		clear_bit(NILFS_I_QUEUED, &ii->i_state);
> >> >>> >  		set_bit(NILFS_I_BUSY, &ii->i_state);
> >> >>> >  		list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
> >> >>> > --
> >> >>> > 2.13.2
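
For readers without the nilfs-utils sources at hand, the check discussed
at the top of the thread boils down to a single comparison on the virtual
block descriptor. The following is a minimal, self-contained C sketch of
that condition only; the structure definitions are simplified stand-ins
for the nilfs2 ioctl structures (just the fields used here, the real
definitions live in the nilfs2 uapi headers), and the helper name is
hypothetical, not part of nilfs-utils.

    /*
     * Simplified stand-ins for the nilfs2 ioctl structures; only the
     * fields referenced by the check above are shown.
     */
    typedef unsigned long long nilfs_cno_t;

    struct nilfs_period {
            nilfs_cno_t p_start;  /* checkpoint in which the block was created */
            nilfs_cno_t p_end;    /* checkpoint in which it was overwritten or deleted */
    };

    struct nilfs_vdesc {
            unsigned long long vd_ino;      /* inode number */
            nilfs_cno_t vd_cno;             /* checkpoint number the block belongs to */
            unsigned long long vd_offset;   /* block offset within the file */
            struct nilfs_period vd_period;  /* lifetime of the virtual block */
            /* ... further fields omitted ... */
    };

    /*
     * Hypothetical helper restating the proposed early return: the
     * block's lifetime ended in the same checkpoint it was written in,
     * i.e. it was overwritten in the same logical segment but in a
     * different partial segment (for example because of fdatasync()).
     * Treating such a block as dead keeps the cleaner from collecting
     * two vdescs with the same ino, cno and offset, which is what
     * triggers the "conflicting data buffer" error in the logs quoted
     * above.
     */
    static int vdesc_dies_in_own_checkpoint(const struct nilfs_vdesc *vdesc)
    {
            return vdesc->vd_period.p_end == vdesc->vd_cno;
    }

In nilfs-utils the comparison would simply be added to
nilfs_vdesc_is_live(), as in the snippet Andreas posted.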