>> I haven't yet succeeded in reproducing the original fs corruption error.
>> Will continue to try reproducing the bug while waiting for a report from
>> you and/or Clemens.
>
> I think I forgot to mention that I use a protection period of 30
> seconds. Here is my full nilfs_cleanerd.conf:

Thanks. This is likely to affect the test.

Regards,
Ryusuke Konishi

2017-10-29 23:06 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
> On Sun, 29 Oct 2017 21:53:38 +0900,
> Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx> wrote:
>
>> 2017-10-29 17:07 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
>> > Hi Ryusuke,
>> >
>> > Thank you for reviewing my patch. I will rerun my tests to make sure
>> > my conclusions were correct.
>> >
>> > The log output suggests to me that you are hitting a totally
>> > unrelated bug. I have seen that message before on one of my
>> > experimental branches, but I didn't think it could happen with
>> > stock NILFS2.
>> >
>> > My solution was to add the following piece of code to
>> > nilfs_vdesc_is_live() in nilfs-utils:
>> >
>> > if (vdesc->vd_period.p_end == vdesc->vd_cno) {
>> >         /*
>> >          * This block was overwritten in the same logical segment, but
>> >          * in a different partial segment. Probably because of
>> >          * fdatasync() or a flush to disk.
>> >          * Without this check, gc will cause buffer confliction error
>> >          * if both partial segments are cleaned at the same time.
>> >          * In that case there will be two vdesc with the same ino,
>> >          * cno and offset.
>> >          */
>> >         return 0;
>> > }
>> >
>> > I can't tell for sure if that will fix your problem. I am just
>> > guessing. It isn't necessary with my setup to reproduce the race
>> > condition bug.
>>
>> Thank you, Andreas. This snippet appears to have fixed the error so
>> far.
>>
>> Could you shape it into a patch (preferably a separate one) for the
>> nilfs-utils repo?
>
> Of course, but I am not sure how exactly this error can occur.
> I tried to reproduce it with a small shell script using
> fdatasync(), but I wasn't able to.
>
> However, I am pretty sure that the patch does no harm and is safe.
>
>> I haven't yet succeeded in reproducing the original fs corruption error.
>> Will continue to try reproducing the bug while waiting for a report from
>> you and/or Clemens.
>
> I think I forgot to mention that I use a protection period of 30
> seconds. Here is my full nilfs_cleanerd.conf:
>
> protection_period 30
> min_clean_segments 10%
> max_clean_segments 30%
> clean_check_interval 10
> nsegments_per_clean 16
> mc_nsegments_per_clean 16
> cleaning_interval 0.1
> mc_cleaning_interval 0.1
> retry_interval 60
> min_reclaimable_blocks 10%
> mc_min_reclaimable_blocks 1%
> use_set_suinfo
> use_mmap
> log_priority info
>
> I can reproduce the corruption error with the latest kernel 4.14.0, and
> I can confirm that it is fixed after applying the patch.
> I used an older six-core AMD Phenom(tm) II X6 1090T processor and a
> Samsung 850 EVO SSD.
>
> Best regards,
>
> Andreas
>
>> Thanks,
>> Ryusuke Konishi
>>
>> >
>> > I am rerunning my tests now to make sure I didn't make a mistake.
>> >
>> > Best regards,
>> >
>> > Andreas
>> >
>> > On Sun, 29 Oct 2017 14:09:40 +0900,
>> > Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx> wrote:
>> >
>> >> Unfortunately the patch didn't fix this issue:
>> >>
>> >> [ 612.614615] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>> >> [ 4298.841376] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buffer: ino=1155, cno=11, offset=1, blocknr=1216036, vblocknr=1198436
>> >> [ 4298.841877] NILFS (sda1): error -17 preparing GC: cannot read source blocks
>> >>
>> >> Looks like the bugsimulator hit another bug.
>> >>
>> >> When this happened, the disk usage was 100%:
>> >>
>> >> $ df /dev/sda1
>> >> Filesystem 1K-blocks Used Available Use% Mounted on
>> >> /dev/sda1 104849404 99655676 0 100% /test
>> >>
>> >> And the error killed cleanerd. However, I could cleanly unmount
>> >> the partition, and I didn't see any errors after remounting the
>> >> partition.
>> >>
>> >> As you commented, I have changed some parameters
>> >> in /etc/nilfs_cleanerd.conf:
>> >> -------------------------------------------
>> >> mc_nsegments_per_clean 16
>> >> nsegments_per_clean 16
>> >> max_clean_segments 30%
>> >> cleaning_interval 0.1
>> >> mc_cleaning_interval 0.1
>> >> -------------------------------------------
>> >>
>> >> I will look into this issue with the bugsimulator.
>> >>
>> >> Can anyone reproduce the original corruption issue in another way?
>> >>
>> >> Regards,
>> >> Ryusuke Konishi
>> >>
>> >>
>> >> 2017-10-29 2:16 GMT+09:00 Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx>:
>> >> > Hi,
>> >> >
>> >> > Sorry for my late response.
>> >> > I could reproduce a corruption issue with the bug simulator.
>> >> >
>> >> > -----
>> >> > [88805.039545] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>> >> > [92817.941644] NILFS (sda1): nilfs_ioctl_move_inode_block: conflicting data buffer: ino=10846, cno=92, offset=2, blocknr=11425709, vblocknr=11227714
>> >> > [92817.941936] NILFS (sda1): error -17 preparing GC: cannot read source blocks
>> >> > -----
>> >> >
>> >> > Now I'm testing the patch. I will send it upstream after it turns
>> >> > out to suppress the issue.
>> >> >
>> >> > The patch itself looks good to me, though I guess there may be
>> >> > similar bugs. What makes things complicated seems to be that the
>> >> > inode block is not marked as dirty at the moment the "propagate"
>> >> > function carries a dirty state from a dirty node/data block.
>> >> >
>> >> > Anyway, I hasten to apply this patch.
>> >> >
>> >> > Thanks,
>> >> > Ryusuke Konishi
>> >> >
>> >> >
>> >> > 2017-07-13 15:44 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
>> >> >> Hi,
>> >> >>
>> >> >> On Wed, 12 Jul 2017 09:08:57 +0900,
>> >> >> Ryusuke Konishi <konishi.ryusuke@xxxxxxxxx> wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> 2017-07-11 15:19 GMT+09:00 Andreas Rohner <andreas.rohner@xxxxxxx>:
>> >> >>> >
>> >> >>> > There is a race condition between the function
>> >> >>> > nilfs_dirty_inode() and nilfs_set_file_dirty().
>> >> >>> >
>> >> >>> > When a file is opened, nilfs_dirty_inode() is called to
>> >> >>> > update the access timestamp in the inode. It calls
>> >> >>> > __nilfs_mark_inode_dirty() in a separate transaction.
>> >> >>> > __nilfs_mark_inode_dirty() caches the ifile buffer_head in
>> >> >>> > the i_bh field of the inode info structure and marks it as
>> >> >>> > dirty.
>> >> >>> >
>> >> >>> > After some data was written to the file in another
>> >> >>> > transaction, the function nilfs_set_file_dirty() is called,
>> >> >>> > which adds the inode to the ns_dirty_files list.
>> >> >>> >
>> >> >>> > Then the segment construction calls
>> >> >>> > nilfs_segctor_collect_dirty_files(), which goes through the
>> >> >>> > ns_dirty_files list and checks the i_bh field. If there is a
>> >> >>> > cached buffer_head in i_bh, it is not marked as dirty again.
>> >> >>> >
>> >> >>> > Since nilfs_dirty_inode() and nilfs_set_file_dirty() use
>> >> >>> > separate transactions, it is possible that a segment
>> >> >>> > construction that writes out the ifile occurs in between the
>> >> >>> > two. If this happens, the inode is not on the ns_dirty_files
>> >> >>> > list, but its ifile block is still marked as dirty and
>> >> >>> > written out.
>> >> >>> >
>> >> >>> > In the next segment construction, the data for the file is
>> >> >>> > written out and nilfs_bmap_propagate() updates the b-tree.
>> >> >>> > Eventually the bmap root is written into the i_bh block,
>> >> >>> > which is not dirty, because it was written out in another
>> >> >>> > segment construction.
>> >> >>> >
>> >> >>> > As a result the bmap update can be lost, which leads to file
>> >> >>> > system corruption. Either the virtual block address points
>> >> >>> > to an unallocated DAT block, or the DAT entry will be reused
>> >> >>> > for something different.
>> >> >>> >
>> >> >>> > The error can remain undetected for a long time. A typical
>> >> >>> > error message would be one of the "bad btree" errors or a
>> >> >>> > warning that a DAT entry could not be found.
>> >> >>> >
>> >> >>> > This bug can be reproduced reliably by a simple benchmark
>> >> >>> > that creates and overwrites millions of 4k files.
>> >> >>> >
>> >> >>> > Signed-off-by: Andreas Rohner <andreas.rohner@xxxxxxx>
>> >> >>>
>> >> >>> Will review, please wait for a while.
>> >> >>
>> >> >> Thank you, I have uploaded my benchmark to GitHub:
>> >> >>
>> >> >> https://github.com/zeitgeist87/bugsimulator
>> >> >>
>> >> >> The benchmark is very simple. It writes out 13107200 4k files
>> >> >> in a two-layer directory tree, so that every directory contains
>> >> >> about 1000 files. Then it overwrites these files in an infinite
>> >> >> loop.
>> >> >>
>> >> >> You need quite aggressive settings for the cleaner to keep up
>> >> >> with the bugsimulator:
>> >> >>
>> >> >> mc_nsegments_per_clean 16
>> >> >> nsegments_per_clean 16
>> >> >> max_clean_segments 30%
>> >> >> cleaning_interval 0.1
>> >> >> mc_cleaning_interval 0.1
>> >> >>
>> >> >> If you run this benchmark on a fresh 100GB NILFS2 partition with
>> >> >> the above cleaner settings, the file system will be corrupted in
>> >> >> about half an hour, at least on my machine.
>> >> >>
>> >> >> With the patch for the race condition, it can run indefinitely.
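
A minimal C sketch of the kind of workload described above: a two-layer directory tree of small files that is rewritten forever. The fan-out constants, paths, and helper names here are illustrative assumptions (chosen so that 128 * 100 * 1024 = 13107200 files, roughly 1000 per leaf directory); they are not taken from the actual bugsimulator sources linked above.

/*
 * Sketch of a create/overwrite churn workload; not the real bugsimulator.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define NDIRS_L1 128   /* first-level directories (illustrative)  */
#define NDIRS_L2 100   /* second-level directories (illustrative) */
#define NFILES   1024  /* ~1000 files per leaf directory          */
#define FILESIZE 4096  /* 4k of data per file                     */

static void make_dir(const char *path)
{
	if (mkdir(path, 0755) && errno != EEXIST) {
		perror(path);
		exit(1);
	}
}

static void write_file(const char *path, const char *buf)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0 || write(fd, buf, FILESIZE) != FILESIZE) {
		perror(path);
		exit(1);
	}
	close(fd);
}

int main(int argc, char **argv)
{
	const char *root = argc > 1 ? argv[1] : "/test/bugsim";
	static char dir[4096], path[4224], buf[FILESIZE];

	memset(buf, 'x', sizeof(buf));
	make_dir(root);

	/* The first pass creates the tree; every later pass rewrites all
	 * files, producing the constant churn the cleaner has to follow. */
	for (unsigned long pass = 0; ; pass++) {
		for (int i = 0; i < NDIRS_L1; i++) {
			snprintf(dir, sizeof(dir), "%s/%03d", root, i);
			if (pass == 0)
				make_dir(dir);
			for (int j = 0; j < NDIRS_L2; j++) {
				snprintf(dir, sizeof(dir), "%s/%03d/%03d", root, i, j);
				if (pass == 0)
					make_dir(dir);
				for (int k = 0; k < NFILES; k++) {
					snprintf(path, sizeof(path), "%s/%06d", dir, k);
					write_file(path, buf);
				}
			}
		}
	}
}

Run something like this against a NILFS2 mount point (for example a path under /test) together with the aggressive cleaner settings listed above.
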
>> >> >>
>> >> >> Regards,
>> >> >> Andreas
>> >> >>
>> >> >>> Regards,
>> >> >>> Ryusuke Konishi
>> >> >>>
>> >> >>> >
>> >> >>> > ---
>> >> >>> >  fs/nilfs2/segment.c | 6 ++++--
>> >> >>> >  1 file changed, 4 insertions(+), 2 deletions(-)
>> >> >>> >
>> >> >>> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
>> >> >>> > index 70ded52dc1dd..50e12956c737 100644
>> >> >>> > --- a/fs/nilfs2/segment.c
>> >> >>> > +++ b/fs/nilfs2/segment.c
>> >> >>> > @@ -1958,8 +1958,6 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
>> >> >>> >                      err, ii->vfs_inode.i_ino);
>> >> >>> >                  return err;
>> >> >>> >              }
>> >> >>> > -            mark_buffer_dirty(ibh);
>> >> >>> > -            nilfs_mdt_mark_dirty(ifile);
>> >> >>> >              spin_lock(&nilfs->ns_inode_lock);
>> >> >>> >              if (likely(!ii->i_bh))
>> >> >>> >                  ii->i_bh = ibh;
>> >> >>> > @@ -1968,6 +1966,10 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
>> >> >>> >              goto retry;
>> >> >>> >          }
>> >> >>> >
>> >> >>> > +        // Always redirty the buffer to avoid race condition
>> >> >>> > +        mark_buffer_dirty(ii->i_bh);
>> >> >>> > +        nilfs_mdt_mark_dirty(ifile);
>> >> >>> > +
>> >> >>> >          clear_bit(NILFS_I_QUEUED, &ii->i_state);
>> >> >>> >          set_bit(NILFS_I_BUSY, &ii->i_state);
>> >> >>> >          list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
>> >> >>> > --
>> >> >>> > 2.13.2
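
For reference, the nilfs-utils check quoted near the top of the thread boils down to a single predicate on the vdesc lifetime. Below is a minimal sketch that assumes only the struct nilfs_vdesc fields already used in that snippet; the header name, the helper's name, and its exact placement inside nilfs_vdesc_is_live() are assumptions, not part of the actual nilfs-utils sources.

/* Header name may differ between kernel and nilfs-utils versions. */
#include <linux/nilfs2_api.h>	/* struct nilfs_vdesc, struct nilfs_period */

/*
 * A block whose lifetime ends in the very checkpoint it was created in
 * (p_end == vd_cno) was overwritten within the same logical segment, e.g.
 * after fdatasync() or a flush.  Cleaning both partial segments at once
 * then yields two vdescs with the same ino, cno and offset, which triggers
 * the "conflicting data buffer" error seen in the logs above.  Treating
 * such a block as dead avoids that.
 */
static int nilfs_vdesc_overwritten_in_place(const struct nilfs_vdesc *vdesc)
{
	return vdesc->vd_period.p_end == vdesc->vd_cno;
}

With a helper like this, nilfs_vdesc_is_live() can simply return 0 when the predicate holds, matching the snippet Andreas posted.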