On Mon, 02 Sep 2013 17:27:44 +0400, Vyacheslav Dubeyko wrote:
> From: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
> Subject: [PATCH] [CRITICAL] nilfs2: fix issue with race condition of competition between segments for dirty blocks

Before considering whether to apply this patch, please confirm that the
suspected issue you pointed out, "dirty buffer_heads are added in several
lists of payload_buffers", is actually happening.

The easiest way to do this is to test the following patch:

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index bd88a74..cab38db 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -668,6 +668,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
 			if (!buffer_dirty(bh))
 				continue;
 			get_bh(bh);
+			BUG_ON(!list_empty(&bh->b_assoc_buffers));
 			list_add_tail(&bh->b_assoc_buffers, listp);
 			ndirties++;
 			if (unlikely(ndirties >= nlimit)) {
@@ -701,6 +702,7 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
 		do {
 			if (buffer_dirty(bh)) {
 				get_bh(bh);
+				BUG_ON(!list_empty(&bh->b_assoc_buffers));
 				list_add_tail(&bh->b_assoc_buffers, listp);
 			}

You can rewrite the above BUG_ON() check with the following debug code if
you want to get detailed information on the buffer:

	if (!list_empty(&bh->b_assoc_buffers)) {
		/* dump information on bh */
	}

As a workaround, we can avoid double registration to different payload
lists by testing list_empty(&bh->b_assoc_buffers) instead of adding the
BH_Async_Write flag.  However, I think we should first narrow down the
root cause of why the issue "dirty buffer_heads are added in several
lists of payload_buffers" happens.

We designed the log writer so that the same buffer is not registered to
two different payload buffer lists at the same time.  If that invariant
is not satisfied, it's a bug.

What do you think is the root cause of the "double registration" of a
buffer head?

Regards,
Ryusuke Konishi

> Many NILFS2 users have reported strange file system corruption, for
> example:
>
> [129306.872119] NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
> [129306.872127] NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)
>
> But such error messages are a consequence of a file system issue that
> takes place much earlier. Fortunately, Jerome Poulin <jeromepoulin@xxxxxxxxx>
> and Anton Eliasson <devel@xxxxxxxxxxxxxxxx> recently reported another
> issue. These reports describe an issue with a crash of the segctor thread:
>
> [1677.310656] BUG: unable to handle kernel paging request at 0000000000004c83
> [1677.310683] IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
>
> [1677.311562] Call Trace:
> [1677.311575] [<ffffffffa024e4c5>] nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
> [1677.311596] [<ffffffff81019d09>] ? sched_clock+0x9/0x10
> [1677.311614] [<ffffffffa024f3ab>] nilfs_segctor_construct+0x17b/0x290 [nilfs2]
> [1677.311636] [<ffffffffa024f5e2>] nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
> [1677.311657] [<ffffffffa024f4c0>] ? nilfs_segctor_construct+0x290/0x290 [nilfs2]
> [1677.311677] [<ffffffff8107cae0>] kthread+0xc0/0xd0
> [1677.311690] [<ffffffff8107ca20>] ? kthread_create_on_node+0x120/0x120
> [1677.311709] [<ffffffff816dd16c>] ret_from_fork+0x7c/0xb0
> [1677.311724] [<ffffffff8107ca20>] ? kthread_create_on_node+0x120/0x120
>
> These two issues have one cause. The same cause can also raise a third
> issue: the segctor thread hangs while eating 100% CPU.
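(Expanding on the "/* dump information on bh */" placeholder in the reply
above: a minimal sketch of what such a debug branch could print, assuming
it replaces the BUG_ON() inside nilfs_lookup_dirty_data_buffers(), where
both bh and inode are in scope. The message text and the choice of fields
are only an example, not part of the suggested test patch.)

	if (!list_empty(&bh->b_assoc_buffers)) {
		/* dump the fields that identify the buffer and its owner */
		printk(KERN_DEBUG
		       "NILFS: bh %p already on a payload list: "
		       "blocknr %llu, size %zu, count %d, page %p, ino %lu\n",
		       bh, (unsigned long long)bh->b_blocknr, bh->b_size,
		       atomic_read(&bh->b_count), bh->b_page, inode->i_ino);
	}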
>
> REPRODUCING PATH:
> One possible way of reproducing the issue was described by
> Jérôme Poulin <jeromepoulin@xxxxxxxxx>:
>
> 1. init S to get to single user mode.
> 2. sysrq+E to make sure only my shell is running
> 3. start network-manager to get my wifi connection up
> 4. login as root and launch "screen"
> 5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
> 6. lscp | xz -9e > lscp.txt.xz
> 7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
> 8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
> 9. start a screen and launch strace -f -o find-cat.log -t find
>    /mnt/nilfs -type f -exec cat {} > /dev/null \;
> 10. start a screen and launch strace -f -o apt-get.log -t apt-get update
> 11. launch the last command again as it did not crash the first time
> 12. apt-get crashes
> 13. ps aux > ps-aux-crashed.log
> 13. sysrq+W
> 14. sysrq+E wait for everything to terminate
> 15. sysrq+SUSB
>
> A simpler way of reproducing the issue is to start a kernel compilation
> task and "apt-get update" in parallel.
>
> REPRODUCIBILITY:
> The issue does not reproduce reliably [60% - 80%]. It is very important
> to have a proper environment for reproducing the issue. The critical
> conditions for successful reproduction are:
> (1) There should be a big file modified via mmap().
> (2) From time to time during processing, this file's count of dirty
>     blocks should be greater than several segments in size (for example,
>     two or three).
> (3) There should be intensive background file-modification activity in
>     another thread.
>
> INVESTIGATION:
> First of all, it is possible to see that the reason for the crash is an
> invalid page address:
>
> [291.101244] NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
> [291.101249] NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
>
> Moreover, the value of b_page (0x1a82) is 6786 in decimal, which looks
> like a segment number. And the b_blocknr and b_size values look like
> block numbers. So the buffer_head's b_page pointer holds an improper
> address value.
>
> Detailed investigation of the issue revealed the following picture:
>
> [-----------------------------SEGMENT 6783-------------------------------]
> [290.886448] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
> [290.886460] NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
> [290.886578] NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
> [290.886601] NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
> [290.886608] NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
> [290.886631] NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
> [290.886663] NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
> [290.886697] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783
>
> [-----------------------------SEGMENT 6784-------------------------------]
> [290.886755] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
> [290.886765] NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
> [290.952406] NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [290.952411] NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
> [290.919399] NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
> [290.932458] NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
> [290.933098] NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
> [290.939184] NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
> [290.951133] NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
> [290.951314] NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [290.951321] NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
> [290.951325] NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
> [290.951534] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
> [----------] ditto
> [290.952308] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15
>
> [-----------------------------SEGMENT 6785-------------------------------]
> [290.952340] NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
> [290.952364] NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
> [290.952406] NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [290.952411] NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
> [290.990248] NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
> [290.990265] NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
> [291.008348] NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
> [291.018110] NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
> [291.021206] NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [291.021210] NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
> [291.021214] NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
> [291.021241] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
> [----------] ditto
> [291.021916] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12
>
> [291.021944] NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
> [291.021950] NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
> [291.021964] NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
> [291.021984] NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
>
> [291.071861] NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
>
> BUG: unable to handle kernel paging request at 0000000000001a82
> IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
>
> Usually, for every segment we collect dirty files in a list. Then, dirty
> blocks are gathered for every dirty file, prepared for write, and
> submitted by means of nilfs_segbuf_submit_bh(). Finally, the complete-write
> phase takes place after nilfs_end_bio_write() is called on the block
> layer side. In the final phase, buffers/pages are marked as not dirty
> and the processed files are removed from the list of dirty files.
>
> It is possible to see that we had three prepare_write and submit_bio
> phases before the segbuf_wait and complete_write phases. Moreover,
> segments compete with each other for dirty blocks, because on every
> iteration of segment processing dirty buffer_heads are added to several
> lists of payload_buffers:
>
> [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
> [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
>
> The next pointer is the same, but the prev pointer has changed. It means
> that the buffer_head has its next pointer from one list but its prev
> pointer from another. Such a modification can be made several times,
> and finally it can result in various issues: (1) segctor hanging,
> (2) segctor crashing, (3) file system metadata corruption.
>
> FIX:
> This patch adds:
> (1) setting of the BH_Async_Write flag in nilfs_segctor_prepare_write()
>     for every processed dirty block;
> (2) checking of the BH_Async_Write flag in nilfs_lookup_dirty_data_buffers()
>     and nilfs_lookup_dirty_node_buffers();
> (3) clearing of the BH_Async_Write flag in nilfs_segctor_complete_write(),
>     nilfs_abort_logs(), nilfs_forget_buffer() and nilfs_clear_dirty_page().
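(To illustrate the cross-linked state described above: the following
minimal sketch, which is not nilfs2 code, shows what adding the same
b_assoc_buffers link to a second payload list does when it has not been
removed from the first one. The function and list names are made up for
the example.)

	#include <linux/list.h>
	#include <linux/buffer_head.h>

	/* payload_a: payload list of one segment buffer,
	 * payload_b: payload list of the next segment buffer */
	static void double_registration_demo(struct buffer_head *bh,
					     struct list_head *payload_a,
					     struct list_head *payload_b)
	{
		list_add_tail(&bh->b_assoc_buffers, payload_a);
		/* bh is now linked as the tail of payload_a */

		list_add_tail(&bh->b_assoc_buffers, payload_b);
		/*
		 * bh->b_assoc_buffers.next/prev now point into payload_b,
		 * while payload_a still points at bh: walking payload_a past
		 * this node crosses over into payload_b, and deleting bh from
		 * either list leaves stale pointers in the other.
		 */
	}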
>
> Reported-by: Jerome Poulin <jeromepoulin@xxxxxxxxx>
> Reported-by: Anton Eliasson <devel@xxxxxxxxxxxxxxxx>
> CC: Paul Fertser <fercerpav@xxxxxxxxx>
> CC: ARAI Shun-ichi <hermes@xxxxxxxxxxxxxxx>
> CC: Piotr Szymaniak <szarpaj@xxxxxxxxxxx>
> CC: Juan Barry Manuel Canham <Linux@xxxxxxxxxxxxxxxxxxx>
> CC: Zahid Chowdhury <zahid.chowdhury@xxxxxxxxxxxxxxxxx>
> CC: Elmer Zhang <freeboy6716@xxxxxxxxx>
> CC: Kenneth Langga <klangga@xxxxxxxxx>
> Signed-off-by: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
> CC: Ryusuke Konishi <konishi.ryusuke@xxxxxxxxxxxxx>
> ---
>  fs/nilfs2/page.c    |  2 ++
>  fs/nilfs2/segment.c | 11 +++++++++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
> index 0ba6798..da27664 100644
> --- a/fs/nilfs2/page.c
> +++ b/fs/nilfs2/page.c
> @@ -94,6 +94,7 @@ void nilfs_forget_buffer(struct buffer_head *bh)
>  	clear_buffer_nilfs_volatile(bh);
>  	clear_buffer_nilfs_checked(bh);
>  	clear_buffer_nilfs_redirected(bh);
> +	clear_buffer_async_write(bh);
>  	clear_buffer_dirty(bh);
>  	if (nilfs_page_buffers_clean(page))
>  		__nilfs_clear_page_dirty(page);
> @@ -429,6 +430,7 @@ void nilfs_clear_dirty_page(struct page *page, bool silent)
>  				"discard block %llu, size %zu",
>  				(u64)bh->b_blocknr, bh->b_size);
>  		}
> +		clear_buffer_async_write(bh);
>  		clear_buffer_dirty(bh);
>  		clear_buffer_nilfs_volatile(bh);
>  		clear_buffer_nilfs_checked(bh);
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index bd88a74..9f6b486 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -665,7 +665,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
>
>  		bh = head = page_buffers(page);
>  		do {
> -			if (!buffer_dirty(bh))
> +			if (!buffer_dirty(bh) || buffer_async_write(bh))
>  				continue;
>  			get_bh(bh);
>  			list_add_tail(&bh->b_assoc_buffers, listp);
> @@ -699,7 +699,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
>  	for (i = 0; i < pagevec_count(&pvec); i++) {
>  		bh = head = page_buffers(pvec.pages[i]);
>  		do {
> -			if (buffer_dirty(bh)) {
> +			if (buffer_dirty(bh) &&
> +			    !buffer_async_write(bh)) {
>  				get_bh(bh);
>  				list_add_tail(&bh->b_assoc_buffers,
>  					      listp);
> @@ -1579,6 +1580,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
>
>  		list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
>  				    b_assoc_buffers) {
> +			set_buffer_async_write(bh);
>  			if (bh->b_page != bd_page) {
>  				if (bd_page) {
>  					lock_page(bd_page);
> @@ -1592,6 +1594,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
>
>  		list_for_each_entry(bh, &segbuf->sb_payload_buffers,
>  				    b_assoc_buffers) {
> +			set_buffer_async_write(bh);
>  			if (bh == segbuf->sb_super_root) {
>  				if (bh->b_page != bd_page) {
>  					lock_page(bd_page);
> @@ -1677,6 +1680,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
>  	list_for_each_entry(segbuf, logs, sb_list) {
>  		list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
>  				    b_assoc_buffers) {
> +			clear_buffer_async_write(bh);
>  			if (bh->b_page != bd_page) {
>  				if (bd_page)
>  					end_page_writeback(bd_page);
> @@ -1686,6 +1690,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
>
>  		list_for_each_entry(bh, &segbuf->sb_payload_buffers,
>  				    b_assoc_buffers) {
> +			clear_buffer_async_write(bh);
>  			if (bh == segbuf->sb_super_root) {
>  				if (bh->b_page != bd_page) {
>  					end_page_writeback(bd_page);
> @@ -1755,6 +1760,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
>  				 b_assoc_buffers) {
>  			set_buffer_uptodate(bh);
>  			clear_buffer_dirty(bh);
> +			clear_buffer_async_write(bh);
>  			if (bh->b_page != bd_page) {
>  				if (bd_page)
>  					end_page_writeback(bd_page);
> @@ -1776,6 +1782,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
>  				 b_assoc_buffers) {
>  			set_buffer_uptodate(bh);
>  			clear_buffer_dirty(bh);
> +			clear_buffer_async_write(bh);
>  			clear_buffer_delay(bh);
>  			clear_buffer_nilfs_volatile(bh);
>  			clear_buffer_nilfs_redirected(bh);
> --
> 1.7.9.5