Re: [PATCH] [CRITICAL] nilfs2: fix issue with race condition of competition between segments for dirty blocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ryusuke,

On Tue, 2013-09-03 at 03:13 +0900, Ryusuke Konishi wrote:
> On Mon, 02 Sep 2013 17:27:44 +0400, Vyacheslav Dubeyko wrote:
> > From: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
> > Subject: [PATCH] [CRITICAL] nilfs2: fix issue with race condition of competition between segments for dirty blocks
> 
> Before considering to apply this patch, please confirm that the
> suspected issue, "dirty buffer_heads are added in several lists of
> payload_buffers" you pointed out, is actually happening.
> 
> The easiest way for this is testing the following patch:
> 
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index bd88a74..cab38db 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -668,6 +668,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
>  			if (!buffer_dirty(bh))
>  				continue;
>  			get_bh(bh);
> +			BUG_ON(!list_empty(&bh->b_assoc_buffers));
>  			list_add_tail(&bh->b_assoc_buffers, listp);
>  			ndirties++;
>  			if (unlikely(ndirties >= nlimit)) {
> @@ -701,6 +702,7 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
>  			do {
>  				if (buffer_dirty(bh)) {
>  					get_bh(bh);
> +					BUG_ON(!list_empty(&bh->b_assoc_buffers));
>  					list_add_tail(&bh->b_assoc_buffers,
>  						      listp);
>  				}
> 

Yes, I can confirm the issue. I have applied your patch. And I have such
output in my syslog after reproducing of the issue:

[  424.605561] ------------[ cut here ]------------
[  424.605576] kernel BUG at fs/nilfs2/segment.c:672!
[  424.605581] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[  424.605589] Modules linked in: rfcomm bnep bluetooth parport_pc ppdev snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm xfs snd_seq_midi kvm_amd snd_rawmidi kvm snd_seq_midi_event snd_seq r8712u(C) snd_timer snd_seq_device snd sp5100_tco microcode soundcore k10temp i2c_piix4 lp snd_page_alloc parport mac_hid video ext2 sky2 btrfs raid6_pq xor zlib_deflate libcrc32c
[  424.605650] CPU: 0 PID: 358 Comm: segctord Tainted: G         C   3.10.0-rc5+ #116
[  424.605654] Hardware name:    /EDGE-FT1M1 E450, BIOS 4.6.4 11/15/2011
[  424.605660] task: ffff880208ad22a0 ti: ffff8802085e0000 task.ti: ffff8802085e0000
[  424.605663] RIP: 0010:[<ffffffff812bff53>]  [<ffffffff812bff53>] nilfs_lookup_dirty_data_buffers.isra.22+0x203/0x270
[  424.605678] RSP: 0000:ffff8802085e1b08  EFLAGS: 00010293
[  424.605682] RAX: ffff880219589618 RBX: ffff8802085e1c08 RCX: ffff88022dfb6000
[  424.605685] RDX: ffff880219589660 RSI: ffff880219589618 RDI: ffff88022dff8aa8
[  424.605689] RBP: ffff8802085e1be8 R08: 4600000000000000 R09: a8001c7123000000
[  424.605692] R10: 57ffcd8ee41c48c0 R11: 0000000000000000 R12: 0000000000000000
[  424.605695] R13: ffffea00071c48c0 R14: 0000000000000000 R15: 0000000000000800
[  424.605699] FS:  00002aaaaaae3780(0000) GS:ffff880225a00000(0000) knlGS:0000000000000000
[  424.605703] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  424.605706] CR2: 00007fff2945bff8 CR3: 00000002186f9000 CR4: 00000000000007f0
[  424.605710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  424.605713] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  424.605716] Stack:
[  424.605719]  ffff8802085e1b78 ffff880219177508 ffff8802191773cc ffffffffffffffff
[  424.605728]  ffff88021a637958 000000000000000e 0000000000000000 ffffea00071c48c0
[  424.605735]  ffffea00071c4880 ffffea00071c4840 ffffea00071c4800 ffffea00071dd7c0
[  424.605742] Call Trace:
[  424.605753]  [<ffffffff812c0050>] nilfs_segctor_scan_file+0x90/0x190
[  424.605760]  [<ffffffff812c0736>] nilfs_segctor_do_construct+0x596/0x1ba0
[  424.605767]  [<ffffffff812c2053>] nilfs_segctor_construct+0x183/0x2a0
[  424.605774]  [<ffffffff812c22b6>] nilfs_segctor_thread+0x146/0x370
[  424.605781]  [<ffffffff812c2170>] ? nilfs_segctor_construct+0x2a0/0x2a0
[  424.605788]  [<ffffffff8106bc9d>] kthread+0xed/0x100
[  424.605794]  [<ffffffff8106bbb0>] ? flush_kthread_worker+0x190/0x190
[  424.605802]  [<ffffffff816ef2dc>] ret_from_fork+0x7c/0xb0
[  424.605808]  [<ffffffff8106bbb0>] ? flush_kthread_worker+0x190/0x190
[  424.605811] Code: ef 8b 08 d3 e6 48 63 f6 e8 2b 3d ef ff e9 19 ff ff ff 85 c0 74 ac 66 90 eb 9c 8b 95 48 ff ff ff 85 d2 75 09 e8 6f 48 42 00 eb ad <0f> 0b 48 8d bd 48 ff ff ff e8 1f 29 e7 ff e8 5a 48 42 00 eb 98 
[  424.605886] RIP  [<ffffffff812bff53>] nilfs_lookup_dirty_data_buffers.isra.22+0x203/0x270
[  424.605893]  RSP <ffff8802085e1b08>
[  424.605900] ---[ end trace 0061903595fb4cff ]---

> You can rewrite the above BUG_ON() check with the following debug code
> if you want to get detail information on the buffer:
> 
>   if (!list_empty(&bh->b_assoc_buffers)) {
>      /* dump information on bh */
>   }
> 
> As a workaround, we can avoid double registration to different payload
> lists with list_empty(&bh->b_assoc_buffers) instead of adding
> BH_async_write flag.
> 

Ok. I agree that we need to elaborate another fix if you think that my
approach is not clear enough.

> However, I think we should first narrow down the root cause of why the
> issue "dirty buffer_heads are added in several lists of
> payload_buffers" happens.
> 
> We designed the log writer so that the same buffer is not registered
> to two different payload buffer lists at the same time.  If it is not
> satisfied, it's a bug.
> 
> What is the root cause of the "double registration" of a buffer head,
> do you think?
> 

I have such understanding of the issue's reason.

First of all, I am talking about full segments case but not about
partial segments. As I see, all works fine for partial segments case
because processing of partial segments of one full segment is located
inside of one iteration of cycle in nilfs_segctor_do_construct() method.

So, if we have a file with dirty blocks count is greater than full
segment size then such dirty file will be processed by means of several
iterations of cycle in nilfs_segctor_do_construct() method. Also it
needs to take into account the asynchronous nature of processing of
submitted bio requests on block layer. The searching of dirty blocks is
made in nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers() on the basis of lookup of dirty memory
pages in pagecache and selecting found dirty buffers. Found dirty blocks
are added in current segbuf. Then it makes preparation of dirty blocks
for write operation and submission write requests on block layer. When
block layer ends processing of some bio then it calls
nilfs_end_bio_write(). The segctor thread can complete write request in
nilfs_segctor_complete_write() only after nilfs_end_bio_write() call.
Processed buffers are made as clean namely in
nilfs_segctor_complete_write() method. But if we submitted dirty blocks
of file A in previous iteration and try to process dirty blocks of file
A in next iteration again without
nilfs_end_bio_write()/nilfs_segctor_complete_write() pair for previous
iteration then we will have the issue of "double registration" of a
buffer head (because we find pages under asynchronous write as dirty
again).

Maybe I don't clear understand a condition of this issue or NILFS2
driver's architecture but I can't see another solution for this issue
yet. I suppose that we need to talk over the issue in more details. What
do you think?

You said that the log writer design should prevent from registering of
buffer in several payload buffer lists. Could you describe this in more
details?

With the best regards,
Vyacheslav Dubeyko.


--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux BTRFS]     [Linux CIFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux