Re: [PATCH] nilfs2: fix issue with broken bmap for the case of block size lesser than 4 KB

Hi Vyacheslav,
On Mon, 29 Apr 2013 13:24:53 +0400, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
> 
> On Fri, 2013-04-26 at 01:37 +0900, Ryusuke Konishi wrote:
> 
> [snip]
>> 
>> Adding BH_NILFS_Allocated flag looks overkill for this issue.
>> 
>> Why not modify the btree or direct mapping code so that the buffer
>> dirty flag is properly set or cleared for 1 KB blocks as well?
>> Basically, the block mapping code of NILFS is designed to handle
>> 1 KB blocks.
>> 
> 
> My idea is simple. As I can see, the usual way of processing a page's
> buffers in the NILFS2 driver is like this:
> 
> struct buffer_head *bh, *head;
> bh = head = page_buffers(page);
> do {
> 
> 	/* Do something */
> 
> } while ((bh = bh->b_this_page) != head);
> 
> And, as I understand it, the essence of the issue is the processing of
> spare buffers that were allocated by the create_empty_buffers() call
> but should not be used yet. So, from my viewpoint, the reasonable way
> to fix the issue is to ignore such spare buffers, because operations
> on spare buffers can result in very subtle bugs and performance
> degradation. As a result, I suggested using the BH_NILFS_Allocated
> flag in this patch.

That is the point of difference.  I think we can handle this kind of
issue by managing the buffer dirty state strictly, since the segment
constructor of nilfs only picks up dirty buffers for write through the
nilfs_lookup_dirty_data_buffers function.  Spare buffers are simply
ignored if they don't have the dirty flag set.

I don't think we need the BH_NILFS_Allocated flag.
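To illustrate the idea, here is a small userspace model (not the actual
kernel code; the structure and names are simplified for illustration) of
walking a page's circular buffer list and picking up only the dirty
buffers, so that clean spare buffers are skipped without any extra flag:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in for struct buffer_head: each buffer carries a
 * dirty flag and a pointer forming the circular per-page list.
 */
struct buf {
	bool dirty;
	struct buf *b_this_page;	/* circular list, as in buffer_head */
};

/*
 * Count the dirty buffers on the page.  Clean spare buffers fall
 * through the if() and are simply ignored, which is the behavior the
 * segment constructor relies on.
 */
static size_t count_dirty(struct buf *head)
{
	struct buf *bh = head;
	size_t n = 0;

	do {
		if (bh->dirty)
			n++;
	} while ((bh = bh->b_this_page) != head);
	return n;
}
```

So as long as the dirty flag is only ever set on buffers that really
hold data, the spare buffers stay invisible to writeback.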

>> If this issue occurs after file size extension with ftruncate(), I
>> guess the handling of hole blocks created by the extension has a
>> problem.
>> 
> 
> The nature of this issue is more complex, as I described in the patch.
> The ftruncate call is simply the beginning of the issue. What matters
> here is that the call grows the file from zero bytes to 60 bytes. So,
> for a 1 KB block size we should use only one block (a hole) but, as a
> result, we have 4 allocated buffers (1 for the hole block and 3 spare
> buffers).
> 
> [  223.543981] nilfs_truncate:715: blkoff 1, inode->i_size 60
> [  223.544068] nilfs_get_block:89: inode->i_ino 21, ii->i_bmap->b_last_allocated_ptr 0
> [  223.544079] nilfs_truncate:729: inode->i_ino 21
> [  223.544086] Pid: 2424, comm: darcs Tainted: G          I  3.8.0-rc1 #147
> [  223.544089] Call Trace:
> [  223.544098]  [<ffffffff812e0c77>] nilfs_truncate+0x117/0x130
> [  223.544105]  [<ffffffff811516de>] ? truncate_pagecache+0x5e/0x70
> [  223.544111]  [<ffffffff812e0d11>] nilfs_setattr+0x81/0xb0
> [  223.544119]  [<ffffffff817285b1>] ? mutex_lock_nested+0x2a1/0x370
> [  223.544126]  [<ffffffff811bf60b>] notify_change+0x1cb/0x390
> [  223.544132]  [<ffffffff811a1290>] do_truncate+0x60/0xa0
> [  223.544138]  [<ffffffff811a1644>] sys_ftruncate+0x114/0x180
> [  223.544146]  [<ffffffff8139fcae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  223.544152]  [<ffffffff81735119>] system_call_fastpath+0x16/0x1b
> 
> Then, it is possible to see an attempt to write to the file mapping:
> 
> [  489.785334] nilfs_page_mkwrite:146: inode->i_ino 21
> [  489.785336] Pid: 2447, comm: darcs Tainted: G          I  3.8.0-rc1 #149
> [  489.785338] Call Trace:
> [  489.785342]  [<ffffffff812e1a55>] nilfs_page_mkwrite+0x315/0x390
> [  489.785347]  [<ffffffff81169562>] __do_fault+0xe2/0x4c0
> [  489.785351]  [<ffffffff8116c260>] handle_pte_fault+0x90/0x8e0
> [  489.785355]  [<ffffffff8101dcc9>] ? sched_clock+0x9/0x10
> [  489.785359]  [<ffffffff81098985>] ? sched_clock_local+0x25/0x90
> [  489.785363]  [<ffffffff81730644>] ? __do_page_fault+0x114/0x5b0
> [  489.785367]  [<ffffffff8116dc85>] handle_mm_fault+0x235/0x330
> [  489.785371]  [<ffffffff817306cc>] __do_page_fault+0x19c/0x5b0
> [  489.785375]  [<ffffffff81098b18>] ? sched_clock_cpu+0xa8/0x110
> [  489.785378]  [<ffffffff810bd9fd>] ? trace_hardirqs_off+0xd/0x10
> [  489.785382]  [<ffffffff81098bef>] ? local_clock+0x6f/0x80
> [  489.785386]  [<ffffffff810be345>] ? lock_release_holdtime.part.25+0x15/0x1a0
> [  489.785390]  [<ffffffff811c1183>] ? __close_fd+0x83/0xb0
> [  489.785394]  [<ffffffff8139fe9d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  489.785398]  [<ffffffff81730aee>] do_page_fault+0xe/0x10
> [  489.785402]  [<ffffffff8172cc48>] page_fault+0x28/0x30

Thank you for narrowing down the issue.

So, the root cause is in the use of __block_page_mkwrite() in the
nilfs_page_mkwrite function.

Even if __block_page_mkwrite() only fills hole blocks inside EOF, it
calls set_page_dirty(), and this leads to a call to
nilfs_set_page_dirty(), which marks all buffers on the page dirty
through __set_page_dirty_buffers().  Thus, an inconsistent dirty state
is created.

There seem to be two problems here.  The first is that the use of
__block_page_mkwrite() in nilfs_page_mkwrite() breaks the consistency
of the buffer dirty state if the buffers are not fully filled due to
EOF.  The second is that the number of newly dirtied buffers, which
will be passed to the nilfs_set_file_dirty function, goes wrong in
that case.
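As a rough userspace sketch of the mismatch (illustrative only, not
kernel code; the constants assume a 4 KB page and 1 KB block size as in
the report): with i_size = 60, only the first of the four buffers on the
page covers data inside EOF, yet dirtying the whole page dirties all
four:

```c
#define BUFFERS_PER_PAGE 4	/* 4 KB page / 1 KB block */
#define BLOCK_SIZE 1024L

/*
 * Number of buffers on the (first) page that actually cover data
 * inside EOF, i.e. the buffers that *should* become dirty.  For
 * i_size = 60 this is 1, while __set_page_dirty_buffers() would
 * effectively dirty all BUFFERS_PER_PAGE of them.
 */
static int buffers_inside_eof(long i_size)
{
	int n = (int)((i_size + BLOCK_SIZE - 1) / BLOCK_SIZE);
	return n < BUFFERS_PER_PAGE ? n : BUFFERS_PER_PAGE;
}
```

The gap between this count and BUFFERS_PER_PAGE is exactly the set of
spare buffers that end up dirtied by mistake, and also the amount by
which the dirty-buffer count passed onward goes wrong.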

I will later post a bug fix for these issues.


With regards,
Ryusuke Konishi


> It is possible to see that after segment construction the bmap is in
> the following state (namely, two block numbers have been set):
> 
> [  490.177923] nilfs_direct_set_ptr: inode: 21
> [  490.177925] NILFS2: 00 00 00 00 00 00 00 00 01 14 00 00 00 00 00 00  ................
> [  490.177927] NILFS2: 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  490.177929] NILFS2: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  490.177931] NILFS2: 00 00 00 00 00 00 00 00
> 
> And, finally, we hit the issue:
> 
> [  490.179790] nilfs_direct_assign: inode: 21
> [  490.179800] nilfs_direct_assign: ptr: 5121
> [  490.179802] nilfs_dat_prepare_entry:57 req->pr_entry_nr 5121
> [  490.179805] nilfs_palloc_get_entry_block:294 nr 5121, create 0
> [  490.179815] nilfs_direct_assign: inode: 21
> [  490.179828] nilfs_direct_assign: ptr: 10
> [  490.179830] nilfs_dat_prepare_entry:57 req->pr_entry_nr 10
> [  490.179832] nilfs_palloc_get_entry_block:294 nr 10, create 0
> [  490.179843] nilfs_direct_assign: inode: 21
> [  490.179857] nilfs_direct_assign: ptr: 0
> [  490.179859] nilfs_direct_assign: invalid pointer: 0
> [  490.179862] NILFS error (device dm-0): nilfs_bmap_assign: broken bmap (inode number=21)
> 
> With the best regards,
> Vyacheslav Dubeyko.
> 
>> I will look into the issue too.
>> 
>> 
>> Regards,
>> Ryusuke Konishi
>> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html