Re: [PATCH 3/3] reiser4: in our own sync writes, mark pages dirty before marking them writeback.

On 10/10/2015 12:44 PM, Ivan Shapovalov wrote:
On 2015-10-09 at 22:23 +0200, Edward Shishkin wrote:
On 10/09/2015 07:14 PM, Ivan Shapovalov wrote:
On 2015-10-09 at 16:55 +0200, Edward Shishkin wrote:
On 10/09/2015 03:50 PM, Ivan Shapovalov wrote:
On 2015-10-09 at 15:27 +0200, Edward Shishkin wrote:
Hi Ivan,

On 10/09/2015 01:16 PM, Ivan Shapovalov wrote:
Ref.: https://www.mail-archive.com/linux-f2fs-devel%40lists.sourceforge.net/msg02745.html
Do you have a stack trace for reiser4?
How to reproduce it?
I'll rebuild the kernel without the fix and provide you with the oops'
stacktrace asap.

I guess that it's tied to the config. In my case, it is reproducible on
each boot, just as the DE starts up and something issues the first
fsync().
Yes, let's try to find the culprit who doesn't set i_wb...
So, here are the traces I've got after adding an
assert(PageDirty(node->pg)) to queue_jnode():
/* captured by hand as these are panics, not oopses */

1.

queue_jnode()
unformatted_make_reloc()
assign_real_blocknrs()
forward_relocate_unformatted()
forward_alloc_unformatted_journal()
? coord_num_units()
handle_pos_on_twig()
flush_current_atom()
flush_some_atom()
reiser4_writeout()
reiser4_writeback_inodes()
<...>

2.

znode_make_reloc()
forward_alloc_formatted_wa()
? zload_ra()
allocate_znode()
alloc_pos_and_ancestors()
flush_current_atom()
reiser4_txn_end()
? reiser4_txn_end()
reiser4_txn_restart_current()
force_commit_atom()
? reiser4_txn_restart_current()
txnmgr_force_commit_all()
writepages_cryptcompress()
reiser4_writepages_dispatch()
<...>
sys_fsync()


Thanks Ivan.
Not good news, TBH...

For formatted nodes we can continue to narrow down the problem
(see the attached patch).
Having applied the patch, I saw loads and loads of warnings (in ~10
distinct stacktraces), but no panics or oopses in the initial location.
False positives are possible, right?


Yes, a lot of them, and nothing interesting.
The same goes for Dushan's logs. Sorry for the bad idea...

Thanks,
Edward.



The traces:

1.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036de5c>] scan_by_coord+0x62c/0xed0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036e86d>] scan_unformatted+0x16d/0x320 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032b1f0>] ? incr_load_count+0x20/0xd0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036ed9b>] scan_common+0x37b/0x790 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370074>] flush_current_atom+0xec4/0x1b40 [reiser4]
<...>

2.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036b952>] neighbor_in_slum.constprop.12+0x82/0x1c0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036bc4a>] handle_pos_on_formatted+0x1ba/0xa40 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

3.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033fb4a>] insert_into_item+0x1fa/0x610 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033ffd4>] reiser4_resize_item+0x74/0x190 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03ec314>] add_entry_cde+0x104/0x2f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0329af5>] ? znode_invariant+0x3a5/0xd50 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03aa19e>] reiser4_rename2_common+0xbce/0x1140 [reiser4]
<...>

4.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032b40d>] ? zrelse+0x1d/0x70 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036bfc2>] handle_pos_on_formatted+0x532/0xa40 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

5.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033edda>] insert_with_carry_by_coord+0xea/0x250 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03d6016>] ? free_space_node40+0x16/0x170 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033f3c6>] insert_by_coord+0x166/0x360 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03fa16f>] ctail_insert_unprepped_cluster+0x1df/0x750 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03c98e3>] prepare_logical_cluster+0x753/0x17f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03cabdf>] do_write_cryptcompress+0x25f/0xed0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0347a69>] ? is_in_reiser4_context+0x19/0x30 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03ce8d1>] write_cryptcompress+0xa1/0x1d0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03477fa>] ? _reiser4_init_context+0x6a/0xf0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03bcc66>] reiser4_write_dispatch+0x166/0x4f0 [reiser4]
<...>

6.
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036611a>] move_flush_pos+0xba/0x2c0 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c10e>] handle_pos_on_formatted+0x67e/0xa40 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

7.
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0363b9e>] ? znode_check_flushprepped+0xfe/0x360 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036bb28>] handle_pos_on_formatted+0x98/0xa40 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

...and so on.

I haven't checked the code yet; I'll probably try with that assertion
converted into a warning and split into two (one for formatted and another
for unformatted nodes), so that I can check which type of node is
responsible for generating the final oops in set_page_writeback().
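
A minimal userspace sketch of that idea, assuming a simplified node model.
It is not reiser4 code: struct model_node, check_dirty_on_queue() and the
clean_on_queue[] counters are hypothetical names made up for illustration,
and the page_dirty flag only stands in for PageDirty(node->pg). The point is
just that a non-fatal check with one counter per node kind shows which kind
reaches the queue with a clean page:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are reiser4 definitions. */
enum node_kind { NODE_FORMATTED, NODE_UNFORMATTED, NODE_KIND_COUNT };

struct model_node {
	enum node_kind kind;
	bool page_dirty;	/* models PageDirty(node->pg) */
};

/* One counter per node kind, so the offending kind is visible at a glance. */
static unsigned long clean_on_queue[NODE_KIND_COUNT];

/*
 * Warn (instead of panicking) when a node with a clean page is queued for
 * writeback. Returns true if the "dirty before writeback" invariant held.
 */
static bool check_dirty_on_queue(const struct model_node *node)
{
	if (node->page_dirty)
		return true;

	clean_on_queue[node->kind]++;
	fprintf(stderr, "warning: clean %s node queued for writeback\n",
		node->kind == NODE_FORMATTED ? "formatted" : "unformatted");
	return false;
}
```

In the real module the two checks would presumably sit in the formatted and
unformatted paths separately; that placement is an assumption here, not
something verified against the code.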

For unformatted nodes only code review can help. Normally, all
modifications of unformatted nodes should look like the following:

struct page *page = jnode_page(node);
lock_page(page);
char *data = kmap(page);
/* modifications are going here */
kunmap(page);
set_page_dirty_nobuffers(page); /* somebody forgets to do this */
unlock_page(page);

Modifications of formatted nodes should look like the following:

longterm_lock_znode(node);
zload(node);
/* modifications are going here */
zrelse(node);
znode_make_dirty(node); /* somebody forgets to do this */
longterm_unlock_znode();

Anyway, we can use your patch 3 as a temporary fixup.
The most persistent things are those conceived as the most temporary
ones... ;)
