Re: [PATCH 3/3] reiser4: in our own sync writes, mark pages dirty before marking them writeback.

On 2015-10-09 at 22:23 +0200, Edward Shishkin wrote:
> On 10/09/2015 07:14 PM, Ivan Shapovalov wrote:
> > On 2015-10-09 at 16:55 +0200, Edward Shishkin wrote:
> > > On 10/09/2015 03:50 PM, Ivan Shapovalov wrote:
> > > > On 2015-10-09 at 15:27 +0200, Edward Shishkin wrote:
> > > > > Hi Ivan,
> > > > > 
> > > > > On 10/09/2015 01:16 PM, Ivan Shapovalov wrote:
> > > > > > Ref.: https://www.mail-archive.com/linux-f2fs-devel%40lists.sourceforge.net/msg02745.html
> > > > > Do you have a stack trace for reiser4?
> > > > > How to reproduce it?
> > > > I'll rebuild the kernel without the fix and provide you with the
> > > > oops' stacktrace asap.
> > > > 
> > > > I guess that it's tied to the config. In my case, it is reproducible
> > > > on each boot, just as the DE starts up and something issues the first
> > > > fsync().
> > > 
> > > Yes, let's try to find the culprit who doesn't set i_wb...
> > So, here are the traces I've got after adding an
> > assert(PageDirty(node->pg)) to queue_jnode():
> > /* captured by hand as these are panics, not oopses */
> > 
> > 1.
> > 
> > queue_jnode()
> > unformatted_make_reloc()
> > assign_real_blocknrs()
> > forward_relocate_unformatted()
> > forward_alloc_unformatted_journal()
> > ? coord_num_units()
> > handle_pos_on_twig()
> > flush_current_atom()
> > flush_some_atom()
> > reiser4_writeout()
> > reiser4_writeback_inodes()
> > <...>
> > 
> > 2.
> > 
> > znode_make_reloc()
> > forward_alloc_formatted_wa()
> > ? zload_ra()
> > allocate_znode()
> > alloc_pos_and_ancestors()
> > flush_current_atom()
> > reiser4_txn_end()
> > ? reiser4_txn_end()
> > reiser4_txn_restart_current()
> > force_commit_atom()
> > ? reiser4_txn_restart_current()
> > txnmgr_force_commit_all()
> > writepages_cryptcompress()
> > reiser4_writepages_dispatch()
> > <...>
> > sys_fsync()
> > 
> 
> 
> Thanks Ivan.
> Not good news, TBH...
> 
> For formatted nodes we can continue to narrow down the problem
> (see the attached patch).

Having applied the patch, I saw loads and loads of warnings (in ~10
distinct stacktraces), but no panics or oopses in the initial location.
False positives are possible, right?

The traces:

1.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036de5c>] scan_by_coord+0x62c/0xed0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036e86d>] scan_unformatted+0x16d/0x320 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032b1f0>] ? incr_load_count+0x20/0xd0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036ed9b>] scan_common+0x37b/0x790 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370074>] flush_current_atom+0xec4/0x1b40 [reiser4]
<...>

2.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036b952>] neighbor_in_slum.constprop.12+0x82/0x1c0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036bc4a>] handle_pos_on_formatted+0x1ba/0xa40 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

3.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033fb4a>] insert_into_item+0x1fa/0x610 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033ffd4>] reiser4_resize_item+0x74/0x190 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03ec314>] add_entry_cde+0x104/0x2f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0329af5>] ? znode_invariant+0x3a5/0xd50 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03aa19e>] reiser4_rename2_common+0xbce/0x1140 [reiser4]
<...>

4.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032b40d>] ? zrelse+0x1d/0x70 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036bfc2>] handle_pos_on_formatted+0x532/0xa40 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

5.
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032f0c3>] unlock_carry_level+0xb3/0xd80 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032fdb0>] done_carry_level+0x20/0x1f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0332036>] reiser4_carry+0x396/0x7b0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc032bc0c>] ? reiser4_add_obj+0x9c/0x370 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033edda>] insert_with_carry_by_coord+0xea/0x250 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03d6016>] ? free_space_node40+0x16/0x170 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc033f3c6>] insert_by_coord+0x166/0x360 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03fa16f>] ctail_insert_unprepped_cluster+0x1df/0x750 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03c98e3>] prepare_logical_cluster+0x753/0x17f0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03cabdf>] do_write_cryptcompress+0x25f/0xed0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc0347a69>] ? is_in_reiser4_context+0x19/0x30 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03ce8d1>] write_cryptcompress+0xa1/0x1d0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03477fa>] ? _reiser4_init_context+0x6a/0xf0 [reiser4]
Oct 10 00:28:42 intelfx-laptop kernel:  [<ffffffffc03bcc66>] reiser4_write_dispatch+0x166/0x4f0 [reiser4]
<...>

6.
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036611a>] move_flush_pos+0xba/0x2c0 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c10e>] handle_pos_on_formatted+0x67e/0xa40 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

7.
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffff8145ddac>] dump_stack+0x4c/0x6e
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc033cc88>] longterm_unlock_znode+0x738/0xe80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03f48af>] free_item_convert_data+0x3f/0x150 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03f5656>] detach_convert_idata+0x26/0x110 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03fd0f6>] convert_ctail+0x1016/0x2060 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc03648ba>] convert_node+0x22a/0xd30 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0363b9e>] ? znode_check_flushprepped+0xfe/0x360 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036bb28>] handle_pos_on_formatted+0x98/0xa40 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc036c546>] handle_pos_on_leaf+0x16/0x80 [reiser4]
Oct 10 00:28:43 intelfx-laptop kernel:  [<ffffffffc0370400>] flush_current_atom+0x1250/0x1b40 [reiser4]
<...>

...and so on.

I haven't checked the code yet; I'll probably try with that assertion
converted into a warning and split into two (one for formatted and another
for unformatted nodes), so that I can check which type of node is
responsible for generating the final oops in set_page_writeback().
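
To make the plan above concrete, here is a minimal userspace sketch of the
split check. It is only a model: the mock_* types, the enum values and
check_dirty_on_queue() are hypothetical names invented for illustration,
not reiser4 or kernel identifiers, and the real check would sit in
queue_jnode() and use the kernel's own warning facilities.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real reiser4/kernel types. */
enum mock_node_type { MOCK_FORMATTED, MOCK_UNFORMATTED };

struct mock_page {
    bool dirty;                 /* models PageDirty(page) */
};

struct mock_jnode {
    enum mock_node_type type;   /* formatted (znode) or unformatted */
    struct mock_page *pg;       /* models node->pg, may be NULL */
};

enum mock_check_result {
    CHECK_OK,
    WARN_FORMATTED_CLEAN,
    WARN_UNFORMATTED_CLEAN
};

/*
 * Warn instead of panicking, and report separately for each node type,
 * so the log shows which kind of node was queued with a clean page.
 */
static enum mock_check_result
check_dirty_on_queue(const struct mock_jnode *node)
{
    /* Nothing to report if there is no page or the page is dirty. */
    if (node->pg == NULL || node->pg->dirty)
        return CHECK_OK;

    if (node->type == MOCK_FORMATTED) {
        fprintf(stderr, "warning: formatted node queued with clean page\n");
        return WARN_FORMATTED_CLEAN;
    }

    fprintf(stderr, "warning: unformatted node queued with clean page\n");
    return WARN_UNFORMATTED_CLEAN;
}
```

Returning a distinct code per node type (rather than a single boolean)
is what lets the two cases be counted separately in the log.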

> For unformatted nodes only code review
> can help. Normally, all modifications of unformatted nodes should
> look like the following:
> 
> struct page *page = jnode_page(node);
> lock_page(page);
> char *data = kmap(page);
> /* modifications are going here */
> kunmap(page);
> set_page_dirty_nobuffers(page); /* somebody forgets to do this */
> unlock_page(page);
> 
> Modifications of formatted nodes should look like the following:
> 
> longterm_lock_znode(node);
> zload(node);
> /* modifications are going here */
> zrelse(node);
> znode_make_dirty(node); /* somebody forgets to do this */
> longterm_unlock_znode();
> 
> Anyway, we can use your patch 3 as a temporary fixup.

The most persistent things are those conceived as the most temporary
ones... ;)
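
For reference, the modification pattern quoted above for unformatted nodes
(lock, modify, mark dirty, unlock) can be modelled in userspace as follows.
This is a toy sketch under stated assumptions: struct mock_page,
mock_modify_page() and mock_start_writeback() are hypothetical names, and
refusing writeback on a clean page is a modelling choice that stands in for
the condition being hunted in this thread, not a description of what
set_page_writeback() actually does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a page; NOT the kernel's struct page. */
struct mock_page {
    bool locked;      /* models lock_page()/unlock_page() */
    bool dirty;       /* models the dirty flag */
    bool writeback;   /* models the writeback flag */
    char data[16];    /* models the mapped page contents */
};

/*
 * Modify the page following the quoted pattern. Passing mark_dirty=false
 * models the suspected bug: a caller that forgets to dirty the page.
 */
static void mock_modify_page(struct mock_page *page, const char *src,
                             bool mark_dirty)
{
    page->locked = true;                              /* lock_page() */
    snprintf(page->data, sizeof(page->data), "%s", src); /* modification */
    if (mark_dirty)
        page->dirty = true;                           /* mark page dirty */
    page->locked = false;                             /* unlock_page() */
}

/*
 * Model of submitting the page for writeback. Returns false when the
 * page is clean, i.e. when the "dirty before writeback" order was broken.
 */
static bool mock_start_writeback(struct mock_page *page)
{
    if (!page->dirty)
        return false;
    page->dirty = false;
    page->writeback = true;
    return true;
}
```

In this model a caller that skips the dirtying step is caught at
writeback time, far from the place where the omission happened, which is
why the traces alone do not point at the culprit.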


