Re: ext3 lockdep warning in 2.6.25-rc6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



In message <20080325182909.GD21732@xxxxxxxxxxxxxxxxxxxxxxxx>, Jan Kara writes:
> 
> --sdtB3X0nJg68CQEu
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
> 
> > I was building a kernel using "make -j 4" inside a unionfs, mounted on top
> > of ext3.  The kernel is vanilla 2.6.25-rc6 plus unionfs patches.  At some
> > point I forced a cache flush using "echo 3 > /proc/sys/vm/drop_caches" and I
> > got the lockdep warning below.  Note that unionfs doesn't appear to be
> > involved in this lockdep warning at all, so I suspect this is probably an
> > issue between jbd and ext3 directly.
> > 
> > Let me know if I can be of more help.
>   Actually, I have a fix for that - attached. Can you try it? Thanks.
> 									Honza
> 
> > [31270.749254] =======================================================
> > [31270.749473] [ INFO: possible circular locking dependency detected ]
> > [31270.749720] 2.6.25-rc6-unionfs2 #69
> > [31270.749831] -------------------------------------------------------
> > [31270.749958] flush-caches.sh/9702 is trying to acquire lock:
> > [31270.750171]  (&journal->j_list_lock){--..}, at: [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> > [31270.751607] 
> > [31270.751607] but task is already holding lock:
> > [31270.751607]  (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
> > [31270.751607] 
> > [31270.751607] which lock already depends on the new lock.
> > [31270.751607] 
> > [31270.751607] 
> > [31270.751607] the existing dependency chain (in reverse order) is:
> > [31270.751607] 
> > [31270.751607] -> #1 (inode_lock){--..}:
> > [31270.751607]        [<c0231e01>] __lock_acquire+0x934/0xab8
> > [31270.751607]        [<c0232307>] lock_acquire+0x4e/0x6a
> > [31270.751607]        [<c0395e33>] _spin_lock+0x23/0x32
> > [31270.751607]        [<c02713d2>] __mark_inode_dirty+0xbe/0x134
> > [31270.751607]        [<c0274b3a>] __set_page_dirty+0xdd/0xec
> > [31270.751607]        [<c0274ba9>] mark_buffer_dirty+0x60/0x63
> > [31270.751607]        [<c029d2f6>] __journal_temp_unlink_buffer+0xc9/0xce
> > [31270.751607]        [<c029d43b>] __journal_unfile_buffer+0xb/0x15
> > [31270.751607]        [<c029d487>] __journal_refile_buffer+0x42/0x8c
> > [31270.751607]        [<c029fdd1>] journal_commit_transaction+0xc59/0xe64
> > [31270.751607]        [<c02a29f1>] kjournald+0xae/0x1b0
> > [31270.751607]        [<c0228833>] kthread+0x3b/0x64
> > [31270.751607]        [<c02039d3>] kernel_thread_helper+0x7/0x10
> > [31270.751607]        [<ffffffff>] 0xffffffff
> > [31270.751607] 
> > [31270.751607] -> #0 (&journal->j_list_lock){--..}:
> > [31270.751607]        [<c0231d28>] __lock_acquire+0x85b/0xab8
> > [31270.751607]        [<c0232307>] lock_acquire+0x4e/0x6a
> > [31270.751607]        [<c0395e33>] _spin_lock+0x23/0x32
> > [31270.751607]        [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> > [31270.751607]        [<c02925fe>] ext3_releasepage+0x53/0x5c
> > [31270.751607]        [<c023e4bd>] try_to_release_page+0x32/0x3f
> > [31270.751607]        [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
> > [31270.751607]        [<c02718de>] drop_pagecache+0x68/0xd3
> > [31270.751607]        [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
> > [31270.751607]        [<c0289383>] proc_sys_write+0x61/0x7e
> > [31270.751607]        [<c02594a1>] vfs_write+0x8c/0x108
> > [31270.751607]        [<c0259a2b>] sys_write+0x3b/0x60
> > [31270.751607]        [<c0202cee>] sysenter_past_esp+0x5f/0xa5
> > [31270.751607]        [<ffffffff>] 0xffffffff
> > [31270.751607] 
> > [31270.751607] other info that might help us debug this:
> > [31270.751607] 
> > [31270.751607] 2 locks held by flush-caches.sh/9702:
> > [31270.751607]  #0:  (&type->s_umount_key#12){----}, at: [<c02718ab>] drop_pagecache+0x35/0xd3
> > [31270.751607]  #1:  (inode_lock){--..}, at: [<c02718bb>] drop_pagecache+0x45/0xd3
> > [31270.751608] 
> > [31270.751608] stack backtrace:
> > [31270.751608] Pid: 9702, comm: flush-caches.sh Not tainted 2.6.25-rc6-unionfs2 #69
> > [31270.751608]  [<c0230687>] print_circular_bug_tail+0x5b/0x66
> > [31270.751608]  [<c0231d28>] __lock_acquire+0x85b/0xab8
> > [31270.751608]  [<c0232307>] lock_acquire+0x4e/0x6a
> > [31270.751608]  [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
> > [31270.751608]  [<c0395e33>] _spin_lock+0x23/0x32
> > [31270.751608]  [<c029ef6a>] ? journal_try_to_free_buffers+0xa5/0x158
> > [31270.751608]  [<c029ef6a>] journal_try_to_free_buffers+0xa5/0x158
> > [31270.751608]  [<c02925fe>] ext3_releasepage+0x53/0x5c
> > [31270.751608]  [<c023e4bd>] try_to_release_page+0x32/0x3f
> > [31270.751608]  [<c0243f59>] __invalidate_mapping_pages+0x69/0xc6
> > [31270.751608]  [<c02718de>] drop_pagecache+0x68/0xd3
> > [31270.751608]  [<c0271972>] drop_caches_sysctl_handler+0x29/0x3f
> > [31270.751608]  [<c0289383>] proc_sys_write+0x61/0x7e
> > [31270.751608]  [<c0289322>] ? proc_sys_write+0x0/0x7e
> > [31270.751608]  [<c02594a1>] vfs_write+0x8c/0x108
> > [31270.751608]  [<c0259a2b>] sys_write+0x3b/0x60
> > [31270.751608]  [<c0202cee>] sysenter_past_esp+0x5f/0xa5
> > [31270.751608]  =======================
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> -- 
> Jan Kara <jack@xxxxxxx>
> SuSE CR Labs
> 
> --sdtB3X0nJg68CQEu
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="vfs-2.6.25-drop_pagecache_sb_fix.diff"
> 
> From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@xxxxxxx>
> Date: Tue, 18 Mar 2008 14:38:06 +0100
> Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
> inode_lock.
> 
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/drop_caches.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index 59375ef..f5aae26 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -14,15 +14,21 @@ int sysctl_drop_caches;
>  
>  static void drop_pagecache_sb(struct super_block *sb)
>  {
> -	struct inode *inode;
> +	struct inode *inode, *toput_inode = NULL;
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
>  			continue;
> +		__iget(inode);
> +		spin_unlock(&inode_lock);
>  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> +		iput(toput_inode);
> +		toput_inode = inode;
> +		spin_lock(&inode_lock);
>  	}
>  	spin_unlock(&inode_lock);
> +	iput(toput_inode);
>  }

Jan, I'll be happy to test this, but I don't understand two things about
this patch:

1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?

2. What's the motivation behind having the second toput_inode pointer?  It
   appears that the first iteration through the loop, toput_inode will be
   NULL, so we'll be iput'ing a NULL pointer (which is ok).  So you're
   trying to iput the previous inode pointer that the list iterated over,
   right?  Is that intended?

Peter's post:

	http://lkml.org/lkml/2008/3/23/202

includes a reference to a mail by Andrew which implies that the fix may be
much more involved than what you outlined above, no?

Thanks,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux