Re: ext3 lockdep warning in 2.6.25-rc6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 01-04-08 17:23:34, Erez Zadok wrote:
 <snip>
> > From f5e41087e345fa5c3b46ac36e6e4a654d2f7f624 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@xxxxxxx>
> > Date: Tue, 18 Mar 2008 14:38:06 +0100
> > Subject: [PATCH] Fix drop_pagecache_sb() to not call __invalidate_mapping_pages() under
> > inode_lock.
> > 
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > ---
> >  fs/drop_caches.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> > index 59375ef..f5aae26 100644
> > --- a/fs/drop_caches.c
> > +++ b/fs/drop_caches.c
> > @@ -14,15 +14,21 @@ int sysctl_drop_caches;
> >  
> >  static void drop_pagecache_sb(struct super_block *sb)
> >  {
> > -	struct inode *inode;
> > +	struct inode *inode, *toput_inode = NULL;
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> >  		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> >  			continue;
> > +		__iget(inode);
> > +		spin_unlock(&inode_lock);
> >  		__invalidate_mapping_pages(inode->i_mapping, 0, -1, true);
> > +		iput(toput_inode);
> > +		toput_inode = inode;
> > +		spin_lock(&inode_lock);
> >  	}
> >  	spin_unlock(&inode_lock);
> > +	iput(toput_inode);
> >  }
> 
> Jan, I'll be happy to test this, but I don't understand two things about
> this patch:
> 
> 1. Is it safe to unlock and re-lock inode_lock temporarily within the loop?
> 
> 2. What's the motivation behind having the second toput_inode pointer?  It
>    appears that the first iteration through the loop, toput_inode will be
>    NULL, so we'll be iput'ing a NULL pointer (which is ok).  So you're
>    trying to iput the previous inode pointer that the list iterated over,
>    right?  Is that intended?
  I'll try to explain the locking here:
1) We are not allowed to call into __invalidate_mapping_pages() with
inode_lock held - that it the bug lockdep is complaining about. Moreover it
leads to rather long waiting times for inode_lock (quite possibly several
seconds).
2) When we release inode_lock, we need to protect from inode going away,
thus we hold reference to it - that guarantees us inode stays in the list.
3) When we continue scanning of the list we must get inode_lock before we
put the inode reference to avoid races. But we cannot do iput() when we
hold inode_lock. Thus we save pointer to inode and do iput() next time we
have released the inode_lock...

> Peter's post:
> 
> 	http://lkml.org/lkml/2008/3/23/202
> 
> includes a reference to a mail by Andrew which implies that the fix may be
> much more involved than what you outlined above, no?
  Definitely we can do a more involved fix ;) What Andrew proposes would have
some other benefits as well. But until somebody gets to that, this slight
hack should work fine (Andrew has already merged my patch in -mm so I
guess he agrees ;).

									Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux