Re: [patch] fs: avoid I_NEW inodes

  Hi,

On Tue 10-03-09 14:41:06, Nick Piggin wrote:
> On Thu, Mar 05, 2009 at 12:12:26PM +0100, Jan Kara wrote:
> > On Thu 05-03-09 11:16:37, Nick Piggin wrote:
> > > On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> > > > On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > > > > after ~1hour of running. Previously, the new warnings would start immediately
> > > > > and hang would happen in under 5 minutes.
> > > >   A quick grep seems to indicate that you've still missed a few cases,
> > > > haven't you? I still see the same problem in
> > > > drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> > > > scanning, and dquot.c:add_dquot_ref() scanning.
> > > >   Otherwise the patch looks fine.
> > > 
> > > I thought they should be OK; drop_pagecache_sb doesn't play with flags,
> > > invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
> > > won't if writecount is not elevated...
> >   Ah, ok, you are probably right.
> > 
> > > But maybe that's a bit fragile and it would be better policy to always
> > > skip I_NEW in these traversals?
> >   Yes, it seems too fragile to me. I'm not saying we have to forbid
> > everything for I_NEW inodes but I think we should set clear simple rules
> > what is protected by I_NEW and then verify that all sites which can come
> > across such inodes obey them.
> 
> OK, sorry for the delay, what do you think of the following patch on top
> of the last?
  Thanks for the patch. I have a few comments. See below.

> ---
> 
> To be on the safe side, it should be less fragile to exclude I_NEW inodes
> from inode list scans by default (unless there is an important reason to
> have them).
> 
> Normally they will get excluded (eg. by zero refcount or writecount etc),
> however it is a bit fragile for list walkers to know exactly what parts of
> the inode state is set up and valid to test when in I_NEW. So along these
> lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
> checks with them too -- this shouldn't be a problem should it?)
> 
> Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
> 
> ---
>  fs/dquot.c                  |    6 ++++--
>  fs/drop_caches.c            |    2 +-
>  fs/inode.c                  |    2 ++
>  fs/notify/inotify/inotify.c |   16 ++++++++--------
>  4 files changed, 15 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6/fs/dquot.c
> ===================================================================
> --- linux-2.6.orig/fs/dquot.c
> +++ linux-2.6/fs/dquot.c
> @@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
>  		if (!atomic_read(&inode->i_writecount))
>  			continue;
>  		if (!dqinit_needed(inode, type))
>  			continue;
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> -			continue;
>  
>  		__iget(inode);
>  		spin_unlock(&inode_lock);
> @@ -870,6 +870,8 @@ static void remove_dquot_ref(struct supe
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & I_NEW)
> +			continue;
>  		if (!IS_NOQUOTA(inode))
>  			remove_inode_dquot_ref(inode, type, tofree_head);
>  	}
  Hmm, in this scan we also have to visit I_NEW inodes, because they can
already have quota pointers initialized, and skipping them could leave
dangling quota references. Nasty. So drop the I_NEW check here and just add
a comment, something like:
/*
 *  We also have to scan I_NEW inodes because they can already have their
 *  quota pointers initialized. Luckily, we only need to touch the quota
 *  pointers, and these have separate locking (dqptr_sem).
 */
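
  To make the concern concrete, here is a small userspace model, not kernel
code. The struct quota_model_inode type, its has_dquot field, and the
function name are made up for illustration, and the I_NEW value is an
arbitrary stand-in. It only shows that a scan without an I_NEW filter also
releases references held by inodes that are still being set up:

```c
#include <stddef.h>

/* Arbitrary stand-in for the kernel's I_NEW state bit (assumption). */
#define I_NEW 0x1u

/* Hypothetical, heavily simplified inode: state bits plus a flag that
 * models "a quota pointer is already attached". */
struct quota_model_inode {
	unsigned int i_state;
	int has_dquot;
};

/*
 * Model of the remove_dquot_ref() scan: visit every inode, including
 * I_NEW ones, and drop any quota reference found. Returns the number of
 * references dropped.
 */
static size_t model_remove_dquot_ref(struct quota_model_inode *inodes,
				     size_t n)
{
	size_t dropped = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Deliberately no I_NEW check: a new inode may already
		 * hold a quota reference that must not be left behind. */
		if (inodes[i].has_dquot) {
			inodes[i].has_dquot = 0;
			dropped++;
		}
	}
	return dropped;
}
```

  With an I_NEW filter in the loop, the reference held by a half-initialized
inode would survive the scan, which is exactly the dangling reference
mentioned above.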

> Index: linux-2.6/fs/drop_caches.c
> ===================================================================
> --- linux-2.6.orig/fs/drop_caches.c
> +++ linux-2.6/fs/drop_caches.c
> @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
>  			continue;
>  		if (inode->i_mapping->nrpages == 0)
>  			continue;
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c
> +++ linux-2.6/fs/inode.c
> @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
>  		if (tmp == head)
>  			break;
>  		inode = list_entry(tmp, struct inode, i_sb_list);
> +		if (inode->i_state & I_NEW)
> +			continue;
  If somebody is still setting up inodes at this point, we are in serious
trouble, I think. So a WARN_ON would be more appropriate than silently
skipping the inode.
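
  Roughly what I mean, again as a userspace model rather than kernel code.
The model_warn_on() helper is only loosely modelled on WARN_ON (it reports
and returns the condition), and struct inval_model_inode and the function
names are hypothetical. Whether the inode should still be skipped after
warning is an open question; this sketch skips it:

```c
#include <stddef.h>
#include <stdio.h>

/* Arbitrary stand-in for the kernel's I_NEW state bit (assumption). */
#define I_NEW 0x1u

/* Hypothetical, heavily simplified inode carrying only state bits. */
struct inval_model_inode {
	unsigned int i_state;
};

/* Number of warnings emitted so far, so callers can notice them. */
static int model_warn_count;

/* Simplified stand-in for WARN_ON(): complain loudly if cond is true
 * and hand the condition back to the caller. */
static int model_warn_on(int cond, const char *what)
{
	if (cond) {
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Model of the invalidate_list() walk: an I_NEW inode is unexpected here,
 * so it triggers a warning instead of being skipped silently. Returns the
 * number of inodes that were actually processed.
 */
static size_t model_invalidate_list(struct inval_model_inode *inodes,
				    size_t n)
{
	size_t processed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (model_warn_on(inodes[i].i_state & I_NEW,
				  "I_NEW inode seen during invalidation"))
			continue;	/* assumption: still skip it */
		processed++;
	}
	return processed;
}
```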

>  		invalidate_inode_buffers(inode);
>  		if (!atomic_read(&inode->i_count)) {
>  			list_move(&inode->i_list, dispose);
> Index: linux-2.6/fs/notify/inotify/inotify.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/inotify/inotify.c
> +++ linux-2.6/fs/notify/inotify/inotify.c
> @@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
>  		struct list_head *watches;
>  
>  		/*
> +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> +		 * I_WILL_FREE which is fine because by that point the inode
> +		 * cannot have any associated watches.
> +		 */
  Update the comment to mention I_NEW as well?

> +		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
> +
> +		/*
>  		 * If i_count is zero, the inode cannot have any watches and
>  		 * doing an __iget/iput with MS_ACTIVE clear would actually
>  		 * evict all inodes with zero i_count from icache which is
> @@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
>  		if (!atomic_read(&inode->i_count))
>  			continue;
>  
> -		/*
> -		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> -		 * I_WILL_FREE which is fine because by that point the inode
> -		 * cannot have any associated watches.
> -		 */
> -		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
> -			continue;
> -
>  		need_iput_tmp = need_iput;
>  		need_iput = NULL;
>  		/* In case inotify_remove_watch_locked() drops a reference. */

									Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR