Re: [PATCH] Properly init address_space->writeback_index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 14 Aug 2008 14:13:40 -0400
Chris Mason <chris.mason@xxxxxxxxxx> wrote:

> write_cache_pages uses i_mapping->writeback_index to pick up where it
> left off the last time a given inode was found by pdflush or
> balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
> 
> alloc_inode should set it to a sane value so that writeback doesn't start
> in the middle of a file.  It is somewhat difficult to notice the bug since
> write_cache_pages will loop around to the start of the file and the
> elevator helps hide the resulting seeks.
> 
> For whatever reason, Btrfs hits this often.  Unpatched, untarring 30 copies of
> the linux kernel in series runs at 47MB/s on a single sata drive.  With
> this fix, it jumps to 62MB/s.
> 
> Signed-off-by: Chris Mason <chris.mason@xxxxxxxxxx>
> 
> diff -r 67160ae21a58 fs/inode.c
> --- a/fs/inode.c	Wed Aug 06 19:26:20 2008 -0700
> +++ b/fs/inode.c	Thu Aug 14 10:15:49 2008 -0400
> @@ -166,6 +166,7 @@ static struct inode *alloc_inode(struct 
>  		mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
>  		mapping->assoc_mapping = NULL;
>  		mapping->backing_dev_info = &default_backing_dev_info;
> +		mapping->writeback_index = 0;
>  
>  		/*
>  		 * If the block_device provides a backing_dev_info for client
> 

inode_cachep's ctor memsets the whole inode* to zero.

Does btrfs have an ->alloc_inode handler?  If so, perhaps it is btrfs
which needs to zap that field.

Formally, the way slab works (which is arguably silly from a CPU cache
efficiency POV) is that the object should be returned to slab
(kmem_cache_free()) in a "constructed" manner.  Which means that this
zeroing should happen in destroy_inode().

otoh, the fs/inode.c inode creation/destruction code isn't designed
that way - it chose to initialise everything synchronously in
alloc_inode(), which is arguably wrong.

One wonders if that big memset in inode_init_once() actually does
anything useful.  

Ho hum, so what to do?  I'm suspecting that we should just zap the
kmem_cachep constructor altogether, use a synchronous call to
inode_init_once() to initialise all inode fields and then remove all
those open-coded zeroings.


Something like this?

From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/inode.c |   40 +++++++++-------------------------------
 1 file changed, 9 insertions(+), 31 deletions(-)

diff -puN fs/inode.c~a fs/inode.c
--- a/fs/inode.c~a
+++ a/fs/inode.c
@@ -115,34 +115,24 @@ static struct inode *alloc_inode(struct 
 	static const struct file_operations empty_fops;
 	struct inode *inode;
 
-	if (sb->s_op->alloc_inode)
+	if (sb->s_op->alloc_inode) {
 		inode = sb->s_op->alloc_inode(sb);
-	else
-		inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+	} else {
+		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+		if (inode)
+			inode_init_once(inode);
+	}
 
 	if (inode) {
 		struct address_space * const mapping = &inode->i_data;
 
 		inode->i_sb = sb;
 		inode->i_blkbits = sb->s_blocksize_bits;
-		inode->i_flags = 0;
 		atomic_set(&inode->i_count, 1);
 		inode->i_op = &empty_iops;
 		inode->i_fop = &empty_fops;
 		inode->i_nlink = 1;
 		atomic_set(&inode->i_writecount, 0);
-		inode->i_size = 0;
-		inode->i_blocks = 0;
-		inode->i_bytes = 0;
-		inode->i_generation = 0;
-#ifdef CONFIG_QUOTA
-		memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
-#endif
-		inode->i_pipe = NULL;
-		inode->i_bdev = NULL;
-		inode->i_cdev = NULL;
-		inode->i_rdev = 0;
-		inode->dirtied_when = 0;
 		if (security_inode_alloc(inode)) {
 			if (inode->i_sb->s_op->destroy_inode)
 				inode->i_sb->s_op->destroy_inode(inode);
@@ -158,15 +148,13 @@ static struct inode *alloc_inode(struct 
 		lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
 
 		init_rwsem(&inode->i_alloc_sem);
-		lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
+		lockdep_set_class(&inode->i_alloc_sem,
+					&sb->s_type->i_alloc_sem_key);
 
 		mapping->a_ops = &empty_aops;
  		mapping->host = inode;
-		mapping->flags = 0;
 		mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
-		mapping->assoc_mapping = NULL;
 		mapping->backing_dev_info = &default_backing_dev_info;
-		mapping->writeback_index = 0;
 
 		/*
 		 * If the block_device provides a backing_dev_info for client
@@ -181,7 +169,6 @@ static struct inode *alloc_inode(struct 
 				bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
 			mapping->backing_dev_info = bdi;
 		}
-		inode->i_private = NULL;
 		inode->i_mapping = mapping;
 	}
 	return inode;
@@ -197,7 +184,6 @@ void destroy_inode(struct inode *inode) 
 		kmem_cache_free(inode_cachep, (inode));
 }
 
-
 /*
  * These are initializations that only need to be done
  * once, because the fields are idempotent across use
@@ -222,16 +208,8 @@ void inode_init_once(struct inode *inode
 	mutex_init(&inode->inotify_mutex);
 #endif
 }
-
 EXPORT_SYMBOL(inode_init_once);
 
-static void init_once(void *foo)
-{
-	struct inode * inode = (struct inode *) foo;
-
-	inode_init_once(inode);
-}
-
 /*
  * inode_lock must be held
  */
@@ -1400,7 +1378,7 @@ void __init inode_init(void)
 					 0,
 					 (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
 					 SLAB_MEM_SPREAD),
-					 init_once);
+					 NULL);
 	register_shrinker(&icache_shrinker);
 
 	/* Hash may have been set up in inode_init_early */
_


Note that this assumes that a filesystem which implements
->alloc_inode() will call inode_init_once() within its ->alloc_inode().

Which means, I think, that we can just remove inode_init_once()
altogether and move its initialisations into alloc_inode() along with
all the existing ones.

What do you think?


I suppose that's all a fun project but doesn't preclude merging your
patchlet.  I'll do that for now ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux