Re: [PATCH 3/3 v5] ext4: reduce lock contention in __ext4_new_inode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun 20-08-17 19:54:37, Wang Shilong wrote:
> From: Wang Shilong <wshilong@xxxxxxx>
> 
> While running number of creating file threads concurrently,
> we found heavy lock contention on group spinlock:
> 
> FUNC                           TOTAL_TIME(us)       COUNT        AVG(us)
> ext4_create                    1707443399           1440000      1185.72
> _raw_spin_lock                 1317641501           180899929    7.28
> jbd2__journal_start            287821030            1453950      197.96
> jbd2_journal_get_write_access  33441470             73077185     0.46
> ext4_add_nondir                29435963             1440000      20.44
> ext4_add_entry                 26015166             1440049      18.07
> ext4_dx_add_entry              25729337             1432814      17.96
> ext4_mark_inode_dirty          12302433             5774407      2.13
> 
> most of cpu time blames to _raw_spin_lock, here is some testing
> numbers with/without patch.
> 
> Test environment:
> Server : SuperMicro Sever (2 x E5-2690 v3@2.60GHz, 128GB 2133MHz
>          DDR4 Memory, 8GbFC)
> Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
>           Read Intensive SSD)
> 
> format command:
>         mkfs.ext4 -J size=4096
> 
> test command:
>         mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
>                 -r -i 1 -v -p 10 -u #first run to load inode
> 
>         mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
>                 -r -i 3 -v -p 10 -u
> 
> Kernel version: 4.13.0-rc3
> 
> Test  1,440,000 files with 48 directories by 48 processes:
> 
> Without patch:
> 
> File Creation   File removal
> 79,033          289,569 ops/per second
> 81,463          285,359
> 79,875          288,475
> 
> With patch:
> File Creation   File removal
> 810669		301694
> 812805		302711
> 813965		297670
> 
> Creation performance is improved more than 10X with large
> journal size. The main problem here is we test bitmap
> and do some check and journal operations which could be
> slept, then we test and set with lock hold, this could
> be racy, and make 'inode' steal by other process.
> 
> However, after first try, we could confirm handle has
> been started and inode bitmap journaled too, then
> we could find and set bit with lock hold directly, this
> will mostly gurateee success with second try.
> 
> Tested-by: Shuichi Ihara <sihara@xxxxxxx>
> Signed-off-by: Wang Shilong <wshilong@xxxxxxx>

I like the result! You can add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
> v4->v5: better refactor codes as Jan Kara suggested.
> ---
>  fs/ext4/ialloc.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 9e6eb1c..fb83a36 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -730,6 +730,27 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
>  	return ret;
>  }
>  
> +static int find_inode_bit(struct super_block *sb, ext4_group_t group,
> +			  struct buffer_head *bitmap, unsigned long *ino)
> +{
> +next:
> +	*ino = ext4_find_next_zero_bit((unsigned long *)
> +				       bitmap->b_data,
> +				       EXT4_INODES_PER_GROUP(sb), *ino);
> +	if (*ino >= EXT4_INODES_PER_GROUP(sb))
> +		return 0;
> +
> +	if ((EXT4_SB(sb)->s_journal == NULL) &&
> +	    recently_deleted(sb, group, *ino)) {
> +		*ino = *ino + 1;
> +		if (*ino < EXT4_INODES_PER_GROUP(sb))
> +			goto next;
> +		return 0;
> +	}
> +
> +	return 1;
> +}
> +
>  /*
>   * There are two policies for allocating an inode.  If the new inode is
>   * a directory, then a forward search is made for a block group with both
> @@ -910,21 +931,16 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>  		}
>  
>  repeat_in_this_group:
> -		ino = ext4_find_next_zero_bit((unsigned long *)
> -					      inode_bitmap_bh->b_data,
> -					      EXT4_INODES_PER_GROUP(sb), ino);
> -		if (ino >= EXT4_INODES_PER_GROUP(sb))
> +		ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
> +		if (!ret2)
>  			goto next_group;
> -		if (group == 0 && (ino+1) < EXT4_FIRST_INO(sb)) {
> +
> +		if (group == 0 && (ino + 1) < EXT4_FIRST_INO(sb)) {
>  			ext4_error(sb, "reserved inode found cleared - "
>  				   "inode=%lu", ino + 1);
>  			goto next_group;
>  		}
> -		if ((EXT4_SB(sb)->s_journal == NULL) &&
> -		    recently_deleted(sb, group, ino)) {
> -			ino++;
> -			goto next_inode;
> -		}
> +
>  		if (!handle) {
>  			BUG_ON(nblocks <= 0);
>  			handle = __ext4_journal_start_sb(dir->i_sb, line_no,
> @@ -944,11 +960,23 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
>  		}
>  		ext4_lock_group(sb, group);
>  		ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
> +		if (ret2) {
> +			/* Someone already took the bit. Repeat the search
> +			 * with lock held.
> +			 */
> +			ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
> +			if (ret2) {
> +				ext4_set_bit(ino, inode_bitmap_bh->b_data);
> +				ret2 = 0;
> +			} else {
> +				ret2 = 1; /* we didn't grab the inode */
> +			}
> +		}
>  		ext4_unlock_group(sb, group);
>  		ino++;		/* the inode bitmap is zero-based */
>  		if (!ret2)
>  			goto got; /* we grabbed the inode! */
> -next_inode:
> +
>  		if (ino < EXT4_INODES_PER_GROUP(sb))
>  			goto repeat_in_this_group;
>  next_group:
> -- 
> 2.9.3
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux