Re: [PATCH] ext4: add an interface to load block bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On May 15, 2018, at 7:06 AM, Wang Shilong <wangshilong1991@xxxxxxxxx> wrote:
> 
> From: Wang Shilong <wshilong@xxxxxxx>
> 
> During our benchmarking, we found sometimes writing
> performances are not stable enough and there are some

s/performances/performance/

> small read during write which could drop throughput(~30%).
> 
>  It turned out that block bitmaps loading could make

s/make/add/

> some latency here,also for a heavy fragmented filesystem,
> we might need load many bitmaps to find some free blocks.
> 
>  To improve above situation, we had a patch to load block

s/had/have/

> bitmaps to memory and pin those bitmaps memory until umount

... in memory ...

> or we release the memory on purpose, this could stable write

... on purpose.  This can stabilize write

> performances and improve performances of a heavy fragmented

s/performances/performance/g   s/heavy/heavily/

> filesystem.
> 
> Tested-by: Shuichi Ihara <sihara@xxxxxxx>
> Signed-off-by: Wang Shilong <wshilong@xxxxxxx>
> ---
> fs/ext4/balloc.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/ext4.h   |  12 +++++++
> fs/ext4/super.c  |   3 ++
> fs/ext4/sysfs.c  |  26 ++++++++++++++
> 4 files changed, 146 insertions(+)
> 
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index b00481c..ceb63e8 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -505,6 +505,8 @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
> 					EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> 		return -EIO;
> 	}
> +	/* race is fine */
> +	EXT4_SB(sb)->bbitmaps_read_cnt++;
> 	clear_buffer_new(bh);
> 	/* Panic or remount fs read-only if block bitmap is invalid */
> 	return ext4_validate_block_bitmap(sb, desc, block_group, bh);
> @@ -660,6 +662,109 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle,
> +int ext4_load_block_bitmaps_bh(struct super_block *sb, unsigned int op)

This should take a named enum as parameter instead of "unsigned int".

It would probably be better to integrate this function into the existing
group descriptor handling in ext4_mb_init_backend->ext4_mb_add_groupinfo()
since it is already iterating all of the group descriptors, and it also
handles the case of groups being added by online filesystem resizing.

> +{
> +	struct buffer_head *bitmap_bh;
> +	struct ext4_group_desc *gdp;
> +	ext4_group_t i, j;
> +	ext4_group_t ngroups = ext4_get_groups_count(sb);
> +	ext4_group_t cnt = 0;
> +
> +	if (op < EXT4_LOAD_BBITMAPS || op > EXT4_PIN_BBITMAPS)
> +		return -EINVAL;
> +
> +	mutex_lock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +	/* don't pin bitmaps several times */
> +	if (EXT4_SB(sb)->s_load_bbitmaps == EXT4_PIN_BBITMAPS) {
> +		mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +		return 0;
> +	}
> +
> +	for (i = 0; i < ngroups; i++) {
> +		gdp = ext4_get_group_desc(sb, i, NULL);
> +		if (!gdp)
> +			continue;
> +		/* Load is simple, we could tolerate any
> +		 * errors and continue to handle, but for
> +		 * pin we return directly for simple handling
> +		 * in unpin codes, otherwiese we need remember

(typo) s/otherwiese/otherwise/ ... we need to remember

> +		 * which block bitmaps we pin exactly.
> +		 */

It should be straightforward to track the pinned bitmaps, by referencing
the buffer head in ext4_group_info with a new field like bb_block_bh.
That would allow the bitmaps to be pinned, but if there is some error
loading one of them it doesn't prevent the filesystem from being mounted.
I don't think that an error pinning the bitmaps should be a mount failure,
since this could happen if the system is short of memory for some reason.

It would be possible to check in ext4_read_block_bitmap() if the current
goal is to pin the bitmap that isn't loaded then a refcount is added to
bb_block_bh.

> +		bitmap_bh = ext4_read_block_bitmap(sb, i);
> +		if (IS_ERR(bitmap_bh)) {
> +			if (op == EXT4_LOAD_BBITMAPS)
> +				continue;
> +			else

(style) no need for "else" after "continue"

> +				goto failed;
> +		}
> +		if (op == EXT4_LOAD_BBITMAPS)
> +			brelse(bitmap_bh);
> +		cnt++;
> +	}
> +	/* Reset block bitmap to zero now */
> +	EXT4_SB(sb)->bbitmaps_read_cnt = 0;

> +	ext4_msg(sb, KERN_INFO, "%s %u block bitmaps finished",
> +		 op == EXT4_PIN_BBITMAPS ? "pin" : "load", cnt);
> +	EXT4_SB(sb)->s_load_bbitmaps = EXT4_PIN_BBITMAPS;
> +	mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +
> +	return 0;
> +failed:
> +	for (j = 0; j < i; j++) {
> +		gdp = ext4_get_group_desc(sb, i, NULL);
> +		if (!gdp)
> +			continue;
> +		bitmap_bh = ext4_read_block_bitmap(sb, i);
> +		if (!IS_ERR(bitmap_bh)) {
> +			brelse(bitmap_bh);
> +			brelse(bitmap_bh);
> +		}
> +	}
> +	mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +	return PTR_ERR(bitmap_bh);
> +}
> +
> +void ext4_unpin_block_bitmaps_bh(struct super_block *sb)
> +{
> +	struct buffer_head *bitmap_bh;
> +	struct ext4_group_desc *gdp;
> +	ext4_group_t i;
> +	ext4_group_t ngroups = ext4_get_groups_count(sb);
> +	ext4_group_t cnt = 0;
> +
> +	mutex_lock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +	if (EXT4_SB(sb)->s_load_bbitmaps == EXT4_UNPIN_BBITMAPS) {
> +		mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +		return;
> +	}
> +
> +	ext4_msg(sb, KERN_INFO,
> +		 "Read block block bitmaps: %lu afer %s",

(typo) s/afer/after/

> +		 EXT4_SB(sb)->bbitmaps_read_cnt,
> +		 EXT4_SB(sb)->s_load_bbitmaps == EXT4_PIN_BBITMAPS ?
> +		 "pin" : "load");
> +
> +	if (EXT4_SB(sb)->s_load_bbitmaps != EXT4_PIN_BBITMAPS) {
> +		mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +		return;
> +	}
> +
> +	for (i = 0; i < ngroups; i++) {
> +		gdp = ext4_get_group_desc(sb, i, NULL);

This should be changed to:

		struct ext4_group_info *grinfo = ext4_get_group_info(sb, i);

and then only brelse the bufferhead if it was pinned properly:

		if (!IS_ERR(grinfo->bb_block_bh))
			brelse(grinfo->bb_block_bh);
		grinfo->bb_block_bh = NULL;

In theory we could also optimize ext4_read_block_bitmap() to add a
reference to bb_block_bh directly instead of the extra work done there.

> +		if (!gdp)
> +			continue;
> +		bitmap_bh = ext4_read_block_bitmap(sb, i);
> +		if (IS_ERR(bitmap_bh))
> +			continue;
> +		brelse(bitmap_bh);
> +		brelse(bitmap_bh);
> +		cnt++;
> +	}
> +	ext4_msg(sb, KERN_INFO, "Unpin %u lock bitmaps finished", cnt);
> +	EXT4_SB(sb)->s_load_bbitmaps = EXT4_UNPIN_BBITMAPS;
> +	mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +}
> +
> /**
>  * ext4_count_free_clusters() -- count filesystem free clusters
>  * @sb:		superblock
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index fa52b7d..4f9ee73 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1317,6 +1317,12 @@ struct ext4_super_block {
> /* Number of quota types we support */
> #define EXT4_MAXQUOTAS 3
> 
> +enum {

This should be a named enum, like ext4_bitmap_state.

> +	EXT4_UNPIN_BBITMAPS = 0,
> +	EXT4_LOAD_BBITMAPS,
> +	EXT4_PIN_BBITMAPS,

Since these values are exported to userspace, they should have specific
values assigned.  Also, we should consider the case of being able to
load/pin inode bitmaps.  Something like:

enum ext4_bitmap_state {
	EXT4_UNPIN_BITMAPS	= 0x00,
	EXT4_LOAD_BBITMAPS	= 0x01,
	EXT4_PIN_BBITMAPS	= 0x02,
	EXT4_LOAD_IBITMAPS	= 0x04,
	EXT4_PIN_IBITMAPS	= 0x08,
};

> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 9ebd26c..89396b3f 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
> 
> +static ssize_t load_bbitmaps_store(struct ext4_sb_info *sbi,
> +				   const char *buf, size_t count)

I'd suggest to take "b" (block) out of the name here so that this
can also be used for inode bitmaps in the future.

> +{
> +	unsigned long long val;
> +	int ret;
> +
> +	ret = kstrtoull(skip_spaces(buf), 0, &val);
> +	if (ret || val > EXT4_PIN_BBITMAPS)
> +		return -EINVAL;
> +
> +	if (val == EXT4_UNPIN_BBITMAPS)
> +		ext4_unpin_block_bitmaps_bh(sbi->s_sb);
> +	else if (val > EXT4_UNPIN_BBITMAPS)
> +		ret = ext4_load_block_bitmaps_bh(sbi->s_sb, val);

This is not a great userspace interface, since the EXT4_*_BBITMAPS
values don't have specific values.  It would be nicer to accept
string names like "block" or "block-pin" or "inode" or "inode-pin",
and "unpin", but at least if there are fixed numbers (something like
"echo 6 > /sys/fs/ext4/sda/load_bitmaps" to load both bitmaps and
also pin the block bitmaps) the userspace interface won't change.

> +	return ret ? ret : count;
> +}
> 
> @@ -270,6 +291,9 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
> 		return snprintf(buf, PAGE_SIZE, "%llu\n",
> 				(unsigned long long)
> 				atomic64_read(&sbi->s_resv_clusters));
> +	case attr_load_bbitmaps:
> +		return snprintf(buf, PAGE_SIZE, "%u\n",
> +				sbi->s_load_bbitmaps);

This should use "%#x" so that we know which flags are set.

Cheers, Andreas





Attachment: signature.asc
Description: Message signed with OpenPGP


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux