On May 15, 2018, at 7:06 AM, Wang Shilong <wangshilong1991@xxxxxxxxx> wrote: > > From: Wang Shilong <wshilong@xxxxxxx> > > During our benchmarking, we found sometimes writing > performances are not stable enough and there are some s/performances/performance/ > small read during write which could drop throughput(~30%). > > It turned out that block bitmaps loading could make s/make/add/ > some latency here,also for a heavy fragmented filesystem, > we might need load many bitmaps to find some free blocks. > > To improve above situation, we had a patch to load block s/had/have/ > bitmaps to memory and pin those bitmaps memory until umount ... in memory ... > or we release the memory on purpose, this could stable write ... on purpose. This can stabilize write > performances and improve performances of a heavy fragmented s/performances/performance/g s/heavy/heavily/ > filesystem. > > Tested-by: Shuichi Ihara <sihara@xxxxxxx> > Signed-off-by: Wang Shilong <wshilong@xxxxxxx> > --- > fs/ext4/balloc.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/ext4/ext4.h | 12 +++++++ > fs/ext4/super.c | 3 ++ > fs/ext4/sysfs.c | 26 ++++++++++++++ > 4 files changed, 146 insertions(+) > > diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c > index b00481c..ceb63e8 100644 > --- a/fs/ext4/balloc.c > +++ b/fs/ext4/balloc.c > @@ -505,6 +505,8 @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group, > EXT4_GROUP_INFO_BBITMAP_CORRUPT); > return -EIO; > } > + /* race is fine */ > + EXT4_SB(sb)->bbitmaps_read_cnt++; > clear_buffer_new(bh); > /* Panic or remount fs read-only if block bitmap is invalid */ > return ext4_validate_block_bitmap(sb, desc, block_group, bh); > @@ -660,6 +662,109 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, > +int ext4_load_block_bitmaps_bh(struct super_block *sb, unsigned int op) This should take a named enum as parameter instead of "unsigned int". It would probably be better to integrate this function into the existing group descriptor handling in ext4_mb_init_backend->ext4_mb_add_groupinfo() since it is already iterating all of the group descriptors, and it also handles the case of groups being added by online filesystem resizing. > +{ > + struct buffer_head *bitmap_bh; > + struct ext4_group_desc *gdp; > + ext4_group_t i, j; > + ext4_group_t ngroups = ext4_get_groups_count(sb); > + ext4_group_t cnt = 0; > + > + if (op < EXT4_LOAD_BBITMAPS || op > EXT4_PIN_BBITMAPS) > + return -EINVAL; > + > + mutex_lock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + /* don't pin bitmaps several times */ > + if (EXT4_SB(sb)->s_load_bbitmaps == EXT4_PIN_BBITMAPS) { > + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + return 0; > + } > + > + for (i = 0; i < ngroups; i++) { > + gdp = ext4_get_group_desc(sb, i, NULL); > + if (!gdp) > + continue; > + /* Load is simple, we could tolerate any > + * errors and continue to handle, but for > + * pin we return directly for simple handling > + * in unpin codes, otherwiese we need remember (typo) s/otherwiese/otherwise/ ... we need to remember > + * which block bitmaps we pin exactly. > + */ It should be straightforward to track the pinned bitmaps, by referencing the buffer head in ext4_group_info with a new field like bb_block_bh. That would allow the bitmaps to be pinned, but if there is some error loading one of them it doesn't prevent the filesystem from being mounted. I don't think that an error pinning the bitmaps should be a mount failure, since this could happen if the system is short of memory for some reason. It would be possible to check in ext4_read_block_bitmap() if the current goal is to pin the bitmap that isn't loaded then a refcount is added to bb_block_bh. > + bitmap_bh = ext4_read_block_bitmap(sb, i); > + if (IS_ERR(bitmap_bh)) { > + if (op == EXT4_LOAD_BBITMAPS) > + continue; > + else (style) no need for "else" after "continue" > + goto failed; > + } > + if (op == EXT4_LOAD_BBITMAPS) > + brelse(bitmap_bh); > + cnt++; > + } > + /* Reset block bitmap to zero now */ > + EXT4_SB(sb)->bbitmaps_read_cnt = 0; > + ext4_msg(sb, KERN_INFO, "%s %u block bitmaps finished", > + op == EXT4_PIN_BBITMAPS ? "pin" : "load", cnt); > + EXT4_SB(sb)->s_load_bbitmaps = EXT4_PIN_BBITMAPS; > + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + > + return 0; > +failed: > + for (j = 0; j < i; j++) { > + gdp = ext4_get_group_desc(sb, i, NULL); > + if (!gdp) > + continue; > + bitmap_bh = ext4_read_block_bitmap(sb, i); > + if (!IS_ERR(bitmap_bh)) { > + brelse(bitmap_bh); > + brelse(bitmap_bh); > + } > + } > + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + return PTR_ERR(bitmap_bh); > +} > + > +void ext4_unpin_block_bitmaps_bh(struct super_block *sb) > +{ > + struct buffer_head *bitmap_bh; > + struct ext4_group_desc *gdp; > + ext4_group_t i; > + ext4_group_t ngroups = ext4_get_groups_count(sb); > + ext4_group_t cnt = 0; > + > + mutex_lock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + if (EXT4_SB(sb)->s_load_bbitmaps == EXT4_UNPIN_BBITMAPS) { > + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + return; > + } > + > + ext4_msg(sb, KERN_INFO, > + "Read block block bitmaps: %lu afer %s", (typo) s/afer/after/ > + EXT4_SB(sb)->bbitmaps_read_cnt, > + EXT4_SB(sb)->s_load_bbitmaps == EXT4_PIN_BBITMAPS ? > + "pin" : "load"); > + > + if (EXT4_SB(sb)->s_load_bbitmaps != EXT4_PIN_BBITMAPS) { > + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > + return; > + } > + > + for (i = 0; i < ngroups; i++) { > + gdp = ext4_get_group_desc(sb, i, NULL); This should be changed to: struct ext4_group_info *grinfo = ext4_get_group_info(sb, i); and then only brelse the bufferhead if it was pinned properly: if (!IS_ERR(grinfo->bb_block_bh)) brelse(grinfo->bb_block_bh); grinfo->bb_block_bh = NULL; In theory we could also optimize ext4_read_block_bitmap() to add a reference to bb_block_bh directly instead of the extra work done there. > + if (!gdp) > + continue; > + bitmap_bh = ext4_read_block_bitmap(sb, i); > + if (IS_ERR(bitmap_bh)) > + continue; > + brelse(bitmap_bh); > + brelse(bitmap_bh); > + cnt++; > + } > + ext4_msg(sb, KERN_INFO, "Unpin %u lock bitmaps finished", cnt); > + EXT4_SB(sb)->s_load_bbitmaps = EXT4_UNPIN_BBITMAPS; > + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock); > +} > + > /** > * ext4_count_free_clusters() -- count filesystem free clusters > * @sb: superblock > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index fa52b7d..4f9ee73 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -1317,6 +1317,12 @@ struct ext4_super_block { > /* Number of quota types we support */ > #define EXT4_MAXQUOTAS 3 > > +enum { This should be a named enum, like ext4_bitmap_state. > + EXT4_UNPIN_BBITMAPS = 0, > + EXT4_LOAD_BBITMAPS, > + EXT4_PIN_BBITMAPS, Since these values are exported to userspace, they should have specific values assigned. Also, we should consider the case of being able to load/pin inode bitmaps. Something like: enum ext4_bitmap_state { EXT4_UNPIN_BITMAPS = 0x00, EXT4_LOAD_BBITMAPS = 0x01, EXT4_PIN_BBITMAPS = 0x02, EXT4_LOAD_IBITMAPS = 0x04, EXT4_PIN_IBITMAPS = 0x08, }; > diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c > index 9ebd26c..89396b3f 100644 > --- a/fs/ext4/sysfs.c > +++ b/fs/ext4/sysfs.c > > +static ssize_t load_bbitmaps_store(struct ext4_sb_info *sbi, > + const char *buf, size_t count) I'd suggest to take "b" (block) out of the name here so that this can also be used for inode bitmaps in the future. > +{ > + unsigned long long val; > + int ret; > + > + ret = kstrtoull(skip_spaces(buf), 0, &val); > + if (ret || val > EXT4_PIN_BBITMAPS) > + return -EINVAL; > + > + if (val == EXT4_UNPIN_BBITMAPS) > + ext4_unpin_block_bitmaps_bh(sbi->s_sb); > + else if (val > EXT4_UNPIN_BBITMAPS) > + ret = ext4_load_block_bitmaps_bh(sbi->s_sb, val); This is not a great userspace interface, since the EXT4_*_BBITMAPS values don't have specific values. It would be nicer to accept string names like "block" or "block-pin" or "inode" or "inode-pin", and "unpin", but at least if there are fixed numbers (something like "echo 6 > /sys/fs/ext4/sda/load_bitmaps" to load both bitmaps and also pin the block bitmaps) the userspace interface won't change. > + return ret ? ret : count; > +} > > @@ -270,6 +291,9 @@ static ssize_t ext4_attr_show(struct kobject *kobj, > return snprintf(buf, PAGE_SIZE, "%llu\n", > (unsigned long long) > atomic64_read(&sbi->s_resv_clusters)); > + case attr_load_bbitmaps: > + return snprintf(buf, PAGE_SIZE, "%u\n", > + sbi->s_load_bbitmaps); This should use "%#x" so that we know which flags are set. Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP