On Tue, Sep 12, 2023 at 02:59:24PM +0800, Bobi Jam wrote:
> With LVM it is possible to create an LV with SSD storage at the
> beginning of the LV and HDD storage at the end of the LV, and use that
> to separate ext4 metadata allocations (that need small random IOs)
> from data allocations (that are better suited for large sequential
> IOs) depending on the type of underlying storage. Between 0.5-1.0% of
> the filesystem capacity would need to be high-IOPS storage in order to
> hold all of the internal metadata.
>
> This would improve performance for inode and other metadata access,
> such as ls, find, e2fsck, and in general improve file access latency,
> modification, truncate, unlink, transaction commit, etc.
>
> This patch split largest free order group lists and average fragment
> size lists into other two lists for IOPS/fast storage groups, and
> cr 0 / cr 1 group scanning for metadata block allocation in following
> order:
>
> if (allocate metadata blocks)
>     if (cr == 0)
>         try to find group in largest free order IOPS group list
>     if (cr == 1)
>         try to find group in fragment size IOPS group list
>     if (above two find failed)
>         fall through normal group lists as before
> if (allocate data blocks)
>     try to find group in normal group lists as before
>     if (failed to find group in normal group && mb_enable_iops_data)
>         try to find group in IOPS groups
>
> Non-metadata block allocation does not allocate from the IOPS groups
> if non-IOPS groups are not used up.
>
> Add for mke2fs an option to mark which blocks are in the IOPS region
> of storage at format time:
>
>     -E iops=0-1024G,4096-8192G
>
> so the ext4 mballoc code can then use the EXT4_BG_IOPS flag in the
> group descriptors to decide which groups to allocate dynamic
> filesystem metadata.
>
> Signed-off-by: Bobi Jam <bobijam@xxxxxxxxxxx>
>
> --
> v2->v3: add sysfs mb_enable_iops_data to enable data block allocation
>         from IOPS groups.
> v1->v2: for metadata block allocation, search in IOPS list then normal
>         list.
> ---

Hi Bobi, Andreas,

So I took a look at this patch and the idea is definitely interesting!
I'll add my review comments inline in a separate mail, but here are some
high-level observations:

1. Since most of the time our metadata allocations request only 1
   block, we will actually end up skipping CR_POWER2_ALIGNED (aka CR0),
   since it only works for len >= 2. But I think that's okay, because
   some metadata allocations like xattrs might still benefit from it.

2. We always try the goal group first in ext4_mb_find_by_goal() before
   going through the mballoc criteria, and I don't think there is any
   logic to stop that in case the goal group is non-IOPS and metadata
   is being allocated. So I think we are relying on the goal-finding
   logic to give us IOPS blocks as the goal for metadata, but does it
   do that currently?

Thanks!
ojaswin
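
To make the two paths concrete, here is a toy Python model of the scan order from the commit message and of the goal-group concern in point 2. It is only a sketch under assumptions: Group, first_fit, choose_group and allocate are hypothetical names, a first-fit walk over a plain list stands in for the largest-free-order and fragment-size lists, and the cr 0 / cr 1 distinction is not modelled at all. It is not the mballoc code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    number: int
    free_blocks: int
    is_iops: bool  # stands in for the EXT4_BG_IOPS group descriptor flag


def first_fit(groups: List[Group], length: int) -> Optional[Group]:
    """Return the first group with at least `length` free blocks, or None."""
    for group in groups:
        if group.free_blocks >= length:
            return group
    return None


def choose_group(groups: List[Group], length: int, is_metadata: bool,
                 enable_iops_data: bool = False) -> Optional[Group]:
    """Pick a group following the order described in the commit message."""
    iops = [g for g in groups if g.is_iops]
    normal = [g for g in groups if not g.is_iops]

    if is_metadata:
        # Metadata: IOPS groups first, then fall through to normal groups.
        found = first_fit(iops, length)
        if found is None:
            found = first_fit(normal, length)
        return found

    # Data: normal groups first; IOPS groups only when explicitly enabled
    # (the role mb_enable_iops_data plays in the patch description).
    found = first_fit(normal, length)
    if found is None and enable_iops_data:
        found = first_fit(iops, length)
    return found


def allocate(groups: List[Group], length: int, is_metadata: bool,
             goal: Optional[Group] = None,
             enable_iops_data: bool = False) -> Optional[Group]:
    """Try the goal group first, without looking at its IOPS flag.

    This mirrors the reading in point 2: if nothing filters the goal,
    a non-IOPS goal wins even for a metadata request.
    """
    if goal is not None and goal.free_blocks >= length:
        return goal
    return choose_group(groups, length, is_metadata, enable_iops_data)
```

With one IOPS group and one normal group, a metadata request lands in the IOPS group and a data request lands in the normal group. A metadata request whose goal is the normal group stays in the normal group, which is exactly the case point 2 asks about.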