On 2/10/2023 10:32 AM, Song Liu wrote: > Hi Jonathan, > > On Thu, Feb 9, 2023 at 12:38 PM Jonathan Derrick > <jonathan.derrick@xxxxxxxxx> wrote: >> >> Hi Song, >> >> Any thoughts on this? > > I am really sorry that I missed this patch. > >> >> On 1/17/2023 5:53 PM, Jonathan Derrick wrote: >>> From: Jon Derrick <jonathan.derrick@xxxxxxxxx> >>> >>> If the bitmap space has enough room, size the I/O for the last bitmap >>> page write to the optimal I/O size for the storage device. The expanded >>> write is checked that it won't overrun the data or metadata. >>> >>> This change helps increase performance by preventing unnecessary >>> device-side read-mod-writes due to non-atomic write unit sizes. >>> >>> Ex biosnoop log. Device lba size 512, optimal size 4k: >>> Before: >>> Time Process PID Device LBA Size Lat >>> 0.843734 md0_raid10 5267 nvme0n1 W 24 3584 1.17 >>> 0.843933 md0_raid10 5267 nvme1n1 W 24 3584 1.36 >>> 0.843968 md0_raid10 5267 nvme1n1 W 14207939968 4096 0.01 >>> 0.843979 md0_raid10 5267 nvme0n1 W 14207939968 4096 0.02 >>> >>> After: >>> Time Process PID Device LBA Size Lat >>> 18.374244 md0_raid10 6559 nvme0n1 W 24 4096 0.01 >>> 18.374253 md0_raid10 6559 nvme1n1 W 24 4096 0.01 >>> 18.374300 md0_raid10 6559 nvme0n1 W 11020272296 4096 0.01 >>> 18.374306 md0_raid10 6559 nvme1n1 W 11020272296 4096 0.02 > > Do we see significant improvements from io benchmarks? Yes. With lbaf=512, optimal i/o size=4k: Without patch: write: IOPS=1570, BW=6283KiB/s (6434kB/s)(368MiB/60001msec); 0 zone resets With patch: write: IOPS=59.7k, BW=233MiB/s (245MB/s)(13.7GiB/60001msec); 0 zone resets It's going to be different for different drives, but this was a drive where a drive-side read-mod-write has huge penalty. > > IIUC, fewer future HDDs will use 512B LBA size.We probably don't need > such optimizations in the future. Maybe. But many drives still ship with 512B formatting as default. > > Thanks, > Song > >>> >>> Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx> >>> --- >>> drivers/md/md-bitmap.c | 27 ++++++++++++++++++--------- >>> 1 file changed, 18 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c >>> index e7cc6ba1b657..569297ea9b99 100644 >>> --- a/drivers/md/md-bitmap.c >>> +++ b/drivers/md/md-bitmap.c >>> @@ -220,6 +220,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) >>> rdev = NULL; >>> while ((rdev = next_active_rdev(rdev, mddev)) != NULL) { >>> int size = PAGE_SIZE; >>> + int optimal_size = PAGE_SIZE; >>> loff_t offset = mddev->bitmap_info.offset; >>> >>> bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; >>> @@ -228,9 +229,14 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) >>> int last_page_size = store->bytes & (PAGE_SIZE-1); >>> if (last_page_size == 0) >>> last_page_size = PAGE_SIZE; >>> - size = roundup(last_page_size, >>> - bdev_logical_block_size(bdev)); >>> + size = roundup(last_page_size, bdev_logical_block_size(bdev)); >>> + if (bdev_io_opt(bdev) > bdev_logical_block_size(bdev)) >>> + optimal_size = roundup(last_page_size, bdev_io_opt(bdev)); >>> + else >>> + optimal_size = size; >>> } >>> + >>> + >>> /* Just make sure we aren't corrupting data or >>> * metadata >>> */ >>> @@ -246,9 +252,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) >>> goto bad_alignment; >>> } else if (offset < 0) { >>> /* DATA BITMAP METADATA */ >>> - if (offset >>> - + (long)(page->index * (PAGE_SIZE/512)) >>> - + size/512 > 0) >>> + loff_t off = offset + (long)(page->index * (PAGE_SIZE/512)); >>> + if (size != optimal_size && >>> + off + optimal_size/512 <= 0) >>> + size = optimal_size; >>> + else if (off + size/512 > 0) >>> /* bitmap runs in to metadata */ >>> goto bad_alignment; >>> if (rdev->data_offset + mddev->dev_sectors >>> @@ -257,10 +265,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) >>> goto bad_alignment; >>> } else if (rdev->sb_start < rdev->data_offset) { >>> /* METADATA BITMAP DATA */ >>> - if (rdev->sb_start >>> - + offset >>> - + page->index*(PAGE_SIZE/512) + size/512 >>> - > rdev->data_offset) >>> + loff_t off = rdev->sb_start + offset + page->index*(PAGE_SIZE/512); >>> + if (size != optimal_size && >>> + off + optimal_size/512 <= rdev->data_offset) >>> + size = optimal_size; >>> + else if (off + size/512 > rdev->data_offset) >>> /* bitmap runs in to data */ >>> goto bad_alignment; >>> } else {