On Sat, Feb 18, 2023 at 2:36 AM Jonathan Derrick <jonathan.derrick@xxxxxxxxx> wrote: > > From: Jon Derrick <jonathan.derrick@xxxxxxxxx> > > If the bitmap space has enough room, size the I/O for the last bitmap > page write to the optimal I/O size for the storage device. The expanded > write is checked that it won't overrun the data or metadata. > > This change helps increase performance by preventing unnecessary > device-side read-mod-writes due to non-atomic write unit sizes. Hi Jonathan Thanks for your patch. Could you explain more about the how the optimal io size can affect the device-side read-mod-writes? Regards Xiao > > Example Intel/Solidigm P5520 > Raid10, Chunk-size 64M, bitmap-size 57228 bits > > $ mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/nvme{0,1,2,3}n1 --assume-clean --bitmap=internal --bitmap-chunk=64M > $ fio --name=test --direct=1 --filename=/dev/md0 --rw=randwrite --bs=4k --runtime=60 > > Without patch: > write: IOPS=1676, BW=6708KiB/s (6869kB/s)(393MiB/60001msec); 0 zone resets > > With patch: > write: IOPS=15.7k, BW=61.4MiB/s (64.4MB/s)(3683MiB/60001msec); 0 zone resets > > Biosnoop: > Without patch: > Time Process PID Device LBA Size Lat > 1.410377 md0_raid10 6900 nvme0n1 W 16 4096 0.02 > 1.410387 md0_raid10 6900 nvme2n1 W 16 4096 0.02 > 1.410374 md0_raid10 6900 nvme3n1 W 16 4096 0.01 > 1.410381 md0_raid10 6900 nvme1n1 W 16 4096 0.02 > 1.410411 md0_raid10 6900 nvme1n1 W 115346512 4096 0.01 > 1.410418 md0_raid10 6900 nvme0n1 W 115346512 4096 0.02 > 1.410915 md0_raid10 6900 nvme2n1 W 24 3584 0.43 > 1.410935 md0_raid10 6900 nvme3n1 W 24 3584 0.45 > 1.411124 md0_raid10 6900 nvme1n1 W 24 3584 0.64 > 1.411147 md0_raid10 6900 nvme0n1 W 24 3584 0.66 > 1.411176 md0_raid10 6900 nvme3n1 W 2019022184 4096 0.01 > 1.411189 md0_raid10 6900 nvme2n1 W 2019022184 4096 0.02 > > With patch: > Time Process PID Device LBA Size Lat > 5.747193 md0_raid10 727 nvme0n1 W 16 4096 0.01 > 5.747192 md0_raid10 727 nvme1n1 W 16 4096 0.02 > 5.747195 md0_raid10 727 nvme3n1 W 16 4096 0.01 > 5.747202 md0_raid10 727 nvme2n1 W 16 4096 0.02 > 5.747229 md0_raid10 727 nvme3n1 W 1196223704 4096 0.02 > 5.747224 md0_raid10 727 nvme0n1 W 1196223704 4096 0.01 > 5.747279 md0_raid10 727 nvme0n1 W 24 4096 0.01 > 5.747279 md0_raid10 727 nvme1n1 W 24 4096 0.02 > 5.747284 md0_raid10 727 nvme3n1 W 24 4096 0.02 > 5.747291 md0_raid10 727 nvme2n1 W 24 4096 0.02 > 5.747314 md0_raid10 727 nvme2n1 W 2234636712 4096 0.01 > 5.747317 md0_raid10 727 nvme1n1 W 2234636712 4096 0.02 > > Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx> > --- > v1->v2: Add more information to commit message > > drivers/md/md-bitmap.c | 27 ++++++++++++++++++--------- > 1 file changed, 18 insertions(+), 9 deletions(-) > > diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c > index e7cc6ba1b657..569297ea9b99 100644 > --- a/drivers/md/md-bitmap.c > +++ b/drivers/md/md-bitmap.c > @@ -220,6 +220,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) > rdev = NULL; > while ((rdev = next_active_rdev(rdev, mddev)) != NULL) { > int size = PAGE_SIZE; > + int optimal_size = PAGE_SIZE; > loff_t offset = mddev->bitmap_info.offset; > > bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; > @@ -228,9 +229,14 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) > int last_page_size = store->bytes & (PAGE_SIZE-1); > if (last_page_size == 0) > last_page_size = PAGE_SIZE; > - size = roundup(last_page_size, > - bdev_logical_block_size(bdev)); > + size = roundup(last_page_size, bdev_logical_block_size(bdev)); > + if (bdev_io_opt(bdev) > bdev_logical_block_size(bdev)) > + optimal_size = roundup(last_page_size, bdev_io_opt(bdev)); > + else > + optimal_size = size; > } > + > + > /* Just make sure we aren't corrupting data or > * metadata > */ > @@ -246,9 +252,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) > goto bad_alignment; > } else if (offset < 0) { > /* DATA BITMAP METADATA */ > - if (offset > - + (long)(page->index * (PAGE_SIZE/512)) > - + size/512 > 0) > + loff_t off = offset + (long)(page->index * (PAGE_SIZE/512)); > + if (size != optimal_size && > + off + optimal_size/512 <= 0) > + size = optimal_size; > + else if (off + size/512 > 0) > /* bitmap runs in to metadata */ > goto bad_alignment; > if (rdev->data_offset + mddev->dev_sectors > @@ -257,10 +265,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) > goto bad_alignment; > } else if (rdev->sb_start < rdev->data_offset) { > /* METADATA BITMAP DATA */ > - if (rdev->sb_start > - + offset > - + page->index*(PAGE_SIZE/512) + size/512 > - > rdev->data_offset) > + loff_t off = rdev->sb_start + offset + page->index*(PAGE_SIZE/512); > + if (size != optimal_size && > + off + optimal_size/512 <= rdev->data_offset) > + size = optimal_size; > + else if (off + size/512 > rdev->data_offset) > /* bitmap runs in to data */ > goto bad_alignment; > } else { > -- > 2.27.0 >