Re: [PATCH] md: Use optimal I/O size for last bitmap page

On 2/10/2023 10:32 AM, Song Liu wrote:
> Hi Jonathan,
> 
> On Thu, Feb 9, 2023 at 12:38 PM Jonathan Derrick
> <jonathan.derrick@xxxxxxxxx> wrote:
>>
>> Hi Song,
>>
>> Any thoughts on this?
> 
> I am really sorry that I missed this patch.
> 
>>
>> On 1/17/2023 5:53 PM, Jonathan Derrick wrote:
>>> From: Jon Derrick <jonathan.derrick@xxxxxxxxx>
>>>
>>> If the bitmap space has enough room, size the I/O for the last bitmap
>>> page write to the optimal I/O size for the storage device. The expanded
>>> write is checked that it won't overrun the data or metadata.
>>>
>>> This change helps increase performance by preventing unnecessary
>>> device-side read-mod-writes due to non-atomic write unit sizes.
>>>
>>> Ex biosnoop log. Device lba size 512, optimal size 4k:
>>> Before:
>>> Time        Process        PID     Device      LBA        Size      Lat
>>> 0.843734    md0_raid10     5267    nvme0n1   W 24         3584      1.17
>>> 0.843933    md0_raid10     5267    nvme1n1   W 24         3584      1.36
>>> 0.843968    md0_raid10     5267    nvme1n1   W 14207939968 4096      0.01
>>> 0.843979    md0_raid10     5267    nvme0n1   W 14207939968 4096      0.02
>>>
>>> After:
>>> Time        Process        PID     Device      LBA        Size      Lat
>>> 18.374244   md0_raid10     6559    nvme0n1   W 24         4096      0.01
>>> 18.374253   md0_raid10     6559    nvme1n1   W 24         4096      0.01
>>> 18.374300   md0_raid10     6559    nvme0n1   W 11020272296 4096      0.01
>>> 18.374306   md0_raid10     6559    nvme1n1   W 11020272296 4096      0.02
> 
> Do we see significant improvements from io benchmarks?
Yes. With lbaf=512 and optimal I/O size=4k:

Without patch:
  write: IOPS=1570, BW=6283KiB/s (6434kB/s)(368MiB/60001msec); 0 zone resets
With patch:
  write: IOPS=59.7k, BW=233MiB/s (245MB/s)(13.7GiB/60001msec); 0 zone resets

It's going to be different for different drives, but this was a drive where a
drive-side read-mod-write carries a huge penalty.

> 
> IIUC, fewer future HDDs will use a 512B LBA size. We probably don't need
> such optimizations in the future.
Maybe. But many drives still ship with 512B formatting as default.

> 
> Thanks,
> Song
> 
>>>
>>> Signed-off-by: Jon Derrick <jonathan.derrick@xxxxxxxxx>
>>> ---
>>>  drivers/md/md-bitmap.c | 27 ++++++++++++++++++---------
>>>  1 file changed, 18 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>>> index e7cc6ba1b657..569297ea9b99 100644
>>> --- a/drivers/md/md-bitmap.c
>>> +++ b/drivers/md/md-bitmap.c
>>> @@ -220,6 +220,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>>>       rdev = NULL;
>>>       while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
>>>               int size = PAGE_SIZE;
>>> +             int optimal_size = PAGE_SIZE;
>>>               loff_t offset = mddev->bitmap_info.offset;
>>>
>>>               bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev;
>>> @@ -228,9 +229,14 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>>>                       int last_page_size = store->bytes & (PAGE_SIZE-1);
>>>                       if (last_page_size == 0)
>>>                               last_page_size = PAGE_SIZE;
>>> -                     size = roundup(last_page_size,
>>> -                                    bdev_logical_block_size(bdev));
>>> +                     size = roundup(last_page_size, bdev_logical_block_size(bdev));
>>> +                     if (bdev_io_opt(bdev) > bdev_logical_block_size(bdev))
>>> +                             optimal_size = roundup(last_page_size, bdev_io_opt(bdev));
>>> +                     else
>>> +                             optimal_size = size;
>>>               }
>>> +
>>> +
>>>               /* Just make sure we aren't corrupting data or
>>>                * metadata
>>>                */
>>> @@ -246,9 +252,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>>>                               goto bad_alignment;
>>>               } else if (offset < 0) {
>>>                       /* DATA  BITMAP METADATA  */
>>> -                     if (offset
>>> -                         + (long)(page->index * (PAGE_SIZE/512))
>>> -                         + size/512 > 0)
>>> +                     loff_t off = offset + (long)(page->index * (PAGE_SIZE/512));
>>> +                     if (size != optimal_size &&
>>> +                         off + optimal_size/512 <= 0)
>>> +                             size = optimal_size;
>>> +                     else if (off + size/512 > 0)
>>>                               /* bitmap runs in to metadata */
>>>                               goto bad_alignment;
>>>                       if (rdev->data_offset + mddev->dev_sectors
>>> @@ -257,10 +265,11 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>>>                               goto bad_alignment;
>>>               } else if (rdev->sb_start < rdev->data_offset) {
>>>                       /* METADATA BITMAP DATA */
>>> -                     if (rdev->sb_start
>>> -                         + offset
>>> -                         + page->index*(PAGE_SIZE/512) + size/512
>>> -                         > rdev->data_offset)
>>> +                     loff_t off = rdev->sb_start + offset + page->index*(PAGE_SIZE/512);
>>> +                     if (size != optimal_size &&
>>> +                         off + optimal_size/512 <= rdev->data_offset)
>>> +                             size = optimal_size;
>>> +                     else if (off + size/512 > rdev->data_offset)
>>>                               /* bitmap runs in to data */
>>>                               goto bad_alignment;
>>>               } else {


