Re: [PATCH V3] md-linear: optimize which_dev() for small disks number

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> 2025年2月7日 17:50,Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> 写道:
> 
> Hi,
> 
> 在 2025/02/07 10:35, colyli@xxxxxxxxxx 写道:
>> From: Coly Li <colyli@xxxxxxxxxx>
>> which_dev() is a top hot function in md-linear.c, every I/O request will
>> call it to find out which component disk the bio should be issued to.
>> Current witch_dev() performs a standard binary search, indeed this is
>> not the fastest algorithm in practice. When the whole conf->disks array
>> can be stored within a single cache line, simple linear search is faster
>> than binary search for a small disks number.
>>> From micro benchmark, around 20%~30% latency reduction can be observed.
>> Of course such huge optimization cannot be achieved in real workload, in
>> my benchmark with,
>> 1) One md linear device assembled by 2 or 4 Intel Optane memory block
>>    device on Lenovo ThinkSystem SR650 server.
>> 2) Random write I/O issued by fio, with I/O depth 1 and 512 bytes block
>>    size.
> 
>> The percentage of I/O latencies completed with 750 nsec increases from
>> 97.186% to 99.324% in average, in a rough estimation the write latency
>> improves (reduces) around 2.138%.
>> This is quite ideal result, I believe on slow hard drives such small
>> code-running optimization will be overwhelmed by hardware latency and
>> hard to be recognized.
>> This patch will go back to binary search when the linear device grows
>> and conf->disks array cannot be placed within a single cache line.
>> Although the optimization result is tiny in most of cases, it is good to
>> have it since we don't pay any other cost.
>> Signed-off-by: Coly Li <colyli@xxxxxxxxxx>
>> Cc: Song Liu <song@xxxxxxxxxx>
>> Cc: Yu Kuai <yukuai3@xxxxxxxxxx>
>> ---
>> Changelog,
>> v3: fix typo and email address which are reported by raid kernel ci.
>> v2: return last item of conf->disks[] if fast search missed.
>> v1: initial version.
>>  drivers/md/md-linear.c | 45 +++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 44 insertions(+), 1 deletion(-)
>> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
>> index a382929ce7ba..cdb59f2b2a1c 100644
>> --- a/drivers/md/md-linear.c
>> +++ b/drivers/md/md-linear.c
>> @@ -25,10 +25,12 @@ struct linear_conf {
>>   struct dev_info         disks[] __counted_by(raid_disks);
>>  };
>>  +static int prefer_linear_dev_search;
> 
> Why using global variable instead of per linear? There are still hole in
> the linear_conf() that you can use.

OK, I will use space inside linear_conf. 


>> +
>>  /*
>>   * find which device holds a particular offset
>>   */
>> -static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
>> +static inline struct dev_info *__which_dev(struct mddev *mddev, sector_t sector)
>>  {
>>   int lo, mid, hi;
>>   struct linear_conf *conf;
>> @@ -53,6 +55,34 @@ static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
>>   return conf->disks + lo;
>>  }
>>  +/*
>> + * If conf->disk[] can be hold within a L1 cache line,
>> + * linear search is fater than binary search.
>> + */
>> +static inline struct dev_info *which_dev(struct mddev *mddev, sector_t sector)
>> +{
>> + int i;
>> +
>> + if (prefer_linear_dev_search) {
>> + struct linear_conf *conf;
>> + struct dev_info *dev;
>> + int max;
>> +
>> + conf = mddev->private;
>> + dev = conf->disks;
>> + max = conf->raid_disks;
>> + for (i = 0; i < max; i++, dev++) {
>> + if (sector < dev->end_sector)
>> + return dev;
>> + }
>> + return &conf->disks[max-1];
>> + }
>> +
>> + /* slow path */
>> + return __which_dev(mddev, sector);
>> +}
>> +
>> +
>>  static sector_t linear_size(struct mddev *mddev, sector_t sectors, int raid_disks)
>>  {
>>   struct linear_conf *conf;
>> @@ -222,6 +252,18 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
>>   md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
>>   set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
>>   kfree_rcu(oldconf, rcu);
>> +
>> + /*
>> +  * When elements in linear_conf->disks[] becomes large enough,
>> +  * set prefer_linear_dev_search as 0 to indicate linear search
>> +  * in which_dev() is not optimized. Slow path in __which_dev()
>> +  * might be faster.
>> +  */
>> + if ((mddev->raid_disks * sizeof(struct dev_info)) >
>> +      cache_line_size() &&
> 
> BTW, you said in the case *a single cache line*, however, the field
> disks is 32 bytes offset in linear_conf, you might want to align it to
> cacheline, or add sizeof(struct linear_conf) on the left and align
> linear_conf to the cacheline.
> 

Yes, thanks for the hint. Let me do it in next version.



> Thanks,
> Kuai
>> +     prefer_linear_dev_search == 1)
>> + prefer_linear_dev_search = 0;
>> +
>>   return 0;
>>  }
>>  @@ -337,6 +379,7 @@ static struct md_personality linear_personality = {
>>    static int __init linear_init(void)
>>  {
>> + prefer_linear_dev_search = 1;
>>   return register_md_personality(&linear_personality);
>>  }







[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux