On 2012-07-14 07:56 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>[ adding Shaohua ]
>
>On Fri, Jul 13, 2012 at 3:31 AM, majianpeng <majianpeng@xxxxxxxxx> wrote:
>> To improve write performance by decreasing prereads, only move
>> IO_THRESHOLD stripes from the delayed_list to the hold_list at a time.
>>
>> Using the following command:
>> dd if=/dev/zero of=/dev/md0 bs=2M count=52100
>>
>> Under the default settings: speed is 95MB/s.
>> With preread_bypass_threshold set to zero: speed is 105MB/s.
>> With this patch: speed is 123MB/s.
>>
>> Setting preread_bypass_threshold to zero improves performance, but not
>> as much as this patch does. I think there are maybe two reasons:
>> 1. The bio may be REQ_SYNC.
>> 2. In __get_priority_stripe():
>>>> } else if (!list_empty(&conf->hold_list) &&
>>>>            ((conf->bypass_threshold &&
>>>>              conf->bypass_count > conf->bypass_threshold) ||
>>>>             atomic_read(&conf->pending_full_writes) == 0)) {
>> preread_bypass_threshold is only one of the conditions for taking a
>> stripe from the hold_list, so limiting how many stripes enter the
>> hold_list gives better performance.
>
>So this is a pretty obvious tradeoff of increased latency for improved
>throughput. Any idea how much this change affects latency?
>Especially in the fast device case?

I did not measure the latency. If we only fetch preread_bypass_threshold
stripes from the delayed_list to the hold_list at a time, the latency can
be controlled from userspace.
The code would look like:

 static void raid5_activate_delayed(struct r5conf *conf)
 {
+	int count = 0;
 	if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD) {
 		while (!list_empty(&conf->delayed_list)) {
 			struct list_head *l = conf->delayed_list.next;
@@ -3672,6 +3673,8 @@ static void raid5_activate_delayed(struct r5conf *conf)
 			if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
 				atomic_inc(&conf->preread_active_stripes);
 			list_add_tail(&sh->lru, &conf->hold_list);
+			if (++count >= conf->bypass_threshold)
+				break;
 		}

(The break compares against conf->bypass_threshold, the field behind the
preread_bypass_threshold sysfs knob; conf->preread_active_stripes is an
atomic_t and cannot be compared to the plain int counter directly.)