Re: [Query] increased latency observed in cpu hotplug path

"Khan, Imran" <kimran@xxxxxxxxxxxxxx> · Wed, 17 Aug 2016 22:17:39 +0530



On 8/16/2016 11:23 AM, Khan, Imran wrote:
> On 8/5/2016 12:49 PM, Khan, Imran wrote:
>> On 8/1/2016 2:58 PM, Khan, Imran wrote:
>>> On 7/30/2016 7:54 AM, Akinobu Mita wrote:
>>>> 2016-07-28 22:18 GMT+09:00 Khan, Imran <kimran@xxxxxxxxxxxxxx>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Recently we have observed some increased latency in CPU hotplug
>>>>> event in CPU online path. For online latency we see that block
>>>>> layer is executing notification handler for CPU_UP_PREPARE event
>>>>> and this in turn waits for RCU grace period resulting (sometimes)
>>>>> in an execution time of 15-20 ms for this notification handler.
>>>>> This change was not there in 3.18 kernel but is present in 4.4
>>>>> kernel and was introduced by following commit:
>>>>>
>>>>>
>>>>> commit 5778322e67ed34dc9f391a4a5cbcbb856071ceba
>>>>> Author: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>>>>> Date:   Sun Sep 27 02:09:23 2015 +0900
>>>>>
>>>>>     blk-mq: avoid inserting requests before establishing new mapping
>>>>
>>>> ...
>>>>
>>>>> Upon reverting this commit I could see an improvement of 15-20 ms in
>>>>> online latency. So I am looking for some help in analyzing the effects
>>>>> of reverting this or should some other approach to reduce the online
>>>>> latency must be taken.
>>>>
>>>> Can you observe the difference in online latency by removing
>>>> get_online_cpus() and put_online_cpus() pair in blk_mq_init_allocated_queue()
>>>> instead of full reverting the commit?
>>>>
>>> Hi Akinobu,
>>> I tried your suggestion but could not achieve any improvement. Actually the snippet that is causing the change in latency is the following one :
>>>
>>> list_for_each_entry(q, &all_q_list, all_q_node) {
>>>                 blk_mq_freeze_queue_wait(q);
>>>
>>>                 /*
>>>                  * timeout handler can't touch hw queue during the
>>>                  * reinitialization
>>>                  */
>>>                 del_timer_sync(&q->timeout);
>>>  }
>>>
>>> I understand that this is getting executed now for CPU_UP_PREPARE as well resulting in 
>>> increased latency in the cpu online path. I am trying to reduce this latency while keeping the 
>>> purpose of this commit intact. I would welcome further suggestions/feedback in this regard.
>>>
>> Hi Akinobu,
>>
>> I am not able to reduce the cpu online latency with this patch, could you please let me know what
>> functionality will be broken, if we avoid this patch in our kernel. Also if you have some other 
>> suggestions towards improving this patch please let me know.
>>
> After moving the remapping of queues to block layer's kworker I see that online latency has improved
> while offline latency remains the same. As the freezing of queues happens in the context of block layer's
> worker, I think it would be better to do the remapping in the same context and then go ahead with freezing.
> In this regard I have made following change:
> 
> commit b2131b86eeef4c5b1f8adaf7a53606301aa6b624
> Author: Imran Khan <kimran@xxxxxxxxxxxxxx>
> Date:   Fri Aug 12 19:59:47 2016 +0530
> 
>     blk-mq: Move block queue remapping from cpu hotplug path
> 
>     During a cpu hotplug, the hardware and software contexts mappings
>     need to be updated in order to take into account requests
>     submitted for the hotadded CPU. But if this mapping is done
>     in hotplug notifier, it deteriorates the hotplug latency.
>     So move the block queue remapping to block layer worker which
>     results in significant improvements in hotplug latency.
> 
>     Change-Id: I01ac83178ce95c3a4e3b7b1b286eda65ff34e8c4
>     Signed-off-by: Imran Khan <kimran@xxxxxxxxxxxxxx>
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6d6f8fe..06fcf89 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -22,7 +22,11 @@
>  #include <linux/sched/sysctl.h>
>  #include <linux/delay.h>
>  #include <linux/crash_dump.h>
> -
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
> +#include <linux/sched.h>
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
>  #include <trace/events/block.h>
> 
>  #include <linux/blk-mq.h>
> @@ -32,10 +36,18 @@
> 
>  static DEFINE_MUTEX(all_q_mutex);
>  static LIST_HEAD(all_q_list);
> -
>  static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
> 
>  /*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumasks as global as cpu-hotplug operation is invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask online_new;
> +
> +static struct work_struct blk_mq_remap_work;
> +
> +/*
>   * Check if any of the ctx's have pending work in this hardware queue
>   */
>  static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> @@ -2125,14 +2137,7 @@ static void blk_mq_queue_reinit(struct request_queue *q,
>  static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
>                                       unsigned long action, void *hcpu)
>  {
> -       struct request_queue *q;
>         int cpu = (unsigned long)hcpu;
> -       /*
> -        * New online cpumask which is going to be set in this hotplug event.
> -        * Declare this cpumasks as global as cpu-hotplug operation is invoked
> -        * one-by-one and dynamically allocating this could result in a failure.
> -        */
> -       static struct cpumask online_new;
> 
>         /*
>          * Before hotadded cpu starts handling requests, new mappings must
> @@ -2155,43 +2160,17 @@ static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
>         case CPU_DEAD:
>         case CPU_UP_CANCELED:
>                 cpumask_copy(&online_new, cpu_online_mask);
> +               kblockd_schedule_work(&blk_mq_remap_work);
>                 break;
>         case CPU_UP_PREPARE:
>                 cpumask_copy(&online_new, cpu_online_mask);
>                 cpumask_set_cpu(cpu, &online_new);
> +               kblockd_schedule_work(&blk_mq_remap_work);
>                 break;
>         default:
>                 return NOTIFY_OK;
>         }
> 
> -       mutex_lock(&all_q_mutex);
> -
> -       /*
> -        * We need to freeze and reinit all existing queues.  Freezing
> -        * involves synchronous wait for an RCU grace period and doing it
> -        * one by one may take a long time.  Start freezing all queues in
> -        * one swoop and then wait for the completions so that freezing can
> -        * take place in parallel.
> -        */
> -       list_for_each_entry(q, &all_q_list, all_q_node)
> -               blk_mq_freeze_queue_start(q);
> -       list_for_each_entry(q, &all_q_list, all_q_node) {
> -               blk_mq_freeze_queue_wait(q);
> -
> -               /*
> -                * timeout handler can't touch hw queue during the
> -                * reinitialization
> -                */
> -               del_timer_sync(&q->timeout);
> -       }
> -
> -       list_for_each_entry(q, &all_q_list, all_q_node)
> -               blk_mq_queue_reinit(q, &online_new);
> -
> -       list_for_each_entry(q, &all_q_list, all_q_node)
> -               blk_mq_unfreeze_queue(q);
> -
> -       mutex_unlock(&all_q_mutex);
>         return NOTIFY_OK;
>  }
> 
> @@ -2357,11 +2336,48 @@ void blk_mq_enable_hotplug(void)
>         mutex_unlock(&all_q_mutex);
>  }
> 
> +static void blk_remap_work(struct work_struct *work)
> +{
> +       struct request_queue *q;
> +
> +       /* Start the task of queue remapping */
> +       mutex_lock(&all_q_mutex);
> +
> +       /*
> +        * We need to freeze and reinit all existing queues.  Freezing
> +        * involves synchronous wait for an RCU grace period and doing it
> +        * one by one may take a long time.  Start freezing all queues in
> +        * one swoop and then wait for the completions so that freezing can
> +        * take place in parallel.
> +        */
> +       list_for_each_entry(q, &all_q_list, all_q_node)
> +               blk_mq_freeze_queue_start(q);
> +       list_for_each_entry(q, &all_q_list, all_q_node) {
> +               blk_mq_freeze_queue_wait(q);
> +
> +               /*
> +                * timeout handler can't touch hw queue during the
> +                * reinitialization
> +                */
> +               del_timer_sync(&q->timeout);
> +       }
> +
> +       list_for_each_entry(q, &all_q_list, all_q_node)
> +               blk_mq_queue_reinit(q, &online_new);
> +
> +       list_for_each_entry(q, &all_q_list, all_q_node)
> +               blk_mq_unfreeze_queue(q);
> +
> +       mutex_unlock(&all_q_mutex);
> +}
> +
> +
>  static int __init blk_mq_init(void)
>  {
>         blk_mq_cpu_init();
> 
>         hotcpu_notifier(blk_mq_queue_reinit_notify, 0);
> +       INIT_WORK(&blk_mq_remap_work, blk_remap_work);
> 
>         return 0;
>  }
> 
> I have run overnight tests of fsstress and multiple dd commands along with cpu hotplugging.
> Could you please review this change and let me know if you see any issues with this change or if I need 
> to test some specific corner cases to validate this change.
> 
Could someone kindly review this change
and provide feedback?

-- 
Imran Khan
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a\nmember of the Code Aurora Forum, hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html