Re:Re:Re: [PATCH] block/mq: blk map queues by core id

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



sorry,  messed up some func name

When enable virtio-blk with multi queues but with only 2 msix-vector.
vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to blk_mq_map_queues

Since virtual machine users cannot change the vector numbers, I think blk_mq_map_queues should be more friendly with different cpu topology.




At 2019-03-25 17:49:33, "luferry" <luferry@xxxxxxx> wrote:
>
>After reading the code and compare pci device info.
>I can reproduce this imbalance under KVM.
>When enable virtio-blk with multi queues but with only 2 msix-vector.
>vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to virtio_mq_map_queues
>
>Since virtual machine users cannot change the vector numbers, I think virtio_mq_map_queues should be more friendly with different cpu topology. 
>
>448 const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>449 {
>450     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>451
>452     if (!vp_dev->per_vq_vectors ||
>453         vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>454         return NULL;
>455
>456     return pci_irq_get_affinity(vp_dev->pci_dev,
>457                     vp_dev->vqs[index]->msix_vector);
>458 }
>
> 32 int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
> 33         struct virtio_device *vdev, int first_vec)
> 34 {
> 35     const struct cpumask *mask;
> 36     unsigned int queue, cpu;
> 37
> 38     if (!vdev->config->get_vq_affinity)
> 39         goto fallback;
> 40
> 41     for (queue = 0; queue < qmap->nr_queues; queue++) {
> 42         mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); //vp_get_vq_affinity
> 43         if (!mask)
> 44             goto fallback;
> 45
> 46         for_each_cpu(cpu, mask)
> 47             qmap->mq_map[cpu] = qmap->queue_offset + queue;
> 48     }
> 49
> 50     return 0;
> 51 fallback:
> 52     return blk_mq_map_queues(qmap);
> 53 }
>
>
>
>
>
>
>
>
>At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>>
>>
>>On 03/23/2019 02:34 PM, luferry wrote:
>>> 
>>> I just bought one vm , so i cannot access to hypervisor. I will try to build the environment on my desktop.
>>> I'm sure about something.
>>> The hypervisor is KVM and disk driver is virtio-blk.
>>> [root@blk-mq ~]# dmesg |grep KVM
>>> [    0.000000] Hypervisor detected: KVM
>>> [    0.186330] Booting paravirtualized kernel on KVM
>>> [    0.279106] KVM setup async PF for cpu 0
>>> [    0.420819] KVM setup async PF for cpu 1
>>> [    0.421682] KVM setup async PF for cpu 2
>>> [    0.422113] KVM setup async PF for cpu 3
>>> [    0.422434] KVM setup async PF for cpu 4
>>> [    0.422434] KVM setup async PF for cpu 5
>>> [    0.423312] KVM setup async PF for cpu 6
>>> [    0.423394] KVM setup async PF for cpu 7
>>> [    0.424125] KVM setup async PF for cpu 8
>>> [    0.424414] KVM setup async PF for cpu 9
>>> [    0.424415] KVM setup async PF for cpu 10
>>> [    0.425329] KVM setup async PF for cpu 11
>>> [    0.425420] KVM setup async PF for cpu 12
>>> [    0.426156] KVM setup async PF for cpu 13
>>> [    0.426431] KVM setup async PF for cpu 14
>>> [    0.426431] KVM setup async PF for cpu 15
>>> [root@blk-mq ~]# lspci |grep block
>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>> 
>>> [root@blk-mq ~]# lscpu
>>> Architecture:          x86_64
>>> CPU op-mode(s):        32-bit, 64-bit
>>> Byte Order:            Little Endian
>>> CPU(s):                16
>>> On-line CPU(s) list:   0-15
>>> Thread(s) per core:    2
>>> Core(s) per socket:    8
>>> 
>>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>>> 0  1  2  3
>>> 
>>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>>> processor	: 0
>>> core id		: 0
>>> processor	: 1
>>> core id		: 0
>>> processor	: 2
>>> core id		: 1
>>> processor	: 3
>>> core id		: 1
>>> processor	: 4
>>> core id		: 2
>>> processor	: 5
>>> core id		: 2
>>> processor	: 6
>>> core id		: 3
>>> processor	: 7
>>> core id		: 3
>>> processor	: 8
>>> core id		: 4
>>> processor	: 9
>>> core id		: 4
>>> processor	: 10
>>> core id		: 5
>>> processor	: 11
>>> core id		: 5
>>> processor	: 12
>>> core id		: 6
>>> processor	: 13
>>> core id		: 6
>>> processor	: 14
>>> core id		: 7
>>> processor	: 15
>>> core id		: 7
>>> 
>>> --before this patch--
>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>> 0, 4, 5, 8, 9, 12, 13
>>> 1
>>> 2, 6, 7, 10, 11, 14, 15
>>> 3
>>> 
>>> --after this patch--
>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>> 0, 4, 5, 12, 13
>>> 1, 6, 7, 14, 15
>>> 2, 8, 9
>>> 3, 10, 11
>>> 
>>> 
>>> I add dump_stack insert blk_mq_map_queues.
>>> 
>>> [    1.378680] Call Trace:
>>> [    1.378690]  dump_stack+0x5a/0x73
>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>> [    1.378713]  really_probe+0x215/0x3f0
>>> [    1.378716]  driver_probe_device+0x115/0x130
>>> [    1.378718]  device_driver_attach+0x50/0x60
>>> [    1.378720]  __driver_attach+0xbd/0x140
>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>
>>I am not able to reproduce above call stack when virtio-blk is assigned 4 queues
>>while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>
>># cat /sys/block/vda/mq/0/cpu_list
>>0, 1, 2, 3
>># cat /sys/block/vda/mq/1/cpu_list
>>4, 5, 6, 7
>># cat /sys/block/vda/mq/2/cpu_list
>>8, 9, 10, 11
>># cat /sys/block/vda/mq/3/cpu_list
>>12, 13, 14, 15
>>
>>
>>I do agree in above case we would have issue if the mapping is established by
>>blk_mq_map_queues().
>>
>>
>>However, I am just curious how we finally reach at blk_mq_map_queues() from
>>blk_mq_alloc_tag_set()?
>>
>>It should be something like:
>>
>>blk_mq_alloc_tag_set()
>> -> blk_mq_update_queue_map()
>>      -> if (set->ops->map_queues && !is_kdump_kernel())
>>             return set->ops->map_queues(set);
>>      -> else
>>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>
>>Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>
>>Or the execution reach at:
>>
>>virtblk_map_queues()
>> -> blk_mq_virtio_map_queues()
>>     -> if (!vdev->config->get_vq_affinity)
>>            return blk_mq_map_queues(qmap);
>>     -> else
>>            establish the mapping via get_vq_affinity
>>
>>but vdev->config->get_vq_affinity == NULL?
>>
>>For virtio pci, get_vq_affinity is always set. Seems virtio mmio would not set
>>get_vq_affinity.
>>
>>
>>I used to play with firecracker (by amazon) and it is interesting firecracker
>>uses mmio to setup virtio-blk.
>>
>>
>>Sorry for disturbing the review of this patch. I just would like to clarify in
>>which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
>>
>>Dongli Zhang
>>
>>> 
>>> 
>>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>> under virtual machine environment, cpu topology may differ from normal
>>>>> physical server.
>>>>
>>>> Would mind share the name of virtual machine monitor, the command line if
>>>> available, and which device to reproduce.
>>>>
>>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>>>> assume they use pci or virtio specific mapper to establish the mapping.
>>>>
>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>
>>>> Indeed I use three queues instead of twp as one is reserved for admin.
>>>>
>>>> # ls /sys/block/nvme0n1/mq/*
>>>> /sys/block/nvme0n1/mq/0:
>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>
>>>> /sys/block/nvme0n1/mq/1:
>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>>>
>>>>> for example (machine with 4 cores, 2 threads per core):
>>>>>
>>>>> normal physical server:
>>>>> core-id   thread-0-id  thread-1-id
>>>>> 0         0            4
>>>>> 1         1            5
>>>>> 2         2            6
>>>>> 3         3            7
>>>>>
>>>>> virtual machine:
>>>>> core-id   thread-0-id  thread-1-id
>>>>> 0         0            1
>>>>> 1         2            3
>>>>> 2         4            5
>>>>> 3         6            7
>>>>>
>>>>> When attach disk with two queues, all the even numbered cpus will be
>>>>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>>>>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>>>>> be mapped to queue 0, can cause serious imbalance.this will lead to
>>>>> performance impact on system IO
>>>>>
>>>>> So suggest to allocate cpu map by core id, this can be more currency
>>>>>
>>>>> Signed-off-by: luferry <luferry@xxxxxxx>
>>>>> ---
>>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>> index 03a534820271..4125e8e77679 100644
>>>>> --- a/block/blk-mq-cpumap.c
>>>>> +++ b/block/blk-mq-cpumap.c
>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>  {
>>>>>  	unsigned int *map = qmap->mq_map;
>>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>>> -	unsigned int cpu, first_sibling;
>>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>>  
>>>>>  	for_each_possible_cpu(cpu) {
>>>>>  		/*
>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>  		} else {
>>>>>  			first_sibling = get_first_sibling(cpu);
>>>>> -			if (first_sibling == cpu)
>>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>> -			else
>>>>> +			if (first_sibling == cpu) {
>>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>>> +				core++;
>>>>> +			} else
>>>>>  				map[cpu] = map[first_sibling];
>>>>>  		}
>>>>>  	}
>>>>>




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux