Thanks for your kind suggestions, I will reformat this patch later!

At 2019-03-25 21:53:00, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>
>
>On 03/25/2019 05:53 PM, luferry wrote:
>>
>> Sorry, I messed up some function names.
>>
>> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues() will
>> fall back to blk_mq_map_queues().
>>
>> Since virtual machine users cannot change the number of vectors, I think
>> blk_mq_map_queues() should be more friendly to different CPU topologies.
>
>
>Yes, we fall back to blk_mq_map_queues() if there is a lack of vectors.
>
>Here is the qemu cmdline:
>"-smp 8,sockets=1,cores=4,threads=2"
>"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"
>
># cat /sys/block/vda/mq/0/cpu_list
>0, 4, 5
>root@vm:/home/zhang# cat /sys/block/vda/mq/1/cpu_list
>1
>root@vm:/home/zhang# cat /sys/block/vda/mq/2/cpu_list
>2, 6, 7
>root@vm:/home/zhang# cat /sys/block/vda/mq/3/cpu_list
>3
>
>How about putting how to hit the issue into the commit message, so people
>would be able to know the scenario?
>
>Dongli Zhang
>
>> At 2019-03-25 17:49:33, "luferry" <luferry@xxxxxxx> wrote:
>>>
>>> After reading the code and comparing the PCI device info,
>>> I can reproduce this imbalance under KVM.
>>> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
>>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will
>>> fall back to virtio_mq_map_queues.
>>>
>>> Since virtual machine users cannot change the number of vectors, I think
>>> virtio_mq_map_queues should be more friendly to different CPU topologies.
>>>
>>> const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>>> {
>>>         struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>
>>>         if (!vp_dev->per_vq_vectors ||
>>>             vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>>>                 return NULL;
>>>
>>>         return pci_irq_get_affinity(vp_dev->pci_dev,
>>>                                     vp_dev->vqs[index]->msix_vector);
>>> }
>>>
>>> int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
>>>                 struct virtio_device *vdev, int first_vec)
>>> {
>>>         const struct cpumask *mask;
>>>         unsigned int queue, cpu;
>>>
>>>         if (!vdev->config->get_vq_affinity)
>>>                 goto fallback;
>>>
>>>         for (queue = 0; queue < qmap->nr_queues; queue++) {
>>>                 mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); // vp_get_vq_affinity
>>>                 if (!mask)
>>>                         goto fallback;
>>>
>>>                 for_each_cpu(cpu, mask)
>>>                         qmap->mq_map[cpu] = qmap->queue_offset + queue;
>>>         }
>>>
>>>         return 0;
>>> fallback:
>>>         return blk_mq_map_queues(qmap);
>>> }
>>>
>>> At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> On 03/23/2019 02:34 PM, luferry wrote:
>>>>>
>>>>> I just bought one VM, so I cannot access the hypervisor. I will try to
>>>>> build the environment on my desktop. I am sure about a few things:
>>>>> the hypervisor is KVM and the disk driver is virtio-blk.
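To make the fallback path concrete before the logs below, here is a minimal
userspace sketch (not kernel code) of the pre-patch blk_mq_map_queues()
sibling logic. The get_first_sibling() here is hard-coded to the VM topology
under discussion, where thread siblings have adjacent CPU ids -- an
assumption made only for this demo:

#include <stdio.h>

#define NR_CPUS   16
#define NR_QUEUES 4

/* On this VM topology, the first sibling of CPU n is n with bit 0 cleared. */
static unsigned int get_first_sibling(unsigned int cpu)
{
        return cpu & ~1u;
}

int main(void)
{
        unsigned int map[NR_CPUS], cpu, first_sibling;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (cpu < NR_QUEUES) {
                        /* Sequential mapping for the first nr_queues CPUs. */
                        map[cpu] = cpu % NR_QUEUES;
                } else {
                        first_sibling = get_first_sibling(cpu);
                        if (first_sibling == cpu)
                                map[cpu] = cpu % NR_QUEUES; /* raw cpu id */
                        else
                                map[cpu] = map[first_sibling]; /* share sibling's queue */
                }
        }

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                printf("cpu %2u -> queue %u\n", cpu, map[cpu]);
        return 0;
}

Running it yields queue 0 = {0, 4, 5, 8, 9, 12, 13}, queue 1 = {1},
queue 2 = {2, 6, 7, 10, 11, 14, 15}, queue 3 = {3} -- the same imbalance as
the "before this patch" cpu_list shown further down.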
>>>>> [root@blk-mq ~]# dmesg |grep KVM
>>>>> [    0.000000] Hypervisor detected: KVM
>>>>> [    0.186330] Booting paravirtualized kernel on KVM
>>>>> [    0.279106] KVM setup async PF for cpu 0
>>>>> [    0.420819] KVM setup async PF for cpu 1
>>>>> [    0.421682] KVM setup async PF for cpu 2
>>>>> [    0.422113] KVM setup async PF for cpu 3
>>>>> [    0.422434] KVM setup async PF for cpu 4
>>>>> [    0.422434] KVM setup async PF for cpu 5
>>>>> [    0.423312] KVM setup async PF for cpu 6
>>>>> [    0.423394] KVM setup async PF for cpu 7
>>>>> [    0.424125] KVM setup async PF for cpu 8
>>>>> [    0.424414] KVM setup async PF for cpu 9
>>>>> [    0.424415] KVM setup async PF for cpu 10
>>>>> [    0.425329] KVM setup async PF for cpu 11
>>>>> [    0.425420] KVM setup async PF for cpu 12
>>>>> [    0.426156] KVM setup async PF for cpu 13
>>>>> [    0.426431] KVM setup async PF for cpu 14
>>>>> [    0.426431] KVM setup async PF for cpu 15
>>>>> [root@blk-mq ~]# lspci |grep block
>>>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>>
>>>>> [root@blk-mq ~]# lscpu
>>>>> Architecture:          x86_64
>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>> Byte Order:            Little Endian
>>>>> CPU(s):                16
>>>>> On-line CPU(s) list:   0-15
>>>>> Thread(s) per core:    2
>>>>> Core(s) per socket:    8
>>>>>
>>>>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>>>>> 0  1  2  3
>>>>>
>>>>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>>>>> processor : 0
>>>>> core id   : 0
>>>>> processor : 1
>>>>> core id   : 0
>>>>> processor : 2
>>>>> core id   : 1
>>>>> processor : 3
>>>>> core id   : 1
>>>>> processor : 4
>>>>> core id   : 2
>>>>> processor : 5
>>>>> core id   : 2
>>>>> processor : 6
>>>>> core id   : 3
>>>>> processor : 7
>>>>> core id   : 3
>>>>> processor : 8
>>>>> core id   : 4
>>>>> processor : 9
>>>>> core id   : 4
>>>>> processor : 10
>>>>> core id   : 5
>>>>> processor : 11
>>>>> core id   : 5
>>>>> processor : 12
>>>>> core id   : 6
>>>>> processor : 13
>>>>> core id   : 6
>>>>> processor : 14
>>>>> core id   : 7
>>>>> processor : 15
>>>>> core id   : 7
>>>>>
>>>>> --before this patch--
>>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>>> 0, 4, 5, 8, 9, 12, 13
>>>>> 1
>>>>> 2, 6, 7, 10, 11, 14, 15
>>>>> 3
>>>>>
>>>>> --after this patch--
>>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>>> 0, 4, 5, 12, 13
>>>>> 1, 6, 7, 14, 15
>>>>> 2, 8, 9
>>>>> 3, 10, 11
>>>>>
>>>>>
>>>>> I added a dump_stack() inside blk_mq_map_queues():
>>>>>
>>>>> [    1.378680] Call Trace:
>>>>> [    1.378690]  dump_stack+0x5a/0x73
>>>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>>>> [    1.378713]  really_probe+0x215/0x3f0
>>>>> [    1.378716]  driver_probe_device+0x115/0x130
>>>>> [    1.378718]  device_driver_attach+0x50/0x60
>>>>> [    1.378720]  __driver_attach+0xbd/0x140
>>>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>>>
>>>> I am not able to reproduce the above call stack when virtio-blk is assigned
>>>> 4 queues while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>>>
>>>> # cat /sys/block/vda/mq/0/cpu_list
>>>> 0, 1, 2, 3
>>>> # cat /sys/block/vda/mq/1/cpu_list
>>>> 4, 5, 6, 7
>>>> # cat /sys/block/vda/mq/2/cpu_list
>>>> 8, 9, 10, 11
>>>> # cat /sys/block/vda/mq/3/cpu_list
>>>> 12, 13, 14, 15
>>>>
>>>>
>>>> I do agree that in the above case we would have an issue if the mapping
>>>> were established by blk_mq_map_queues().
>>>>
>>>>
>>>> However, I am just curious how we finally reach blk_mq_map_queues() from
>>>> blk_mq_alloc_tag_set()?
>>>>
>>>> It should be something like:
>>>>
>>>> blk_mq_alloc_tag_set()
>>>> -> blk_mq_update_queue_map()
>>>>    -> if (set->ops->map_queues && !is_kdump_kernel())
>>>>           return set->ops->map_queues(set);
>>>>    -> else
>>>>           return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>>>
>>>> Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>>>
>>>> Or does the execution reach:
>>>>
>>>> virtblk_map_queues()
>>>> -> blk_mq_virtio_map_queues()
>>>>    -> if (!vdev->config->get_vq_affinity)
>>>>           return blk_mq_map_queues(qmap);
>>>>    -> else
>>>>           establish the mapping via get_vq_affinity
>>>>
>>>> but with vdev->config->get_vq_affinity == NULL?
>>>>
>>>> For virtio pci, get_vq_affinity is always set. It seems virtio mmio does
>>>> not set get_vq_affinity.
>>>>
>>>>
>>>> I used to play with firecracker (by Amazon), and interestingly firecracker
>>>> uses mmio to set up virtio-blk.
>>>>
>>>>
>>>> Sorry for disturbing the review of this patch. I just would like to clarify
>>>> in which scenario we would hit this issue, e.g., when virtio-blk is based
>>>> on mmio?
>>>>
>>>> Dongli Zhang
>>>>
>>>>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>>>> Under a virtual machine environment, the CPU topology may differ from
>>>>>>> that of a normal physical server.
>>>>>>
>>>>>> Would you mind sharing the name of the virtual machine monitor, the
>>>>>> command line if available, and which device reproduces this?
>>>>>>
>>>>>> For instance, I am not able to reproduce this with qemu nvme or
>>>>>> virtio-blk, as I assume they use pci- or virtio-specific mappers to
>>>>>> establish the mapping.
>>>>>>
>>>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>>>
>>>>>> Indeed I use three queues instead of two, as one is reserved for admin.
>>>>>>
>>>>>> # ls /sys/block/nvme0n1/mq/*
>>>>>> /sys/block/nvme0n1/mq/0:
>>>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>>>
>>>>>> /sys/block/nvme0n1/mq/1:
>>>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>>>
>>>>>>
>>>>>> Thank you very much!
>>>>>>
>>>>>> Dongli Zhang
>>>>>>
>>>>>>> For example (machine with 4 cores, 2 threads per core):
>>>>>>>
>>>>>>> normal physical server:
>>>>>>> core-id   thread-0-id   thread-1-id
>>>>>>> 0         0             4
>>>>>>> 1         1             5
>>>>>>> 2         2             6
>>>>>>> 3         3             7
>>>>>>>
>>>>>>> virtual machine:
>>>>>>> core-id   thread-0-id   thread-1-id
>>>>>>> 0         0             1
>>>>>>> 1         2             3
>>>>>>> 2         4             5
>>>>>>> 3         6             7
>>>>>>>
>>>>>>> When attaching a disk with two queues, all the even-numbered CPUs will
>>>>>>> be mapped to queue 0.
>>>>>>> Under a virtual machine, every CPU is immediately followed by its
>>>>>>> sibling CPU. Before this patch, all the odd-numbered CPUs would also
>>>>>>> be mapped to queue 0, which can cause serious imbalance and hurts
>>>>>>> system IO performance.
>>>>>>>
>>>>>>> So I suggest allocating the CPU map by core id, which gives a more
>>>>>>> balanced mapping.
>>>>>>>
>>>>>>> Signed-off-by: luferry <luferry@xxxxxxx>
>>>>>>> ---
>>>>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>>>> index 03a534820271..4125e8e77679 100644
>>>>>>> --- a/block/blk-mq-cpumap.c
>>>>>>> +++ b/block/blk-mq-cpumap.c
>>>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>>  {
>>>>>>>  	unsigned int *map = qmap->mq_map;
>>>>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>>>>> -	unsigned int cpu, first_sibling;
>>>>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>>>>
>>>>>>>  	for_each_possible_cpu(cpu) {
>>>>>>>  		/*
>>>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>>  		} else {
>>>>>>>  			first_sibling = get_first_sibling(cpu);
>>>>>>> -			if (first_sibling == cpu)
>>>>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>> -			else
>>>>>>> +			if (first_sibling == cpu) {
>>>>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>>>>> +				core++;
>>>>>>> +			} else
>>>>>>>  				map[cpu] = map[first_sibling];
>>>>>>>  		}
>>>>>>>  	}
>>>>>>>
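For reference, re-running the earlier userspace sketch (again, not kernel
code; same demo assumptions: 16 CPUs, 4 queues, thread siblings on adjacent
CPU ids) with the patched branch shows the rebalancing effect. The only
change is that unmatched first siblings consume a dense core counter instead
of their raw CPU id:

#include <stdio.h>

#define NR_CPUS   16
#define NR_QUEUES 4

/* Demo assumption: the first sibling of CPU n is n with bit 0 cleared. */
static unsigned int get_first_sibling(unsigned int cpu)
{
        return cpu & ~1u;
}

int main(void)
{
        unsigned int map[NR_CPUS], cpu, first_sibling, core = 0;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (cpu < NR_QUEUES) {
                        map[cpu] = cpu % NR_QUEUES;
                } else {
                        first_sibling = get_first_sibling(cpu);
                        if (first_sibling == cpu) {
                                /* Patched: index by core count, not raw cpu id. */
                                map[cpu] = core % NR_QUEUES;
                                core++;
                        } else {
                                map[cpu] = map[first_sibling];
                        }
                }
        }

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                printf("cpu %2u -> queue %u\n", cpu, map[cpu]);
        return 0;
}

This yields queue 0 = {0, 4, 5, 12, 13}, queue 1 = {1, 6, 7, 14, 15},
queue 2 = {2, 8, 9}, queue 3 = {3, 10, 11}, matching the "after this patch"
cpu_list reported above.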