Re: virtio-blk: should num_vqs be limited by num_possible_cpus()?

Dongli Zhang <dongli.zhang@xxxxxxxxxx> · Thu, 21 Mar 2019 10:14:38 +0800

On 3/20/19 8:53 PM, Jason Wang wrote:
> 
> On 2019/3/19 10:22 AM, Dongli Zhang wrote:
>> Hi Jason,
>>
>> On 3/18/19 3:47 PM, Jason Wang wrote:
>>> On 2019/3/15 8:41 PM, Cornelia Huck wrote:
>>>> On Fri, 15 Mar 2019 12:50:11 +0800
>>>> Jason Wang <jasowang@xxxxxxxxxx> wrote:
>>>>
>>>>> Or something like I proposed several years ago?
>>>>> https://do-db2.lkml.org/lkml/2014/12/25/169
>>>>>
>>>>> Btw, for virtio-net, I think we actually want to go for having a maximum
>>>>> number of supported queues, as hardware does. This would be useful
>>>>> for e.g. CPU hotplug or XDP (which requires a per-CPU TX queue). But the
>>>>> current vector allocation doesn't support this, which results in all
>>>>> virtqueues sharing a single vector. We may indeed need a more flexible
>>>>> policy here.
>>>> I think it should be possible for the driver to give the transport
>>>> hints how to set up their queues/interrupt structures. (The driver
>>>> probably knows best about its requirements.) Perhaps whether a queue is
>>>> high or low frequency, or whether it should be low latency, or even
>>>> whether two queues could share a notification mechanism without
>>>> drawbacks. It's up to the transport to make use of that information, if
>>>> possible.
>>>
>>> Exactly, and that is what the above series tried to do by providing hints
>>> about, e.g., which queues want to share a notification.
>>>
>> I read your patch set on providing more flexibility in queue-to-vector
>> mapping.
>>
>> One use case of the patch set is that we would be able to enable more queues
>> when there is a limited number of vectors.
>>
>> Another use case is that we may classify queues as high priority or low
>> priority, as mentioned by Cornelia.
>>
>> For virtio-blk, we may extend virtio-blk based on this patch set to enable
>> something similar to write_queues/poll_queues in nvme, when (set->nr_maps != 1).
>>
>>
>> Yet, the question I am asking in this email thread is for a different scenario.
>>
>> The issue is not that we do not have enough vectors (although this is why
>> only 1 vector is allocated for all virtio-blk queues). Since virtio-blk so
>> far has (set->nr_maps == 1), the block layer limits the number of hw queues
>> to nr_cpu_ids, so we do not need more than nr_cpu_ids hw queues in virtio-blk.
>>
>> That's why I ask whether we could change the flow to one of the options
>> below when the number of supported hw queues is more than nr_cpu_ids (and
>> set->nr_maps == 1; virtio-blk does not set nr_maps, and the block layer sets
>> it to 1 when the driver does not specify a value):
>>
>> option 1:
>> As nvme and xen-netfront do, limit the number of hw queues to nr_cpu_ids.
> 
> 
> How do they limit the hw queue number? A command?

The max #queue is also limited by other factors, e.g., kernel parameter
configuration, xen dom0 configuration, or nvme hardware support.

For simplicity, I ignore those factors here and only talk about the
relation between #queue and #cpu.

About nvme pci:

Regardless of the new write_queues and poll_queues, the number of default-type
queues is limited by num_possible_cpus(), as shown at lines 2120 and 252.

2113 static int nvme_setup_io_queues(struct nvme_dev *dev)
2114 {
2115         struct nvme_queue *adminq = &dev->queues[0];
2116         struct pci_dev *pdev = to_pci_dev(dev->dev);
2117         int result, nr_io_queues;
2118         unsigned long size;
2119
2120         nr_io_queues = max_io_queues();
2121         result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);

 250 static unsigned int max_io_queues(void)
 251 {
 252         return num_possible_cpus() + write_queues + poll_queues;
 253 }

The downside is that there may be many unused hw queues and vectors when
num_possible_cpus() is very large while only a small number of CPUs are
online. I am looking into whether there is a way to improve this.
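
To make that cost concrete, here is a minimal userspace sketch (not kernel
code) of the sizing rule above. The names model_max_io_queues() and
queues_without_online_cpu() are hypothetical, and the second function assumes
a simple 1:1 mapping between default queues and possible CPUs, which the real
driver does not guarantee.

```c
#include <assert.h>

/*
 * Userspace model of the nvme sizing rule quoted above: default
 * queues follow the possible-CPU count, and the optional write and
 * poll queues are added on top.
 * All names here are hypothetical, for illustration only.
 */
static unsigned int model_max_io_queues(unsigned int possible_cpus,
                                        unsigned int write_queues,
                                        unsigned int poll_queues)
{
        return possible_cpus + write_queues + poll_queues;
}

/*
 * Assuming one default queue per possible CPU (an assumption of this
 * sketch), every possible-but-offline CPU leaves one queue, and its
 * vector, without an online CPU to submit on it.
 */
static unsigned int queues_without_online_cpu(unsigned int possible_cpus,
                                              unsigned int online_cpus)
{
        assert(online_cpus <= possible_cpus);
        return possible_cpus - online_cpus;
}
```

For example, with 128 possible CPUs and only 4 online, this model requests 128
default queues, 124 of which have no online CPU behind them.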

About xen-blkfront:

Indeed, the max #queue is limited by num_online_cpus() when the xen-blkfront
module is loaded, as shown at lines 2733 and 2736.

2707 static int __init xlblk_init(void)
... ...
2710         int nr_cpus = num_online_cpus();
... ...
2733         if (xen_blkif_max_queues > nr_cpus) {
2734                 pr_info("Invalid max_queues (%d), will use default max: %d.\n",
2735                         xen_blkif_max_queues, nr_cpus);
2736                 xen_blkif_max_queues = nr_cpus;
2737         }

The downside is that the number of hw queues/hctx is limited and cannot
increase after CPU hotplug. I am looking into whether there is a way to
improve this.
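
The hotplug limitation can be illustrated with a small userspace model (again
not kernel code). The struct and function names are hypothetical; the only
behaviour taken from the excerpt above is that the cap is computed once, from
the number of CPUs online at module load.

```c
#include <assert.h>

/* Hypothetical model of a driver whose queue cap is fixed at load time. */
struct blkfront_model {
        unsigned int max_queues;   /* cap computed once, at module load */
};

/*
 * Mirror the check quoted above: a requested max_queues larger than
 * the number of CPUs online at load time is replaced by that number.
 */
static void model_init(struct blkfront_model *m,
                       unsigned int requested_max_queues,
                       unsigned int online_cpus_at_load)
{
        assert(online_cpus_at_load > 0);
        m->max_queues = requested_max_queues;
        if (m->max_queues > online_cpus_at_load)
                m->max_queues = online_cpus_at_load;
}

/*
 * The cap is never recomputed in this model, so the number of CPUs
 * online after hotplug is deliberately ignored.
 */
static unsigned int model_cap_after_hotplug(const struct blkfront_model *m,
                                            unsigned int online_cpus_now)
{
        (void)online_cpus_now;
        return m->max_queues;
}
```

Loading with max_queues=16 while 4 CPUs are online gives a cap of 4 in this
model, and the cap stays at 4 even if 12 more CPUs come online later.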

While both have downsides for CPU hotplug, they both try to make #vector
proportional to the number of CPUs.

For xen-blkfront and virtio-blk, as (set->nr_maps == 1), the number of hw
queues is limited by nr_cpu_ids again at the block layer.

As virtio-blk is a PCI device, can we use the solution in nvme, that is, use
num_possible_cpus() to limit the max queues in virtio-blk?
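
A minimal sketch of that idea, written as a userspace model rather than a
driver patch. The function names are hypothetical, cpu_limit stands for
whichever bound is chosen (num_possible_cpus() or nr_cpu_ids), and the "one
extra vector for configuration changes" in model_vectors_wanted() is an
assumption of this sketch about the transport.

```c
#include <assert.h>

/*
 * Cap the virtqueue count advertised by the device at the CPU-based
 * limit, since the block layer would not use more hw queues than
 * that when set->nr_maps == 1.
 * The function names are hypothetical, not existing driver code.
 */
static unsigned int model_limit_num_vqs(unsigned int device_num_queues,
                                        unsigned int cpu_limit)
{
        assert(cpu_limit > 0);
        return device_num_queues < cpu_limit ? device_num_queues
                                             : cpu_limit;
}

/*
 * Vectors wanted for one-vector-per-queue operation, assuming one
 * additional vector for configuration changes.
 */
static unsigned int model_vectors_wanted(unsigned int num_vqs)
{
        return num_vqs + 1;
}
```

With a device advertising 8 queues in a 4-CPU guest, this model ends up with 4
virtqueues and wants 5 vectors instead of 9, so a per-queue vector is more
likely to be available for each queue that the block layer actually uses.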

Thank you very much!

Dongli Zhang

> 
> 
>>
>> option 2:
>> If there are not enough vectors, use the max number of vectors (indeed
>> nr_cpu_ids) as the number of hw queues.
> 
> 
> We can share vectors in this case.
> 
> 
>>
>> option 3:
>> We should allow more vectors even though the block layer supports at most
>> nr_cpu_ids queues.
>>
>>
>> I understand that a new policy for queue-vector mapping would be very helpful.
>> I am just asking the question from the block layer's point of view.
>>
>> Thank you very much!
>>
>> Dongli Zhang
> 
> 
> I don't know much about block; cc'ing Stefan for more ideas.
> 
> Thanks
>