On Wed, Feb 08, 2023 at 09:33:33AM +0100, Peter-Jan Gootzen wrote:
>
> On 07/02/2023 22:57, Vivek Goyal wrote:
> > On Tue, Feb 07, 2023 at 04:32:02PM -0500, Stefan Hajnoczi wrote:
> > > On Tue, Feb 07, 2023 at 02:53:58PM -0500, Vivek Goyal wrote:
> > > > On Tue, Feb 07, 2023 at 02:45:39PM -0500, Stefan Hajnoczi wrote:
> > > > > On Tue, Feb 07, 2023 at 11:14:46AM +0100, Peter-Jan Gootzen wrote:
> > > > > > Hi,
> > > > > >
> > > >
> > > > [cc German]
> > > >
> > > > > > For my MSc thesis project in collaboration with IBM
> > > > > > (https://github.com/IBM/dpu-virtio-fs) we are looking to improve the
> > > > > > performance of the virtio-fs driver in high throughput scenarios. We think
> > > > > > the main bottleneck is the fact that the virtio-fs driver does not support
> > > > > > multi-queue (while the spec does). A big factor in this is that our setup on
> > > > > > the virtio-fs device-side (a DPU) does not easily allow multiple cores to
> > > > > > tend to a single virtio queue.
> > > >
> > > > This is an interesting limitation in DPU.
> > >
> > > Virtqueues are single-consumer queues anyway. Sharing them between
> > > multiple threads would be expensive. I think using multiqueue is natural
> > > and not specific to DPUs.
> >
> > Can we create multiple threads (a thread pool) on DPU and let these
> > threads process requests in parallel (While there is only one virt
> > queue).
> >
> > So this is what we had done in virtiofsd. One thread is dedicated to
> > pull the requests from virt queue and then pass the request to thread
> > pool to process it. And that seems to help with performance in
> > certain cases.
> >
> > Is that possible on DPU? That itself can give a nice performance
> > boost for certain workloads without having to implement multiqueue
> > actually.
> >
> > Just curious. I am not opposed to the idea of multiqueue. I am
> > just curious about the kind of performance gain (if any) it can
> > provide. And will this be helpful for rust virtiofsd running on
> > host as well?
> >
> > Thanks
> > Vivek
> >
> There is technically nothing preventing us from consuming a single queue on
> multiple cores, however our current Virtio implementation (DPU-side) is set
> up with the assumption that you should never want to do that (concurrency
> mayham around the Virtqueues and the DMAs). So instead of putting all the
> work into reworking the implementation to support that and still incur the
> big overhead, we see it more fitting to amend the virtio-fs driver with
> multi-queue support.
>
> > Is it just a theory at this point of time or have you implemented
> > it and seeing significant performance benefit with multiqueue?
>
> It is a theory, but we are currently seeing that using the single request
> queue, the single core attending to that queue on the DPU is reasonably
> close to being fully saturated.
>
> > And will this be helpful for rust virtiofsd running on
> > host as well?
>
> I figure this would be dependent on the workload and the users-needs.
> Having many cores concurrently pulling on their own virtq and then
> immediately process the request locally would of course improve performance.
> But we are offloading all this work to the DPU, for providing
> high-throughput cloud services.

I think Vivek is getting at whether your code processes requests
sequentially or in parallel. A single thread processing the virtqueue
that hands off requests to worker threads or uses io_uring to perform
I/O asynchronously will perform differently from a single thread that
processes requests sequentially in a blocking fashion. Multiqueue is not
necessary for parallelism, but the single queue might become a
bottleneck.

> > Sounds good. Assigning vqs round-robin is the strategy that virtio-net
> > and virtio-blk use. virtio-blk could be an interesting example as it's
> > similar to virtiofs. The Linux multiqueue block layer and core virtio
> > irq allocation handle CPU affinity in the case of virtio-blk.
>
> The virtio-blk use the queue assigned by the mq block layer and virtio-net
> the queue assigned from the net core layer correct?

Yes.

> If I interpret you correct, the round-robin strategy is done by assigning
> cores to queues round-robin, not per requests dynamically round-robin?

Yes, virtqueues are assigned to CPUs statically.

> This is what I remembered as well, but can't find it clearly in the source
> right now, do you have references to the source for this?

virtio_blk.ko uses an irq_affinity descriptor to tell virtio_find_vqs()
to spread MSI interrupts across CPUs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/virtio_blk.c#n609

The core blk-mq code has the blk_mq_virtio_map_queues() function to map
block layer queues to virtqueues:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/blk-mq-virtio.c#n24

virtio_net.ko manually sets virtqueue affinity:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/virtio_net.c#n2283

virtio_net.ko tells the core net subsystem about queues using
netif_set_real_num_tx_queues() and then skbs are mapped to queues by
common code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c#n4079

Stefan
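
P.S. To make the "statically assigned, not per-request" point concrete,
here is a rough user-space sketch. It is not kernel code: the function
names and the plain modulo mapping are assumptions made for illustration
only, and the virtio-blk code paths linked above also take IRQ affinity
into account rather than using a bare modulo.

```python
def assign_cpus_to_queues(num_cpus, num_queues):
    """Build a static CPU -> queue index map, handing out queues round-robin.

    Hypothetical helper for illustration; the modulo rule is an assumption.
    """
    if num_cpus <= 0 or num_queues <= 0:
        raise ValueError("num_cpus and num_queues must be positive")
    return {cpu: cpu % num_queues for cpu in range(num_cpus)}


def queue_for_request(cpu_to_queue, submitting_cpu):
    """Pick the queue for a request purely from the CPU that submits it.

    The map is computed once up front; nothing rotates per request.
    """
    return cpu_to_queue[submitting_cpu]


if __name__ == "__main__":
    mapping = assign_cpus_to_queues(num_cpus=8, num_queues=4)
    for cpu in range(8):
        print(f"cpu {cpu} -> queue {queue_for_request(mapping, cpu)}")
```

With fewer queues than CPUs, several CPUs share a queue, but a given CPU
always submits to the same one, which is the property the thread above
calls static assignment.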
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization