On Thu, Jan 23, 2025 at 12:32 AM Eugenio Pérez <eperezma@xxxxxxxxxx> wrote:
>
> This is useful for some setups like swiotlb or VDUSE where the DMA
> operations are expensive and/or need to be performed with a write lock.
>
> After applying this patch, fio read test goes from 1201MiB/s to 1211MiB/s.

The difference is too small to distinguish it from the noise. I would
suggest testing with different setups:

1) SWIOTLB
2) VDUSE

Note that SWIOTLB will do bouncing even for DMA_FROM_DEVICE, so I
think we may see better performance there.

We also need to try different request sizes. I did a similar patch
for virtio-blk and saw better performance for large requests like 1M
(see the example fio invocation at the end of this mail).

Thanks

>
> Signed-off-by: Eugenio Pérez <eperezma@xxxxxxxxxx>
> ---
>  drivers/virtio/virtio_ring.c |  2 ++
>  fs/fuse/virtio_fs.c          | 25 +++++++++++++++++++++++--
>  2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e49912fa77c5..eb22bfcb9100 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -580,6 +580,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
>                                  goto unmap_release;
>
>                          sg_dma_address(sg) = addr;
> +                        sg_dma_len(sg) = sg->length;
>                          mapped_sg++;
>                  }
>          }
> @@ -592,6 +593,7 @@ int virtqueue_map_sgs(struct virtqueue *_vq,
>                                  goto unmap_release;
>
>                          sg_dma_address(sg) = addr;
> +                        sg_dma_len(sg) = sg->length;
>                          mapped_sg++;
>                  }
>          }
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 1344c5782a7c..2b558b05d0f8 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -836,8 +836,21 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
>
>          /* End requests */
>          list_for_each_entry_safe(req, next, &reqs, list) {
> +                struct scatterlist *stack_sgs[6];
> +                struct scatterlist **sgs = stack_sgs;
> +                unsigned int total_sgs = req->out_sgs + req->in_sgs;
> +
>                  list_del_init(&req->list);
>
> +                /* TODO replace magic 6 by a macro */
> +                if (total_sgs > 6)
> +                        sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
> +
> +                for (unsigned int i = 0; i < total_sgs; ++i)
> +                        sgs[i] = &req->sg[i];
> +
> +                virtqueue_unmap_sgs(vq, sgs, req->out_sgs, req->in_sgs);
> +
>                  /* blocking async request completes in a worker context */
>                  if (req->args->may_block) {
>                          struct virtio_fs_req_work *w;
> @@ -850,6 +863,9 @@ static void virtio_fs_requests_done_work(struct work_struct *work)
>                  } else {
>                          virtio_fs_request_complete(req, fsvq);
>                  }
> +
> +                if (sgs != stack_sgs)
> +                        kfree(sgs);
>          }
>
>          /* Try to push previously queued requests, as the queue might no longer be full */
> @@ -1426,6 +1442,11 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>                  sgs[i] = &req->sg[i];
>          WARN_ON(req->out_sgs + req->in_sgs != total_sgs);
>
> +        // TODO can we change this ptr out of the lock?
> +        vq = fsvq->vq;
> +        // TODO handle this and following errors
> +        ret = virtqueue_map_sgs(vq, sgs, req->out_sgs, req->in_sgs);
> +        BUG_ON(ret < 0);
>          spin_lock(&fsvq->lock);
>
>          if (!fsvq->connected) {
> @@ -1434,8 +1455,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>                  goto out;
>          }
>
> -        vq = fsvq->vq;
> -        ret = virtqueue_add_sgs(vq, sgs, req->out_sgs, req->in_sgs, req, GFP_ATOMIC);
> +        ret = virtqueue_add_sgs_premapped(vq, sgs, req->out_sgs,
> +                                          req->in_sgs, req, GFP_ATOMIC);
>          if (ret < 0) {
>                  spin_unlock(&fsvq->lock);
>                  goto out;
> --
> 2.48.1
>
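
Btw, for the request size sweep I mean something like the following
fio invocation (the mount point and job parameters here are just
placeholders, adjust them to your setup):

    # sequential read against the virtiofs mount, one job per block size
    fio --name=seqread --directory=/mnt/virtiofs \
        --ioengine=libaio --direct=1 --iodepth=16 \
        --rw=read --size=1G --bs=4k

and then repeat with --bs=64k and --bs=1M, so we can see whether the
premapped path helps more as the request size grows.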