Re: [PATCH] fuse: cleanup request queuing towards virtiofs

On Wed, Jun 05, 2024 at 10:40:44AM +0000, Peter-Jan Gootzen wrote:
> On Wed, 2024-05-29 at 14:32 -0400, Stefan Hajnoczi wrote:
> > On Wed, May 29, 2024 at 05:52:07PM +0200, Miklos Szeredi wrote:
> > > Virtiofs has its own queuing mechanism, but requests are still first
> > > queued on fiq->pending, only to be immediately dequeued and queued
> > > onto the virtio queue.
> > > 
> > > The queuing on fiq->pending is unnecessary and might even have some
> > > performance impact due to being a contention point.
> > > 
> > > Forget requests are handled similarly.
> > > 
> > > Move the queuing of requests and forgets into the fiq->ops->*.
> > > fuse_iqueue_ops are renamed to reflect the new semantics.
> > > 
> > > Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > ---
> > >  fs/fuse/dev.c       | 159 ++++++++++++++++++++++++--------------------
> > >  fs/fuse/fuse_i.h    |  19 ++----
> > >  fs/fuse/virtio_fs.c |  41 ++++--------
> > >  3 files changed, 106 insertions(+), 113 deletions(-)
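
For context, the reworked fuse_iqueue_ops could look roughly like the sketch
below. This is an illustration only: the callback names and signatures are
assumptions based on the description above ("move the queuing of requests and
forgets into the fiq->ops->*"), not copied from the patch.

        /*
         * Sketch: each transport now queues the request itself instead of
         * having it staged on fiq->pending first.  virtiofs can dispatch
         * straight onto a virtqueue; the classic /dev/fuse path queues onto
         * fiq->pending and wakes a reader, as before.
         */
        struct fuse_iqueue_ops {
                /* previously a "wake after queuing" hook; now sends a FORGET directly */
                void (*send_forget)(struct fuse_iqueue *fiq,
                                    struct fuse_forget_link *link);

                /* sends an INTERRUPT for a request already in flight */
                void (*send_interrupt)(struct fuse_iqueue *fiq,
                                       struct fuse_req *req);

                /* sends a normal request */
                void (*send_req)(struct fuse_iqueue *fiq, struct fuse_req *req);

                /* clean up transport state when the connection is released */
                void (*release)(struct fuse_iqueue *fiq);
        };

With ops shaped like this, virtiofs would implement ->send_req by placing the
request straight onto a virtqueue, so the fiq->pending staging (and its lock)
is no longer on the virtiofs fast path.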
> > 
> > This is a little scary but I can't think of a scenario where directly
> > dispatching requests to virtqueues is a problem.
> > 
> > Is there someone who can run single and multiqueue virtiofs
> > performance benchmarks?
> > 
> > Reviewed-by: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
> 
> I ran some tests and experiments on the patch (on top of v6.10-rc2) with
> our multi-queue capable virtio-fs device. No issues were found.
> 
> Experimental system setup (neither the fastest possible nor the most
> optimized setup!):
> # Host:
>    - Dell PowerEdge R7525
>    - CPU: 2x AMD EPYC 7413 24-Core
>    - VM: QEMU KVM with 24 cores, vCPUs locked to the NUMA nodes to which
> the DPU is attached; the DPU is passed through via VFIO-pci.
> Running a default x86_64 ext4 buildroot with fio installed.
> # Virtio-fs device:
>    - BlueField-3 DPU
>    - CPU: ARM Cortex-A78AE, 16 cores
>    - One thread per queue, each busy polling on one request queue
>    - Each queue is 1024 descriptors deep
> # Workload (deviations are specified in the table):
>    - fio 3.34
>    - sequential read
>    - ioengine=io_uring, single 4GiB file, iodepth=128, bs=256KiB,    
> runtime=30s, ramp_time=10s, direct=1
>    - T is the number of threads (numjobs=T with thread=1)
>    - Q is the number of request queues
> 
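For reference, that workload corresponds roughly to the following fio
invocation (illustrative only: the mount point/filename, the use of
time_based, and T as a shell variable are assumptions, not taken from the
description above):

    fio --name=seqread --ioengine=io_uring --rw=read --bs=256k \
        --iodepth=128 --direct=1 --size=4g --time_based=1 \
        --runtime=30s --ramp_time=10s --thread=1 --numjobs=$T \
        --filename=/mnt/virtiofs/testfile
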
> | Workload           | Before patch | After patch |
> | ------------------ | ------------ | ----------- |
> | T=1 Q=1            | 9216MiB/s    | 9201MiB/s   |
> | T=2 Q=2            | 10.8GiB/s    | 10.7GiB/s   |
> | T=4 Q=4            | 12.6GiB/s    | 12.2GiB/s   |
> | T=8 Q=8            | 19.5GiB/s    | 19.7GiB/s   |
> | T=16 Q=1           | 9451MiB/s    | 9558MiB/s   |
> | T=16 Q=2           | 13.5GiB/s    | 13.4GiB/s   |
> | T=16 Q=4           | 11.8GiB/s    | 11.4GiB/s   |
> | T=16 Q=8           | 11.1GiB/s    | 10.8GiB/s   |
> | T=24 Q=24          | 26.5GiB/s    | 26.5GiB/s   |
> | T=24 Q=24 24 files | 26.5GiB/s    | 26.6GiB/s   |
> | T=24 Q=24 4k       | 948MiB/s     | 955MiB/s    |
> 
> Averaging out those results, the difference is within a reasonable
> margin of error (less than 1%), so in this setup we see no difference
> in performance.
> However, if the virtio-fs device were more optimized, e.g. if it didn't
> copy the data into its own memory, then the bottleneck could move to the
> driver side and this patch could show some benefit at those higher
> message rates.
> 
> So although I would have hoped for some performance increase already
> with this setup, I still think this is a good patch and a logical
> optimization for high performance virtio-fs devices that might show a
> benefit in the future.
> 
> Tested-by: Peter-Jan Gootzen <pgootzen@xxxxxxxxxx>
> Reviewed-by: Peter-Jan Gootzen <pgootzen@xxxxxxxxxx>

Thank you!

Stefan
