Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

On Thu, Aug 5, 2021 at 9:31 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
>
> On 2021/8/5 at 8:34 PM, Yongji Xie wrote:
> >> My main point, though, is that if you've already got something else
> >> keeping track of the actual addresses, then the way you're using an
> >> iova_domain appears to be something you could do with a trivial bitmap
> >> allocator. That's why I don't buy the efficiency argument. The main
> >> design points of the IOVA allocator are to manage large address spaces
> >> while trying to maximise spatial locality to minimise the underlying
> >> pagetable usage, and allocating with a flexible limit to support
> >> multiple devices with different addressing capabilities in the same
> >> address space. If none of those aspects are relevant to the use-case -
> >> which AFAICS appears to be true here - then as a general-purpose
> >> resource allocator it's rubbish and has an unreasonably massive memory
> >> overhead and there are many, many better choices.
> >>
> > OK, I get your point. Actually we used the genpool allocator in the
> > early version. Maybe we can fall back to using it.
>
>
> I think maybe you can share some perf numbers to see how much
> alloc_iova_fast() can help.
>

I ran some fio tests [1] against a RAM-backed vduse block device [2].

The results are as follows:

                  numjobs=1    numjobs=2    numjobs=4    numjobs=8
alloc_iova_fast   145k iops    265k iops    514k iops    758k iops
alloc_iova        137k iops    170k iops    128k iops    113k iops
gen_pool_alloc    143k iops    270k iops    458k iops    521k iops

alloc_iova_fast() has the best performance because we always hit the
per-cpu cache. Without the per-cpu cache, the genpool allocator
performs better than the iova allocator, and the gap grows with numjobs.
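
To make the scaling behaviour easier to compare, here is a small sketch that
derives relative numbers from the IOPS figures reported above. It is only
arithmetic on the posted data, not a new measurement; the helper names
(scaling, ratio_at) are made up for illustration, and the dictionary keys
follow the function name used in the patch subject.

```python
# IOPS figures (in thousands) copied from the results reported above.
numjobs = [1, 2, 4, 8]
results = {
    "alloc_iova_fast": [145, 265, 514, 758],
    "alloc_iova":      [137, 170, 128, 113],
    "gen_pool_alloc":  [143, 270, 458, 521],
}


def scaling(iops):
    """Throughput at each numjobs setting relative to numjobs=1."""
    return [round(value / iops[0], 2) for value in iops]


def ratio_at(a, b, jobs):
    """Throughput of allocator a divided by allocator b at a given numjobs."""
    idx = numjobs.index(jobs)
    return round(results[a][idx] / results[b][idx], 2)


if __name__ == "__main__":
    for name, iops in results.items():
        print(f"{name:16s} scaling vs numjobs=1: {scaling(iops)}")
    print("alloc_iova_fast / gen_pool_alloc at numjobs=8:",
          ratio_at("alloc_iova_fast", "gen_pool_alloc", 8))
    print("gen_pool_alloc / alloc_iova at numjobs=8:",
          ratio_at("gen_pool_alloc", "alloc_iova", 8))
```

Read this way, the plain iova allocator ends up below its own single-job
throughput at numjobs=8, while the other two keep scaling.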

[1] fio jobfile:

[global]
rw=randread
direct=1
ioengine=libaio
iodepth=16
time_based=1
runtime=60s
group_reporting
bs=4k
filename=/dev/vda
[job]
numjobs=..

[2]  $ qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

The qemu-storage-daemon can be built from this repo:
https://github.com/bytedance/qemu/tree/vduse-test.

Thanks,
Yongji



