On Tue, Aug 10, 2021 at 11:02 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
>
> On 2021/8/9 1:56 PM, Yongji Xie wrote:
> > On Thu, Aug 5, 2021 at 9:31 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>
> >> On 2021/8/5 8:34 PM, Yongji Xie wrote:
> >>>> My main point, though, is that if you've already got something else
> >>>> keeping track of the actual addresses, then the way you're using an
> >>>> iova_domain appears to be something you could do with a trivial bitmap
> >>>> allocator. That's why I don't buy the efficiency argument. The main
> >>>> design points of the IOVA allocator are to manage large address spaces
> >>>> while trying to maximise spatial locality to minimise the underlying
> >>>> pagetable usage, and allocating with a flexible limit to support
> >>>> multiple devices with different addressing capabilities in the same
> >>>> address space. If none of those aspects are relevant to the use-case -
> >>>> which AFAICS appears to be true here - then as a general-purpose
> >>>> resource allocator it's rubbish and has an unreasonably massive memory
> >>>> overhead and there are many, many better choices.
> >>>>
> >>> OK, I get your point. Actually we used the genpool allocator in the
> >>> early version. Maybe we can fall back to using it.
> >>
> >> I think maybe you can share some perf numbers to see how much
> >> alloc_iova_fast() can help.
> >>
> > I did some fio tests[1] with a RAM-backed vduse block device[2].
> >
> > Following are some performance data:
> >
> >                   numjobs=1   numjobs=2   numjobs=4   numjobs=8
> > alloc_iova_fast   145k iops   265k iops   514k iops   758k iops
> > alloc_iova        137k iops   170k iops   128k iops   113k iops
> > gen_pool_alloc    143k iops   270k iops   458k iops   521k iops
> >
> > alloc_iova_fast() has the best performance since we always hit the
> > per-cpu cache. Setting the per-cpu cache aside, the genpool allocator
> > should be better than the iova allocator.
>
> I think we see convincing numbers for using alloc_iova_fast() over
> gen_pool_alloc() (a 45% improvement at numjobs=8).
>

Yes, so alloc_iova_fast() still seems to be the best choice based on
performance considerations.

Hi Robin, any comments?

Thanks,
Yongji
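For reference, the "trivial bitmap allocator" that Robin contrasts with iova_domain could look roughly like the following. This is a minimal, single-threaded userspace sketch, assuming a fixed 256-page window with 4 KiB granularity and first-fit search; every name and constant in it (IOVA_BASE, bitmap_iova_alloc(), bitmap_iova_free() and so on) is hypothetical and is not taken from VDUSE, genpool or the kernel IOVA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: a 1 MiB IOVA window starting at 0x10000000,
 * managed in 4 KiB pages. Not the values used by VDUSE or the kernel. */
#define IOVA_BASE     0x10000000UL
#define IOVA_PAGE     4096UL
#define IOVA_PAGES    256UL
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define BITMAP_WORDS  ((IOVA_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct bitmap_iova {
    unsigned long used[BITMAP_WORDS]; /* bit set => page is allocated */
};

static bool page_used(const struct bitmap_iova *b, size_t i)
{
    return (b->used[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

/* Mark pages [start, start + n) as allocated (on) or free (!on). */
static void set_pages(struct bitmap_iova *b, size_t start, size_t n, bool on)
{
    for (size_t i = start; i < start + n; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_WORD);
        if (on)
            b->used[i / BITS_PER_WORD] |= mask;
        else
            b->used[i / BITS_PER_WORD] &= ~mask;
    }
}

static void bitmap_iova_init(struct bitmap_iova *b)
{
    memset(b->used, 0, sizeof(b->used));
}

/* First-fit linear scan for npages contiguous free pages.
 * Returns the IOVA of the first page, or 0 on failure. */
static uintptr_t bitmap_iova_alloc(struct bitmap_iova *b, size_t npages)
{
    size_t run = 0;

    if (npages == 0 || npages > IOVA_PAGES)
        return 0;
    for (size_t i = 0; i < IOVA_PAGES; i++) {
        run = page_used(b, i) ? 0 : run + 1;
        if (run == npages) {
            size_t start = i + 1 - npages;
            set_pages(b, start, npages, true);
            return IOVA_BASE + start * IOVA_PAGE;
        }
    }
    return 0;
}

/* Release npages pages starting at iova; the caller tracks the size,
 * as the "something else keeping track of the addresses" would. */
static void bitmap_iova_free(struct bitmap_iova *b, uintptr_t iova, size_t npages)
{
    set_pages(b, (iova - IOVA_BASE) / IOVA_PAGE, npages, false);
}
```

Note that this sketch has neither locking nor a per-CPU cache, and each allocation is a linear scan, so it only illustrates the memory-overhead side of the argument. Per the numbers above, the per-CPU caching behind alloc_iova_fast() is what lets it keep scaling as numjobs grows.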