Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

Yongji Xie <xieyongji@xxxxxxxxxxxxx> · Tue, 10 Aug 2021 15:43:56 +0800

On Tue, Aug 10, 2021 at 11:02 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
>
> On 2021/8/9 1:56 PM, Yongji Xie wrote:
> > On Thu, Aug 5, 2021 at 9:31 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>
> >> On 2021/8/5 8:34 PM, Yongji Xie wrote:
> >>>> My main point, though, is that if you've already got something else
> >>>> keeping track of the actual addresses, then the way you're using an
> >>>> iova_domain appears to be something you could do with a trivial bitmap
> >>>> allocator. That's why I don't buy the efficiency argument. The main
> >>>> design points of the IOVA allocator are to manage large address spaces
> >>>> while trying to maximise spatial locality to minimise the underlying
> >>>> pagetable usage, and allocating with a flexible limit to support
> >>>> multiple devices with different addressing capabilities in the same
> >>>> address space. If none of those aspects are relevant to the use-case -
> >>>> which AFAICS appears to be true here - then as a general-purpose
> >>>> resource allocator it's rubbish and has an unreasonably massive memory
> >>>> overhead and there are many, many better choices.
> >>>>
> >>> OK, I get your point. Actually we used the genpool allocator in the
> >>> early version. Maybe we can fall back to using it.
> >>
> >> I think maybe you can share some perf numbers to see how much
> >> alloc_iova_fast() can help.
> >>
> > I did some fio tests[1] with a RAM-backed VDUSE block device[2].
> >
> > Here are the performance numbers (IOPS):
> >
> >                    numjobs=1   numjobs=2   numjobs=4   numjobs=8
> > alloc_iova_fast    145k        265k        514k        758k
> > alloc_iova         137k        170k        128k        113k
> > gen_pool_alloc     143k        270k        458k        521k
> >
> > alloc_iova_fast() has the best performance since we always hit the
> > per-cpu cache. Leaving the per-cpu cache aside, the genpool allocator
> > should be better than the iova allocator.
>
>
> I think these numbers make a convincing case for alloc_iova_fast() over
> gen_pool_alloc() (45% improvement at numjobs=8).
>

Yes, so alloc_iova_fast() still seems to be the best choice based on
performance considerations.
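
To illustrate why hitting the per-cpu cache matters so much in the numbers
above, here is a minimal userspace sketch. It is not the kernel's iova code:
slow_pool, local_cache, cached_alloc() and cached_free() are hypothetical
names, the linear scan merely stands in for a lock-protected allocator, and
the sizes are arbitrary assumptions. It only shows the general idea of a small
local cache of recently freed entries sitting in front of a slower shared
allocator.

```c
/* Userspace sketch only: hypothetical names, not the kernel's iova code. */
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 64   /* size of the backing pool (assumption) */
#define MAG_SIZE   8    /* entries the local cache can hold (assumption) */

struct slow_pool {
	unsigned char used[POOL_SLOTS]; /* 1 if the slot is handed out */
	unsigned long slow_calls;       /* how often the slow path ran */
};

struct local_cache {
	int slots[MAG_SIZE];            /* recently freed slots, reused LIFO */
	size_t count;
};

/* Slow path: linear scan, standing in for a lock-protected allocator. */
static int slow_alloc(struct slow_pool *p)
{
	p->slow_calls++;
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!p->used[i]) {
			p->used[i] = 1;
			return i;
		}
	}
	return -1; /* pool exhausted */
}

static void slow_free(struct slow_pool *p, int slot)
{
	assert(slot >= 0 && slot < POOL_SLOTS && p->used[slot]);
	p->slow_calls++;
	p->used[slot] = 0;
}

/* Fast path first: pop from the local cache, else fall back to the pool. */
static int cached_alloc(struct local_cache *c, struct slow_pool *p)
{
	if (c->count > 0)
		return c->slots[--c->count];
	return slow_alloc(p);
}

/* Park the slot in the local cache; spill to the pool only when it is full. */
static void cached_free(struct local_cache *c, struct slow_pool *p, int slot)
{
	if (c->count < MAG_SIZE)
		c->slots[c->count++] = slot;
	else
		slow_free(p, slot);
}
```

In a steady alloc/free loop like the fio workload, every allocation after the
first is served from the local cache, so the slow path (and whatever lock it
would take) is not touched again.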

Hi Robin, any comments?

Thanks,
Yongji