Re: [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning

On Mon, Jan 10, 2022 at 08:17:51PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 10, 2022 at 05:27:25PM -0500, Daniel Jordan wrote:
> 
> > > > Pinning itself, the only thing being optimized, improves 8.5x in that
> > > > experiment, bringing the time from 1.8 seconds to .2 seconds.  That's a
> > > > significant savings IMHO
> > > 
> > > And here is where I suspect we'd get similar results from folio's
> > > based on the unpin performance uplift we already saw.
> > > 
> > > As long as PUP doesn't have to COW its work is largely proportional to
> > > the number of struct pages it processes, so we should be expecting an
> > > upper limit of 512x gains on the PUP alone with foliation.
> > >
> > > This is in line with what we saw with the prior unpin work.
> > 
> > "in line with what we saw"  Not following.  The unpin work had two
> > optimizations, I think, 4.5x and 3.5x which together give 16x.  Why is
> > that in line with the potential gains from pup?
> 
> It is the same basic issue, doing extra work, dirtying extra memory..

Ok, gotcha.

> I don't know of other users that use such huge memory sizes this would
> matter, besides a VMM..

Right, all the VMMs out there that use vfio.

> > My assumption going into this series was that multithreading VFIO page
> > pinning in the kernel was a viable way forward given the positive
> > feedback I got from the VFIO maintainer last time I posted this, which
> > was admittedly a while ago, and I've since been focused on the other
> > parts of this series rather than what's been happening in the mm lately.
> > Anyway, your arguments are reasonable, so I'll go take a look at some of
> > these optimizations and see where I get.
> 
> Well, it is not *unreasonable* it just doesn't seem compelling to me
> yet.
> 
> Especially since we are not anywhere close to the limit of single
> threaded performance. Aside from GUP, the whole way we transfer the
> physical pages into the iommu is just begging for optimizations
> eg Matthew's struct phyr needs to be an input and output at the iommu
> layer to make this code really happy.

/nods/  There are other ways forward.  As I say, I'll take a look.



