Re: [PATCH] mm/gup: restore the ability to pin more than 2GB at a time

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 30 Oct 2024 21:25:52 -0300

On Wed, Oct 30, 2024 at 05:17:25PM -0700, John Hubbard wrote:
> On 10/30/24 5:02 PM, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 11:34:49AM -0700, John Hubbard wrote:
> > 
> > >  From a very high level design perspective, it's not yet clear to me
> > > that there is either a "preferred" or "not recommended" aspect to
> > > pinning in batches vs. all at once here, as long as one stays
> > > below the type (int, long, unsigned...) limits of the API. Batching
> > > seems like what you do if the internal implementation is crippled
> > > and unable to meet its API requirements. So the fact that many
> > > callers do batching is sort of "tail wags dog".
> > 
> > No.. all things need to do batching because nothing should be storing
> > a linear struct page array that is so enormous. That is going to
> > create vmemap pressure that is not desirable.
> 
> Are we talking about the same allocation size here? It's not 2GB. It
> is enough folio pointers to cover 2GB of memory, so 4MB.

Is 2GB a hard limit? I was expecting this was a range that had upper
bounds of 100GB's like for rdma.. Then it is 400MB, and yeah, that is
not great.

> That high level guidance makes sense, but here we are attempting only
> a 4MB physically contiguous allocation, and if larger than that, then
> it goes to vmalloc() which is merely virtually contiguous.

AFAIK any contiguous allocation beyond 4K basically doesn't work
reliably in a server environment due to fragmentation.

So you are always using the vmemap..

> I'm writing this because your adjectives make me suspect that you
> are referring to a 2GB allocation. But this is orders of magnitude
> smaller.

Even 4MB I would wonder about getting it split to PAGE_SIZE chunks
instead of vmemmap, but I don't know what it is being used for.

Jason