On Wed, Oct 30, 2024 at 05:17:25PM -0700, John Hubbard wrote:
> On 10/30/24 5:02 PM, Jason Gunthorpe wrote:
> > On Wed, Oct 30, 2024 at 11:34:49AM -0700, John Hubbard wrote:
> >
> > > From a very high level design perspective, it's not yet clear to me
> > > that there is either a "preferred" or "not recommended" aspect to
> > > pinning in batches vs. all at once here, as long as one stays
> > > below the type (int, long, unsigned...) limits of the API. Batching
> > > seems like what you do if the internal implementation is crippled
> > > and unable to meet its API requirements. So the fact that many
> > > callers do batching is sort of "tail wags dog".
> >
> > No.. all things need to do batching because nothing should be storing
> > a linear struct page array that is so enormous. That is going to
> > create vmemmap pressure that is not desirable.
>
> Are we talking about the same allocation size here? It's not 2GB. It
> is enough folio pointers to cover 2GB of memory, so 4MB.

Is 2GB a hard limit? I was expecting this to be a range with upper
bounds in the 100GBs, like for RDMA.. Then it is 400MB, and yeah, that
is not great.

> That high level guidance makes sense, but here we are attempting only
> a 4MB physically contiguous allocation, and if larger than that, then
> it goes to vmalloc(), which is merely virtually contiguous.

AFAIK any contiguous allocation beyond 4K basically doesn't work
reliably in a server environment due to fragmentation, so you are
always using the vmemmap..

> I'm writing this because your adjectives make me suspect that you
> are referring to a 2GB allocation. But this is orders of magnitude
> smaller.

Even at 4MB I would wonder about getting it split into PAGE_SIZE chunks
instead of the vmemmap, but I don't know what it is being used for.

Jason
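
For concreteness, here is the metadata arithmetic behind the 4MB and
400MB figures above, as a minimal sketch (the helper name is
illustrative, not from any patch in this thread). One pointer per 4K
page is a 1/512 overhead, so pinning 2GB needs 4MB of pointers, and a
200GB range in the "100GBs" ballpark needs 400MB:

#include <linux/mm.h>

/*
 * One struct page/folio pointer per base page. With 4K pages and
 * 8-byte pointers this is pinned_bytes / 512:
 *   2GB   ->   4MB
 *   200GB -> 400MB
 */
static inline size_t pin_array_bytes(size_t pinned_bytes)
{
	return (pinned_bytes >> PAGE_SHIFT) * sizeof(struct page *);
}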
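
The batching pattern Jason is arguing for keeps the pages[] array at a
fixed, small size no matter how large the pinned range grows. A hedged
sketch, assuming a caller that can consume each window before moving
on; process_pages() is a hypothetical stand-in, and a real long-term
user such as RDMA would hand each batch off (e.g. into an sg-table)
instead of unpinning it right away:

#include <linux/mm.h>
#include <linux/slab.h>

#define PIN_BATCH 512	/* 512 pages -> a 4KB pointer array on 64-bit */

static int pin_range_batched(unsigned long start, unsigned long nr_pages)
{
	struct page **pages;
	int pinned, ret = 0;

	/* Fixed-size array, independent of the total range being pinned */
	pages = kmalloc_array(PIN_BATCH, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	while (nr_pages) {
		unsigned long chunk = min_t(unsigned long, nr_pages, PIN_BATCH);

		pinned = pin_user_pages_fast(start, chunk, FOLL_WRITE, pages);
		if (pinned <= 0) {
			ret = pinned ? pinned : -EFAULT;
			break;
		}

		ret = process_pages(pages, pinned);	/* hypothetical */
		unpin_user_pages(pages, pinned);
		if (ret)
			break;

		start += (unsigned long)pinned << PAGE_SHIFT;
		nr_pages -= pinned;
	}

	kfree(pages);
	return ret;
}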
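
John's description of the allocation strategy matches the kernel's
kvmalloc() family: attempt a physically contiguous kmalloc first, and
fall back to vmalloc() (virtually contiguous only) when that fails.
The ~4MB ceiling he mentions corresponds to the buddy allocator's
maximum order on a typical 4K-page build. A minimal sketch of that
pattern, with illustrative helper names:

#include <linux/mm.h>
#include <linux/slab.h>

static struct folio **alloc_folio_array(unsigned long nr_folios)
{
	/*
	 * Tries kmalloc (physically contiguous) first, falls back to
	 * vmalloc; ~4MB of pointers for 2GB worth of 4K pages.
	 */
	return kvmalloc_array(nr_folios, sizeof(struct folio *), GFP_KERNEL);
}

static void free_folio_array(struct folio **folios)
{
	kvfree(folios);	/* handles both kmalloc and vmalloc cases */
}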