Jason Gunthorpe wrote: > On Thu, Sep 22, 2022 at 02:54:42PM -0700, Dan Williams wrote: > > > > I'm thinking broadly about how to make pgmap usable to all the other > > > drivers in a safe and robust way that makes some kind of logical sense. > > > > I think the API should be pgmap_folio_get() because, at least for DAX, > > the memory is already allocated. > > I would pick a name that has some logical connection to > ops->page_free() > > This function is starting a pairing where once it completes page_free > will eventually be called. Following Dave's note that this is an 'arbitration' mechanism I think request/release is more appropriate than alloc/free for what this is doing. > > > /** > > * pgmap_get_folio() - reference a folio in a live @pgmap by @pfn > > * @pgmap: live pgmap instance, caller ensures this does not race @pgmap death > > * @pfn: page frame number covered by @pgmap > > */ > > struct folio *pgmap_get_folio(struct dev_pagemap *pgmap, unsigned long pfn) > > { > > struct page *page; > > > > VM_WARN_ONCE(pgmap != xa_load(&pgmap_array, PHYS_PFN(phys))); > > > > if (WARN_ONCE(percpu_ref_is_dying(&pgmap->ref))) > > return NULL; > > This shouldn't be a WARN? It's a bug if someone calls this after killing the pgmap. I.e. the expectation is that the caller is synchronzing this. The only reason this isn't a VM_WARN_ONCE is because the sanity check is cheap, but I do not expect it to fire on anything but a development kernel. > > > page = pfn_to_page(pfn); > > return page_folio(page); > > } > > Yeah, makes sense to me, but I would do a len as well to amortize the > cost of all these checks.. > > > This does not create compound folios, that needs to be coordinated with > > the caller and likely needs an explicit > > Does it? What situations do you think the caller needs to coordinate > the folio size? Caller should call the function for each logical unit > of storage it wants to allocate from the pgmap.. The problem for fsdax is that it needs to gather all the PTEs, hold a lock to synchronize against events that would shatter a huge page, and then build up the compound folio metadata before inserting the PMD. So I think that flow is request all pfns, lock, fixup refcounts, build up compound folio, insert huge i_pages entry, unlock and install the pmd.