Jason Gunthorpe wrote: > On Mon, Oct 17, 2022 at 01:06:25PM -0700, Dan Williams wrote: > > Alistair Popple wrote: > > > > > > Dan Williams <dan.j.williams@xxxxxxxxx> writes: > > > > > > [...] > > > > > > > +/** > > > > + * pgmap_request_folios - activate an contiguous span of folios in @pgmap > > > > + * @pgmap: host page map for the folio array > > > > + * @folio: start of the folio list, all subsequent folios have same folio_size() > > > > + * > > > > + * Caller is responsible for @pgmap remaining live for the duration of > > > > + * this call. Caller is also responsible for not racing requests for the > > > > + * same folios. > > > > + */ > > > > +bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio, > > > > + int nr_folios) > > > > > > All callers of pgmap_request_folios() I could see call this with > > > nr_folios == 1 and pgmap == folio_pgmap(folio). Could we remove the > > > redundant arguments and simplify it to > > > pgmap_request_folios(struct folio *folio)? > > > > The rationale for the @pgmap argument being separate is to make clear > > that the caller must assert that pgmap is alive before passing it to > > pgmap_request_folios(), and that @pgmap is constant for the span of > > folios requested. > > The api is kind of weird to take in a folio - it should take in the > offset in bytes from the start of the pgmap and "create" usable > non-free folios. Plumbing that is non-trivial. Recall that in the DAX case the filesystem can be sitting on top of a stack of device-mapper-dax devices routing to more than one pgmap, and the pgmap can be made up of discontiguous ranges. The filesystem knows the offset into the dax-device and then asks that device to translate that to a pfn. From pfn the optional (see CONFIG_FS_DAX_LIMITED) pgmap can be retrieved. Once DAX knows what pfn it wants, while holding a lock that prevents the pgmap from entering its dying state, it can request to pin that pfn. I arrived at the interface being folio based because the order of the pin(s) also needs to be conveyed and the order should be consistent. > A good design is that nothing has a struct folio * unless it can also > prove it has a non-zero refcount on it, and that is simply impossible > for any caller at this point. I agree that's a good design, and I think it is a bug to make this request on anything but a zero-refcount folio, but plumbing pgmap offsets to replace pfn_to_page() would require dax_direct_access() to shift from a pfn lookup to a pgmap + offset lookup. CONFIG_FS_DAX_LIMITED is something that needs to be deleted, but for now it supports DAX without a pgmap, where "limited" is just meant for XIP (eXecute-In-Place), no GUP support. Once that goes then dax_direct_access() could evolve into an interface that assumes a pgmap is present. Until then I think pgmap_request_folios() at least puts out the immediate fire of making people contend with oddly refcounted DAX pages.