On Tue, May 18, 2021 at 10:28 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: > > On 5/5/21 11:36 PM, Joao Martins wrote: > > On 5/5/21 11:20 PM, Dan Williams wrote: > >> On Wed, May 5, 2021 at 12:50 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: > >>> On 5/5/21 7:44 PM, Dan Williams wrote: > >>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote: > >>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h > >>>>> index b46f63dcaed3..bb28d82dda5e 100644 > >>>>> --- a/include/linux/memremap.h > >>>>> +++ b/include/linux/memremap.h > >>>>> @@ -114,6 +114,7 @@ struct dev_pagemap { > >>>>> struct completion done; > >>>>> enum memory_type type; > >>>>> unsigned int flags; > >>>>> + unsigned long align; > >>>> > >>>> I think this wants some kernel-doc above to indicate that non-zero > >>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE > >>>> means "use non-compound base pages". > > [...] > > >>>> The non-zero value must be > >>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE. > >>>> Hmm, maybe it should be an > >>>> enum: > >>>> > >>>> enum devmap_geometry { > >>>> DEVMAP_PTE, > >>>> DEVMAP_PMD, > >>>> DEVMAP_PUD, > >>>> } > >>>> > >>> I suppose a converter between devmap_geometry and page_size would be needed too? And maybe > >>> the whole dax/nvdimm align values change meanwhile (as a followup improvement)? > >> > >> I think it is ok for dax/nvdimm to continue to maintain their align > >> value because it should be ok to have 4MB align if the device really > >> wanted. However, when it goes to map that alignment with > >> memremap_pages() it can pick a mode. For example, it's already the > >> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so > >> they're already separate concepts that can stay separate. > >> > > Gotcha. > > I am reconsidering part of the above. In general, yes, the meaning of devmap @align > represents a slightly different variation of the device @align i.e. how the metadata is > laid out **but** regardless of what kind of page table entries we use vmemmap. > > By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax already > validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE and PUD_SIZE) > 2) the geometry of metadata is very much tied to the value we pick to @align at namespace > provisioning -- not the "align" we might use at mmap() perhaps that's what you referred > above? -- and 3) the value of geometry actually derives from dax device @align because we > will need to create compound pages representing a page size of @align value. > > Using your example above: you're saying that dax->align == 1G is mapped with DEVMAP_PTEs, > in reality the vmemmap is populated with PMDs/PUDs page tables (depending on what archs > decide to do at vmemmap_populate()) and uses base pages as its metadata regardless of what > device @align. In reality what we want to convey in @geometry is not page table sizes, but > just the page size used for the vmemmap of the dax device. Good point, the names "PTE, PMD, PUD" imply the hardware mapping size, not the software compound page size. > Additionally, limiting its > value might not be desirable... if tomorrow Linux for some arch supports dax/nvdimm > devices with 4M align or 64K align, the value of @geometry will have to reflect the 4M to > create compound pages of order 10 for the said vmemmap. > > I am going to wait until you finish reviewing the remaining four patches of this series, > but maybe this is a simple misnomer (s/align/geometry/) with a comment but without > DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and enum a > setter/getter to audit its value? Thoughts? I do see what you mean about the confusion DEVMAP_{PTE,PMD,PUD} introduces, but I still think the device-dax align and the organization of the 'struct page' metadata are distinct concepts. So I'm happy with any color of the bikeshed as long as the 2 concepts are distinct. How about calling it "compound_page_order"? Open to other ideas...