On 6/7/21 9:47 PM, Joao Martins wrote:
> On 6/7/21 9:17 PM, Dan Williams wrote:
>> On Tue, May 18, 2021 at 10:28 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>
>>> On 5/5/21 11:36 PM, Joao Martins wrote:
>>>> On 5/5/21 11:20 PM, Dan Williams wrote:
>>>>> On Wed, May 5, 2021 at 12:50 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>>>> On 5/5/21 7:44 PM, Dan Williams wrote:
>>>>>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>>>>>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>>>>>>> index b46f63dcaed3..bb28d82dda5e 100644
>>>>>>>> --- a/include/linux/memremap.h
>>>>>>>> +++ b/include/linux/memremap.h
>>>>>>>> @@ -114,6 +114,7 @@ struct dev_pagemap {
>>>>>>>>  	struct completion done;
>>>>>>>>  	enum memory_type type;
>>>>>>>>  	unsigned int flags;
>>>>>>>> +	unsigned long align;
>>>>>>>
>>>>>>> I think this wants some kernel-doc above to indicate that non-zero
>>>>>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
>>>>>>> means "use non-compound base pages".
>>>
>>> [...]
>>>
>>>>>>> The non-zero value must be
>>>>>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
>>>>>>> Hmm, maybe it should be an
>>>>>>> enum:
>>>>>>>
>>>>>>> enum devmap_geometry {
>>>>>>>     DEVMAP_PTE,
>>>>>>>     DEVMAP_PMD,
>>>>>>>     DEVMAP_PUD,
>>>>>>> }
>>>>>>>
>>>>>> I suppose a converter between devmap_geometry and page_size would be needed too? And maybe
>>>>>> the whole dax/nvdimm align values change meanwhile (as a followup improvement)?
>>>>>
>>>>> I think it is ok for dax/nvdimm to continue to maintain their align
>>>>> value because it should be ok to have 4MB align if the device really
>>>>> wanted. However, when it goes to map that alignment with
>>>>> memremap_pages() it can pick a mode. For example, it's already the
>>>>> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
>>>>> they're already separate concepts that can stay separate.
>>>>>
>>>> Gotcha.
>>>
>>> I am reconsidering part of the above. In general, yes, the meaning of devmap @align
>>> represents a slightly different variation of the device @align i.e. how the metadata is
>>> laid out **but** regardless of what kind of page table entries we use vmemmap.
>>>
>>> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax already
>>> validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE and PUD_SIZE)
>>> 2) the geometry of metadata is very much tied to the value we pick to @align at namespace
>>> provisioning -- not the "align" we might use at mmap() perhaps that's what you referred
>>> above? -- and 3) the value of geometry actually derives from dax device @align because we
>>> will need to create compound pages representing a page size of @align value.
>>>
>>> Using your example above: you're saying that dax->align == 1G is mapped with DEVMAP_PTEs,
>>> in reality the vmemmap is populated with PMDs/PUDs page tables (depending on what archs
>>> decide to do at vmemmap_populate()) and uses base pages as its metadata regardless of what
>>> device @align. In reality what we want to convey in @geometry is not page table sizes, but
>>> just the page size used for the vmemmap of the dax device.
>>
>> Good point, the names "PTE, PMD, PUD" imply the hardware mapping size,
>> not the software compound page size.
>>
>>> Additionally, limiting its
>>> value might not be desirable... if tomorrow Linux for some arch supports dax/nvdimm
>>> devices with 4M align or 64K align, the value of @geometry will have to reflect the 4M to
>>> create compound pages of order 10 for the said vmemmap.
>>>
>>> I am going to wait until you finish reviewing the remaining four patches of this series,
>>> but maybe this is a simple misnomer (s/align/geometry/) with a comment but without
>>> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and enum a
>>> setter/getter to audit its value? Thoughts?
>>
>> I do see what you mean about the confusion DEVMAP_{PTE,PMD,PUD}
>> introduces, but I still think the device-dax align and the
>> organization of the 'struct page' metadata are distinct concepts. So
>> I'm happy with any color of the bikeshed as long as the 2 concepts are
>> distinct. How about calling it "compound_page_order"? Open to other
>> ideas...
>>
> I actually like the name of @geometry. The only thing better would be @vmemmap_geometry
> solely because it makes it clear that it's the vmemmap that we are talking about -- but
> might be unnecessarily verbose. And I still agree that is separate concept that should be
> named differently *at least*.
>
> But naming aside, I was trying to get at was to avoid a second geometry value validation
> i.e. to be validated the value and set with a value such as DEVMAP_PTE, DEVMAP_PMD and
> DEVMAP_PUD.

Sorry, my English keeps getting broken. I meant this instead:

But naming aside, what I am trying to get at is to remove the second geometry value
validation, i.e. for @geometry to not be validated a second time in order to be set to
DEVMAP_PTE, DEVMAP_PMD or DEVMAP_PUD.

> That to me sounds a little redundant, when the geometry value depends on what
> align is going to be used from. Here my mention of @align refers to what's used to create
> the dax device, not the mmap() align [which can be lower than the device one]. The dax
> device align is the one used to decide whether to use PTEs, PMDs or PUDs at dax fault handler.
>
> So separate concepts, but still its value dependent on one another. At least unless we
> want to allow geometry values different than those set by --align as Jane suggested.
>

And I should add: I can maintain the DEVMAP_* enum values, but then these will need to be
changed in tandem anytime a new @align value is supported. Or instead we use the name
@geometry, albeit still as an unsigned long type.
Or, rather than an unsigned long, perhaps make it its own type, with its value
obtained/changed through a getter/setter.
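
To make the getter/setter idea a bit more concrete, here is a rough userspace sketch. It is
not kernel code and not what the series implements. All names in it (struct
vmemmap_geometry, geometry_set(), geometry_get(), geometry_order()) are hypothetical, and
it assumes a fixed 4K base page instead of the arch PAGE_SIZE. The point is only that a
single validation in the setter (a power of two, at least one base page) can cover any
future @align value without a DEVMAP_* enum changing in tandem.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: 4K base pages (stand-in for PAGE_SHIFT). */
#define SKETCH_PAGE_SHIFT 12UL
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical wrapper type: page size used for the vmemmap, in bytes. */
struct vmemmap_geometry {
	unsigned long size;
};

/* Valid if it is a power of two and at least one base page. */
static bool geometry_valid(unsigned long size)
{
	return size >= SKETCH_PAGE_SIZE && (size & (size - 1)) == 0;
}

/* Setter: audits the value once; leaves *g untouched on failure. */
static int geometry_set(struct vmemmap_geometry *g, unsigned long size)
{
	if (!geometry_valid(size))
		return -1;
	g->size = size;
	return 0;
}

/* Getter: returns the geometry in bytes. */
static unsigned long geometry_get(const struct vmemmap_geometry *g)
{
	return g->size;
}

/* Compound page order implied by the geometry: log2(size / base page). */
static unsigned int geometry_order(const struct vmemmap_geometry *g)
{
	unsigned long pages;
	unsigned int order = 0;

	assert(geometry_valid(g->size));
	pages = g->size >> SKETCH_PAGE_SHIFT;
	while (pages > 1) {
		pages >>= 1;
		order++;
	}
	return order;
}
```

With these assumptions a 4M geometry yields order 10, matching the example earlier in the
thread, while something like 3M is rejected by the setter.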