On Tue, May 21, 2019 at 2:51 AM Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> wrote: > > Dan Williams <dan.j.williams@xxxxxxxxx> writes: > > > On Mon, May 13, 2019 at 9:46 PM Aneesh Kumar K.V > > <aneesh.kumar@xxxxxxxxxxxxx> wrote: > >> > >> On 5/14/19 9:42 AM, Dan Williams wrote: > >> > On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V > >> > <aneesh.kumar@xxxxxxxxxxxxx> wrote: > >> >> > >> >> On 5/14/19 9:28 AM, Dan Williams wrote: > >> >>> On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V > >> >>> <aneesh.kumar@xxxxxxxxxxxxx> wrote: > >> >>>> > >> >>>> The nfpn related change is needed to fix the kernel message > >> >>>> > >> >>>> "number of pfns truncated from 2617344 to 163584" > >> >>>> > >> >>>> The change makes sure the nfpns stored in the superblock is right value. > >> >>>> > >> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> > >> >>>> --- > >> >>>> drivers/nvdimm/pfn_devs.c | 6 +++--- > >> >>>> drivers/nvdimm/region_devs.c | 8 ++++---- > >> >>>> 2 files changed, 7 insertions(+), 7 deletions(-) > >> >>>> > >> >>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > >> >>>> index 347cab166376..6751ff0296ef 100644 > >> >>>> --- a/drivers/nvdimm/pfn_devs.c > >> >>>> +++ b/drivers/nvdimm/pfn_devs.c > >> >>>> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> >>>> * when populating the vmemmap. This *should* be equal to > >> >>>> * PMD_SIZE for most architectures. > >> >>>> */ > >> >>>> - offset = ALIGN(start + reserve + 64 * npfns, > >> >>>> - max(nd_pfn->align, PMD_SIZE)) - start; > >> >>>> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, > >> >>>> + max(nd_pfn->align, PMD_SIZE)) - start; > >> >>> > >> >>> No, I think we need to record the page-size into the superblock format > >> >>> otherwise this breaks in debug builds where the struct-page size is > >> >>> extended. > >> >>> > >> >>>> } else if (nd_pfn->mode == PFN_MODE_RAM) > >> >>>> offset = ALIGN(start + reserve, nd_pfn->align) - start; > >> >>>> else > >> >>>> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> >>>> return -ENXIO; > >> >>>> } > >> >>>> > >> >>>> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > >> >>>> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; > >> >>> > >> >>> Similar comment, if the page size is variable then the superblock > >> >>> needs to explicitly account for it. > >> >>> > >> >> > >> >> PAGE_SIZE is not really variable. What we can run into is the issue you > >> >> mentioned above. The size of struct page can change which means the > >> >> reserved space for keeping vmemmap in device may not be sufficient for > >> >> certain kernel builds. > >> >> > >> >> I was planning to add another patch that fails namespace init if we > >> >> don't have enough space to keep the struct page. > >> >> > >> >> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? > >> > > >> > So that the kernel has a chance to identify cases where the superblock > >> > it is handling was created on a system with different PAGE_SIZE > >> > assumptions. > >> > > >> > >> The reason to do that is we don't have enough space to keep struct page > >> backing the total number of pfns? If so, what i suggested above should > >> handle that. > >> > >> or are you finding any other reason why we should fail a namespace init > >> with a different PAGE_SIZE value? > > > > I want the kernel to be able to start understand cross-architecture > > and cross-configuration geometries. Which to me means incrementing the > > info-block version and recording PAGE_SIZE and sizeof(struct page) in > > the info-block directly. > > > >> My another patch handle the details w.r.t devdax alignment for which > >> devdax got created with PAGE_SIZE 4K but we are now trying to load that > >> in a kernel with PAGE_SIZE 64k. > > > > Sure, but what about the reverse? These info-block format assumptions > > are as fundamental as the byte-order of the info-block, it needs to be > > cross-arch compatible and the x86 assumptions need to be fully lifted. > > Something like the below (Not tested). I am not sure what we will init the page_size > for minor version < 3. This will mark the namespace disabled if the > PAGE_SIZE and sizeof(struct page) doesn't match with the values used > during namespace create. Yes, this is on the right track. I would special-case page_size == 0 as 4096 and page_struct_size == 0 as 64. If either of those is non-zero then the info-block version needs to be revved and it needs to be crafted to make older kernels fail to parse it. There was an earlier attempt to implement minimum info-block versions here: https://lore.kernel.org/lkml/155000670159.348031.17631616775326330606.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ ...but that was dropped in favor of the the "sub-section" patches.