Dan Williams <dan.j.williams@xxxxxxxxx> writes: > On Mon, May 13, 2019 at 9:46 PM Aneesh Kumar K.V > <aneesh.kumar@xxxxxxxxxxxxx> wrote: >> >> On 5/14/19 9:42 AM, Dan Williams wrote: >> > On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V >> > <aneesh.kumar@xxxxxxxxxxxxx> wrote: >> >> >> >> On 5/14/19 9:28 AM, Dan Williams wrote: >> >>> On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V >> >>> <aneesh.kumar@xxxxxxxxxxxxx> wrote: >> >>>> >> >>>> The nfpn related change is needed to fix the kernel message >> >>>> >> >>>> "number of pfns truncated from 2617344 to 163584" >> >>>> >> >>>> The change makes sure the nfpns stored in the superblock is right value. >> >>>> >> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> >> >>>> --- >> >>>> drivers/nvdimm/pfn_devs.c | 6 +++--- >> >>>> drivers/nvdimm/region_devs.c | 8 ++++---- >> >>>> 2 files changed, 7 insertions(+), 7 deletions(-) >> >>>> >> >>>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c >> >>>> index 347cab166376..6751ff0296ef 100644 >> >>>> --- a/drivers/nvdimm/pfn_devs.c >> >>>> +++ b/drivers/nvdimm/pfn_devs.c >> >>>> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >> >>>> * when populating the vmemmap. This *should* be equal to >> >>>> * PMD_SIZE for most architectures. >> >>>> */ >> >>>> - offset = ALIGN(start + reserve + 64 * npfns, >> >>>> - max(nd_pfn->align, PMD_SIZE)) - start; >> >>>> + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, >> >>>> + max(nd_pfn->align, PMD_SIZE)) - start; >> >>> >> >>> No, I think we need to record the page-size into the superblock format >> >>> otherwise this breaks in debug builds where the struct-page size is >> >>> extended. >> >>> >> >>>> } else if (nd_pfn->mode == PFN_MODE_RAM) >> >>>> offset = ALIGN(start + reserve, nd_pfn->align) - start; >> >>>> else >> >>>> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) >> >>>> return -ENXIO; >> >>>> } >> >>>> >> >>>> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; >> >>>> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; >> >>> >> >>> Similar comment, if the page size is variable then the superblock >> >>> needs to explicitly account for it. >> >>> >> >> >> >> PAGE_SIZE is not really variable. What we can run into is the issue you >> >> mentioned above. The size of struct page can change which means the >> >> reserved space for keeping vmemmap in device may not be sufficient for >> >> certain kernel builds. >> >> >> >> I was planning to add another patch that fails namespace init if we >> >> don't have enough space to keep the struct page. >> >> >> >> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? >> > >> > So that the kernel has a chance to identify cases where the superblock >> > it is handling was created on a system with different PAGE_SIZE >> > assumptions. >> > >> >> The reason to do that is we don't have enough space to keep struct page >> backing the total number of pfns? If so, what i suggested above should >> handle that. >> >> or are you finding any other reason why we should fail a namespace init >> with a different PAGE_SIZE value? > > I want the kernel to be able to start understand cross-architecture > and cross-configuration geometries. Which to me means incrementing the > info-block version and recording PAGE_SIZE and sizeof(struct page) in > the info-block directly. > >> My another patch handle the details w.r.t devdax alignment for which >> devdax got created with PAGE_SIZE 4K but we are now trying to load that >> in a kernel with PAGE_SIZE 64k. > > Sure, but what about the reverse? These info-block format assumptions > are as fundamental as the byte-order of the info-block, it needs to be > cross-arch compatible and the x86 assumptions need to be fully lifted. Something like the below (Not tested). I am not sure what we will init the page_size for minor version < 3. This will mark the namespace disabled if the PAGE_SIZE and sizeof(struct page) doesn't match with the values used during namespace create. diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index dde9853453d3..d6e0933d0dd4 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -36,6 +36,9 @@ struct nd_pfn_sb { __le32 end_trunc; /* minor-version-2 record the base alignment of the mapping */ __le32 align; + /* minor-version-3 record the page size and struct page size */ + __le32 page_size; + __le32 page_struct_size; u8 padding[4000]; __le64 checksum; }; diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 6f9f78858018..bbc1d792d7f3 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -477,6 +477,15 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) if (__le16_to_cpu(pfn_sb->version_minor) < 2) pfn_sb->align = 0; + if (__le16_to_cpu(pfn_sb->version_minor) < 3) { + /* + * For a large part we use PAGE_SIZE. But we + * do have some accounting code using SIZE_4K. + */ + pfn_sb->page_size = cpu_to_le32(PAGE_SIZE); + pfn_sb->page_struct_size = cpu_to_le32(64); + } + switch (le32_to_cpu(pfn_sb->mode)) { case PFN_MODE_RAM: case PFN_MODE_PMEM: @@ -504,6 +513,12 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) return -EOPNOTSUPP; } + if (le32_to_cpu(pfn_sb->page_size) != PAGE_SIZE) + return -EOPNOTSUPP; + + if (le32_to_cpu(pfn_sb->page_struct_size) != sizeof(struct page)) + return -EOPNOTSUPP; + if (!nd_pfn->uuid) { /* * When probing a namepace via nd_pfn_probe() the uuid @@ -798,7 +813,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) memcpy(pfn_sb->uuid, nd_pfn->uuid, 16); memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16); pfn_sb->version_major = cpu_to_le16(1); - pfn_sb->version_minor = cpu_to_le16(2); + pfn_sb->version_minor = cpu_to_le16(3); pfn_sb->start_pad = cpu_to_le32(start_pad); pfn_sb->end_trunc = cpu_to_le32(end_trunc); pfn_sb->align = cpu_to_le32(nd_pfn->align);