On Wed, Mar 20, 2019 at 8:09 PM Oliver <oohall@xxxxxxxxx> wrote:
>
> On Thu, Mar 21, 2019 at 7:57 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >
> > On Wed, Mar 20, 2019 at 8:34 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > >
> > > On Wed, Mar 20, 2019 at 1:09 AM Aneesh Kumar K.V
> > > <aneesh.kumar@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> writes:
> > > >
> > > > > Dan Williams <dan.j.williams@xxxxxxxxx> writes:
> > > > >
> > > > >>
> > > > >>> Now what will be page size used for mapping vmemmap?
> > > > >>
> > > > >> That's up to the architecture's vmemmap_populate() implementation.
> > > > >>
> > > > >>> Architectures
> > > > >>> possibly will use PMD_SIZE mapping if supported for vmemmap. Now a
> > > > >>> device-dax with struct page in the device will have pfn reserve area aligned
> > > > >>> to PAGE_SIZE with the above example? We can't map that using
> > > > >>> PMD_SIZE page size?
> > > > >>
> > > > >> IIUC, that's a different alignment. Currently that's handled by
> > > > >> padding the reservation area up to a section (128MB on x86) boundary,
> > > > >> but I'm working on patches to allow sub-section sized ranges to be
> > > > >> mapped.
> > > > >
> > > > > I am missing something w.r.t code. The below code align that using nd_pfn->align
> > > > >
> > > > >         if (nd_pfn->mode == PFN_MODE_PMEM) {
> > > > >                 unsigned long memmap_size;
> > > > >
> > > > >                 /*
> > > > >                  * vmemmap_populate_hugepages() allocates the memmap array in
> > > > >                  * HPAGE_SIZE chunks.
> > > > >                  */
> > > > >                 memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
> > > > >                 offset = ALIGN(start + SZ_8K + memmap_size + dax_label_reserve,
> > > > >                                 nd_pfn->align) - start;
> > > > >         }
> > > > >
> > > > > IIUC that is finding the offset where to put vmemmap start. And that has
> > > > > to be aligned to the page size with which we may end up mapping vmemmap
> > > > > area right?
> > >
> > > Right, that's the physical offset of where the vmemmap ends, and the
> > > memory to be mapped begins.
> > >
> > > > > Yes we find the npfns by aligning up using PAGES_PER_SECTION. But that
> > > > > is to compute how many pfns we should map for this pfn dev right?
> > > > >
> > > >
> > > > Also I guess those 4K assumptions there are wrong?
> > >
> > > Yes, I think to support non-4K-PAGE_SIZE systems the 'pfn' metadata
> > > needs to be revved and the PAGE_SIZE needs to be recorded in the
> > > info-block.
> >
> > How often does a system change page-size? Is it fixed or do
> > environments change it from one boot to the next? I'm thinking through
> > the behavior of what to do when the recorded PAGE_SIZE in the info-block
> > does not match the current system page size. The simplest option is to
> > just fail the device and require it to be reconfigured. Is that
> > acceptable?
>
> The kernel page size is set at build time and as far as I know every
> distro configures their ppc64(le) kernel for 64K. I've used 4K kernels
> a few times in the past to debug PAGE_SIZE dependent problems, but I'd
> be surprised if anyone is using 4K in production.

Ah, ok.

> Anyway, my view is that using 4K here isn't really a problem since
> it's just the accounting unit of the pfn superblock format. The kernel
> reading from it should understand that and scale it to whatever
> accounting unit it wants to use internally. Currently we don't so that
> should probably be fixed, but that doesn't seem to cause any real
> issues. As far as I can tell the only user of npfns is in
> __nvdimm_setup_pfn(), which prints the "number of pfns truncated"
> message.
>
> Am I missing something?

No, I don't think so. The only time it would break is if a system with
64K page size laid down an info-block with not enough reserved
capacity when the page-size is 4K (npfns too small). However, that
sounds like an exceptional case, which is why no problems have been
reported to date.
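
To put rough numbers on that exceptional case, here is a minimal
user-space sketch. It is not the kernel code: the helper names
(align_up(), memmap_bytes(), reserve_is_sufficient()) are hypothetical,
the 2M alignment is an assumed stand-in for HPAGE_SIZE, and it assumes
the info-block writer sized the reservation from its own PAGE_SIZE
instead of 4K. The 64 bytes per pfn mirrors the "64 * npfns" in the
snippet quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K   UINT64_C(0x1000)
#define SZ_64K  UINT64_C(0x10000)
#define SZ_2M   UINT64_C(0x200000)     /* assumed stand-in for HPAGE_SIZE */
#define STRUCT_PAGE_BYTES UINT64_C(64) /* the "64 * npfns" in the quoted code */

/* Round x up to a power-of-two boundary, in the spirit of the kernel's ALIGN(). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Bytes of memmap needed to describe 'capacity' bytes at a given page size. */
static uint64_t memmap_bytes(uint64_t capacity, uint64_t page_size)
{
	uint64_t npfns = capacity / page_size;

	return align_up(STRUCT_PAGE_BYTES * npfns, SZ_2M);
}

/*
 * Hypothetical check: does a reservation sized by a kernel with
 * 'writer_page_size' cover a kernel booted with 'reader_page_size'?
 */
static int reserve_is_sufficient(uint64_t capacity,
				 uint64_t writer_page_size,
				 uint64_t reader_page_size)
{
	return memmap_bytes(capacity, writer_page_size) >=
	       memmap_bytes(capacity, reader_page_size);
}
```

Under those assumptions a 16G namespace needs 256M of memmap at 4K
pages but only 16M at 64K pages, so a reservation sized on a 64K
kernel would fall short on a 4K kernel, while the reverse direction
merely over-reserves.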