On Wed, Mar 13, 2019 at 8:45 PM Aneesh Kumar K.V
<aneesh.kumar@xxxxxxxxxxxxx> wrote:
[..]
> >> Now w.r.t. failures, can device-dax do an opportunistic huge page
> >> usage?
> >
> > device-dax explicitly disclaims the ability to do opportunistic mappings.
> >
> >> I haven't looked at the device-dax details fully yet. Do we make the
> >> assumption of the mapping page size as a format w.r.t. device-dax? Is that
> >> derived from nd_pfn->align value?
> >
> > Correct.
> >
> >>
> >> Here is what I am working on:
> >> 1) If the platform doesn't support huge page and if the device superblock
> >> indicated that it was created with huge page support, we fail the device
> >> init.
> >
> > Ok.
> >
> >> 2) Now if we are creating a new namespace without huge page support in
> >> the platform, then we force the align details to PAGE_SIZE. In such a
> >> configuration when handling dax fault even with THP enabled during
> >> the build, we should not try to use hugepage. This I think we can
> >> achieve by using TRANSPARENT_HUGEPAGE_DAX_FLAG.
> >
> > How is this dynamic property communicated to the guest?
>
> Via device tree on powerpc. We have a device tree node indicating
> supported page sizes.

Ah, ok, yeah let's plumb that straight to the device-dax driver and
leave out the interaction / interpretation of the thp-enabled flags.

> >
> >>
> >> Also even if the user decided to not use THP, by
> >> echo "never" > transparent_hugepage/enabled , we should continue to map
> >> dax fault using huge page on platforms that can support huge pages.
> >>
> >> This still doesn't cover the details of a device-dax created with
> >> PAGE_SIZE align later booted with a kernel that can do hugepage dax. How
> >> should we handle that? That makes me think, this should be a VMA flag
> >> which got derived from device config? Maybe use VM_HUGEPAGE to indicate
> >> if device should use a hugepage mapping or not?
> >
> > device-dax configured with PAGE_SIZE always gets PAGE_SIZE mappings.
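The policy in this exchange (fail init when the superblock alignment cannot be honored, force PAGE_SIZE for new namespaces on platforms without huge pages, and fault at exactly the configured alignment regardless of the THP "enabled" setting) can be sketched roughly as below. This is not the device-dax code: struct platform_caps and the sketch_* functions are hypothetical, the caps are assumed to come from something like the powerpc device tree page-size node, only PAGE_SIZE and PMD_SIZE alignments are modeled, and defaulting to PMD_SIZE when huge pages are available is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of what the platform reports (e.g. the device
 * tree node listing supported page sizes). Not a kernel type.
 */
struct platform_caps {
	unsigned long page_size;   /* base PAGE_SIZE, e.g. 4K or 64K */
	unsigned long pmd_size;    /* PMD_SIZE huge page size */
	bool pmd_supported;        /* can the platform map PMD_SIZE pages? */
};

/*
 * Case 1: existing namespace whose superblock records `align`.
 * Returns false (fail init) if it was created with a huge page
 * alignment this platform cannot map.
 */
static bool sketch_align_ok(const struct platform_caps *caps,
			    unsigned long align)
{
	assert(caps->page_size != 0);
	if (align == caps->page_size)
		return true;   /* PAGE_SIZE namespaces work everywhere */
	return caps->pmd_supported && align == caps->pmd_size;
}

/* Case 2: new namespace; without huge page support, force PAGE_SIZE. */
static unsigned long sketch_new_namespace_align(const struct platform_caps *caps)
{
	return caps->pmd_supported ? caps->pmd_size : caps->page_size;
}

/*
 * Fault handling: the mapping size is exactly the configured alignment.
 * No opportunistic upgrade or fallback; the THP sysfs "enabled" knob is
 * deliberately not an input.
 */
static unsigned long sketch_fault_size(unsigned long align)
{
	return align;
}
```

Under this model, a namespace created with PAGE_SIZE alignment and later booted on a kernel that can do huge page dax still passes sketch_align_ok() and still faults at PAGE_SIZE, which matches the answer given in the thread.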
>
> Now what page size will be used for mapping vmemmap?

That's up to the architecture's vmemmap_populate() implementation.

> Architectures
> possibly will use PMD_SIZE mapping if supported for vmemmap. Now a
> device-dax with struct page in the device will have pfn reserve area aligned
> to PAGE_SIZE with the above example? We can't map that using
> PMD_SIZE page size?

IIUC, that's a different alignment. Currently that's handled by
padding the reservation area up to a section (128MB on x86) boundary,
but I'm working on patches to allow sub-section sized ranges to be
mapped.

Now, that said, I expect there may be bugs lurking in the
implementation if PAGE_SIZE changes from one boot to the next simply
because I've never tested that.

I think this also indicates that the section padding logic can't be
removed until all arch vmemmap_populate() implementations understand
the sub-section case.
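The section padding described above can be illustrated with a little arithmetic. This is a simplified model, not the nd_pfn reservation code: the sketch_* helpers are hypothetical, and the 64-byte struct page, the 16GB namespace, and the 2MB "sub-section" granularity used in the examples are assumptions chosen for illustration.

```c
#include <assert.h>

/* x86 memory section size quoted in the thread (128MB). */
#define SKETCH_SECTION_SIZE (128ULL << 20)

/* Round x up to a power-of-two boundary. */
static unsigned long long sketch_round_up(unsigned long long x,
					  unsigned long long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes of struct page needed to describe ns_size bytes of device
 * memory; struct_page_size is an assumed parameter (often 64).
 */
static unsigned long long sketch_memmap_bytes(unsigned long long ns_size,
					      unsigned long long page_size,
					      unsigned long long struct_page_size)
{
	return (ns_size / page_size) * struct_page_size;
}

/*
 * Size of a pfn reserve area starting at `start`, padded so the data
 * after it begins on a `granularity` boundary (a section today).
 */
static unsigned long long sketch_padded_reserve(unsigned long long start,
						unsigned long long memmap_bytes,
						unsigned long long granularity)
{
	return sketch_round_up(start + memmap_bytes, granularity) - start;
}
```

For a 16GB namespace with 64-byte struct pages, the memmap is 256MB with 4K pages but only 16MB with 64K pages, and section-granular padding rounds the latter up to 128MB. Passing a smaller granularity models how sub-section support would shrink that padding. The 16x difference in memmap size between page sizes is one illustration of why a layout computed under one PAGE_SIZE deserves testing before it is reused under another.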