On Thu, Feb 28, 2019 at 1:40 AM Oliver <oohall@xxxxxxxxx> wrote:
>
> On Thu, Feb 28, 2019 at 7:35 PM Aneesh Kumar K.V
> <aneesh.kumar@xxxxxxxxxxxxx> wrote:
> >
> > Add a flag to indicate the ability to do huge page dax mapping. On architecture
> > like ppc64, the hypervisor can disable huge page support in the guest. In
> > such a case, we should not enable huge page dax mapping. This patch adds
> > a flag which the architecture code will update to indicate huge page
> > dax mapping support.
>
> *groan*
>
> > Architectures mostly do transparent_hugepage_flag = 0; if they can't
> > do hugepages. That also takes care of disabling dax hugepage mapping
> > with this change.
> >
> > Without this patch we get the below error with kvm on ppc64.
> >
> > [ 118.849975] lpar: Failed hash pte insert with error -4
> >
> > NOTE: The patch also use
> >
> > echo never > /sys/kernel/mm/transparent_hugepage/enabled
> > to disable dax huge page mapping.
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
> > ---
> > TODO:
> > * Add Fixes: tag
> >
> >  include/linux/huge_mm.h | 4 +++-
> >  mm/huge_memory.c        | 4 ++++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 381e872bfde0..01ad5258545e 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -53,6 +53,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
> >                         pud_t *pud, pfn_t pfn, bool write);
> >  enum transparent_hugepage_flag {
> >         TRANSPARENT_HUGEPAGE_FLAG,
> > +       TRANSPARENT_HUGEPAGE_DAX_FLAG,
> >         TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> >         TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
> >         TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
> > @@ -111,7 +112,8 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> >         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> >                 return true;
> >
> > -       if (vma_is_dax(vma))
> > +       if (vma_is_dax(vma) &&
> > +           (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_DAX_FLAG)))
> >                 return true;
>
> Forcing PTE sized faults should be fine for fsdax, but it'll break
> devdax. The devdax driver requires the fault size be >= the namespace
> alignment since devdax tries to guarantee hugepage mappings will be
> used and PMD alignment is the default. We can probably have devdax
> fall back to the largest size the hypervisor has made available, but
> it does run contrary to the design. Ah well, I suppose it's better off
> being degraded rather than unusable.

Given this is an explicit setting I think device-dax should explicitly
fail to enable in the presence of this flag to preserve the application
visible behavior. I.e. if device-dax was enabled after this setting was
made then I think future faults should fail as well.
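
To make the fsdax vs. device-dax distinction concrete, here is a rough
userspace model of the policy being discussed. It is a sketch, not kernel
code: the model_* names, the enum values, and the 4K/2M sizes are all
assumptions made up for illustration (ppc64 uses different page sizes),
and the real __transparent_hugepage_enabled() has more branches than the
two shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flags touched by the patch. */
enum model_thp_flag {
	MODEL_THP_FLAG,     /* models TRANSPARENT_HUGEPAGE_FLAG */
	MODEL_THP_DAX_FLAG, /* models TRANSPARENT_HUGEPAGE_DAX_FLAG */
};

/* Assumed sizes for illustration only. */
#define MODEL_PTE_SIZE ((size_t)4096)
#define MODEL_PMD_SIZE ((size_t)2 * 1024 * 1024)

enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_FALLBACK, MODEL_FAULT_SIGBUS };

/* Only the two branches visible in the quoted hunk are modelled. */
static bool model_thp_enabled(unsigned long flags, bool vma_is_dax)
{
	if (flags & (1UL << MODEL_THP_FLAG))
		return true;
	if (vma_is_dax && (flags & (1UL << MODEL_THP_DAX_FLAG)))
		return true;
	return false;
}

/* fsdax: a huge fault may fall back to PTE-sized faults. */
static enum model_fault model_fsdax_fault(unsigned long flags, size_t fault_size)
{
	if (fault_size > MODEL_PTE_SIZE && !model_thp_enabled(flags, true))
		return MODEL_FAULT_FALLBACK;
	return MODEL_FAULT_OK;
}

/*
 * device-dax, following the suggestion above: refuse to enable when the
 * configured alignment needs huge mappings that have been disabled.
 */
static bool model_devdax_can_enable(unsigned long flags, size_t align)
{
	assert(align >= MODEL_PTE_SIZE);
	return align == MODEL_PTE_SIZE || model_thp_enabled(flags, true);
}

/* device-dax faults fail rather than silently degrade below the alignment. */
static enum model_fault model_devdax_fault(unsigned long flags, size_t align,
					   size_t fault_size)
{
	if (fault_size < align)
		return MODEL_FAULT_SIGBUS;
	if (!model_devdax_can_enable(flags, align))
		return MODEL_FAULT_SIGBUS;
	return MODEL_FAULT_OK;
}
```

In this model, clearing the dax flag makes a PMD-sized fsdax fault fall
back to PTEs, while a PMD-aligned device-dax instance either fails to
enable or, if it was already enabled, has its later faults rejected.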