On Mon, Aug 21, 2023 at 8:09 AM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>
> On Fri, Aug 18, 2023 at 2:21 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
> >
> > On Thu, Aug 17, 2023 at 11:29 AM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Aug 17, 2023 at 10:47 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Aug 16, 2023 at 2:48 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
> > > > >
> > > > > > We have a out of tree driver that maps huge pages through a file handle and
> > > > > > relies on -> huge_fault. It used to work in 5.19 kernels but 6.1 changed this
> > > > > > behaviour.
> > > > > >
> > > > > > I don’t think reverting the earlier behaviour of fault_path for huge pages should
> > > > > > impact kernel negatively.
> > > > > >
> > > > > > Do you think we can restore this earlier behaviour of kernel to allow page fault
> > > > > > for huge pages via ->huge_fault.
> > > > >
> > > > > That seems reasonable to me. I think using the existence of a
> > > > > ->huge_fault() handler as a predicate to return "true" makes sense to
> > > > > me. The "normal" flow for file-backed memory along fault path still
> > > > > needs to return "false", so that we correctly fallback to ->fault()
> > > > > handler. Unless there are objections, I can do that in a v2.
> > > >
> > > > Sorry for chiming in late. I'm just back from vacation and trying to catch up...
> > > >
> > > > IIUC the out-of-tree driver tries to allocate huge page and install
> > > > PMD mapping via huge_fault() handler, but the cleanup of
> > > > hugepage_vma_check() prevents this due to the check to
> > > > VM_NO_KHUGEPAGED?
> > > >
> > > > So you would like to check whether a huge_fault() handler existed
> > > > instead of vma_is_dax()?
> > >
> > > Sorry for the multiple threads here. There are two problems: (a) the
> > > VM_NO_KHUGEPAGED check along fault path, and (b) we don't give
> > > ->huge_fault() a fair shake, if it exists, along fault path. The
> > > current code assumes vma_is_dax() iff ->huge_fault() exists.
> > >
> > > (a) is easy enough to fix. For (b), I'm currently looking at the
> > > possibility of not worrying about ->huge_fault() in
> > > hugepage_vma_check(), and just letting create_huge_pud() /
> > > create_huge_pmd() check and fallback as necessary. I think we'll need
> > > the explicit DAX check still, since we want to keep khugepaged and
> > > MADV_COLLAPSE away, and the presence / absence of ->huge_fault() isn't
> > > enough to know that (well.. today it kind of is, but we shouldn't
> > > depend on it).
> >
> > You meant something like:
> >
> > if (vma->vm_ops->huge_fault) {
> >         if (vma_is_dax(vma))
> >                 return in_pf;
> >
> >         /* Fall through */
> > }
>
> I don't think this will work for Saurabh's case, since IIUC, they
> aren't using dax, but are using VM_HUGEPAGE|VM_MIXEDMAP, faulted in
> using ->huge_fault()
>
> The old (v5.19) fault path looked like:
>
> static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
>                                          unsigned long vm_flags)
> {
>         /* Explicitly disabled through madvise. */
>         if ((vm_flags & VM_NOHUGEPAGE) ||
>             test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>                 return false;
>         return true;
> }
>
> /*
>  * to be used on vmas which are known to support THP.
>  * Use transparent_hugepage_active otherwise
>  */
> static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> {
>
>         /*
>          * If the hardware/firmware marked hugepage support disabled.
>          */
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
>                 return false;
>
>         if (!transhuge_vma_enabled(vma, vma->vm_flags))
>                 return false;
>
>         if (vma_is_temporary_stack(vma))
>                 return false;
>
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>                 return true;
>
>         if (vma_is_dax(vma))
>                 return true;
>
>         if (transparent_hugepage_flags &
>                                 (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
>                 return !!(vma->vm_flags & VM_HUGEPAGE);
>
>         return false;
> }
>
> For non-anonymous, the next check (in create_huge_*) would be for that
> ->huge_fault handler, falling back as necessary if it didn't exist.

Yeah, you are right. I just replied to your v2 patch.

>
> The patch I sent out last week[1] somewhat restores this logic -- the
> only difference being we do the check for ->huge_fault in
> hugepage_vma_check() as well. This is so smaps can surface this
> possibility with some accuracy. I just realized it will erroneously
> return "true" for the collapse path, however..
>
> Maybe Matthew was right about unifying everything here :P That's 2
> mistakes I've made in trying to fix this issue (but maybe that's just
> me).

IMHO, no rush on fixing it.

>
> [1] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@xxxxxxxxxx/
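
For anyone following the thread, here is a rough userspace sketch of the v5.19 ordering quoted above: the enablement check runs first, and only then does the create_huge_* step look for a ->huge_fault handler and fall back if there is none. This is not kernel code. Every model_* and MODEL_* name is hypothetical, the three-way mode enum is a simplification of the separate transparent_hugepage_flags bits, and VM_MIXEDMAP and anonymous memory are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_HUGEPAGE / VM_NOHUGEPAGE; values are arbitrary. */
#define MODEL_VM_HUGEPAGE   (1u << 0)
#define MODEL_VM_NOHUGEPAGE (1u << 1)

/* Simplified stand-in for the "always" / "madvise" / neither sysfs bits. */
enum model_thp_mode { MODEL_THP_NEVER, MODEL_THP_MADVISE, MODEL_THP_ALWAYS };

enum model_fault_result { MODEL_FALLBACK, MODEL_HUGE_HANDLED };

/* Hypothetical, heavily reduced view of a file-backed VMA. */
struct model_vma {
	unsigned int vm_flags;
	bool is_dax;
	bool mm_thp_disabled;	/* models MMF_DISABLE_THP on the mm */
	bool temporary_stack;
	enum model_fault_result (*huge_fault)(void);	/* models vm_ops->huge_fault */
};

/* Mirrors the order of checks in the quoted __transparent_hugepage_enabled(). */
bool model_thp_enabled(const struct model_vma *vma, enum model_thp_mode mode,
		       bool never_dax)
{
	if (never_dax)
		return false;
	if ((vma->vm_flags & MODEL_VM_NOHUGEPAGE) || vma->mm_thp_disabled)
		return false;
	if (vma->temporary_stack)
		return false;
	if (mode == MODEL_THP_ALWAYS)
		return true;
	if (vma->is_dax)
		return true;
	if (mode == MODEL_THP_MADVISE)
		return (vma->vm_flags & MODEL_VM_HUGEPAGE) != 0;
	return false;
}

/* Non-anonymous case only: use ->huge_fault if present, else fall back. */
enum model_fault_result model_create_huge_pmd(const struct model_vma *vma)
{
	if (vma->huge_fault)
		return vma->huge_fault();
	return MODEL_FALLBACK;
}

/* Enablement check first, handler lookup second, as described in the thread. */
enum model_fault_result model_fault(const struct model_vma *vma,
				    enum model_thp_mode mode, bool never_dax)
{
	if (!model_thp_enabled(vma, mode, never_dax))
		return MODEL_FALLBACK;
	return model_create_huge_pmd(vma);
}

/* Hypothetical driver handler that always installs a huge mapping. */
enum model_fault_result model_driver_huge_fault(void)
{
	return MODEL_HUGE_HANDLED;
}

/* Non-DAX VMA with VM_HUGEPAGE set and a ->huge_fault handler. */
void model_self_check(void)
{
	struct model_vma vma = {
		.vm_flags = MODEL_VM_HUGEPAGE,
		.huge_fault = model_driver_huge_fault,
	};

	assert(model_fault(&vma, MODEL_THP_MADVISE, false) == MODEL_HUGE_HANDLED);

	/* Without a handler the same VMA falls back to the normal path. */
	vma.huge_fault = NULL;
	assert(model_fault(&vma, MODEL_THP_MADVISE, false) == MODEL_FALLBACK);
}
```

In this model a non-DAX VMA with the hugepage flag set reaches its handler under "madvise" mode, which is the case the out-of-tree driver relies on. The sketch says nothing about the collapse or smaps paths, where the thread notes the handler check should not apply.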