On Wed, Oct 9, 2019 at 6:10 PM Thomas Hellström (VMware) <thomas_os@xxxxxxxxxxxx> wrote: > > Your original patch does exactly the same! Oh, no. You misread my original patch. Look again. The logic in my original patch was very different. It said that - *if* we have a pmd_entry function, then we obviously call that one. And if - after calling the pmd_entry function - we are still a hugepage, then we skip the pte_entry case entirely. And part of skipping is obviously "don't split" - but it never had that "don't split and then call the pte walker" case. - and if we *don't* have a pmd_entry function, but we do have a pte_entry function, then we always split before calling it. Notice the difference? So instead of looking at the return value of the pmd_entry() function, the approach of that suggested patch was to basically say that if the pmd_entry function wants us to go another level deeper and it was a hugepmd, it needed to split the pmd to make that happen. That's actually very different from what your patch did. My original patch never tried to walk the pte level without having one - either it *checked* that it had one, or it split. But I see where you might have misread the patch, particularly if you only looked at it as a patch, not as the end result of the patch. The end result of that patch was this: if (ops->pmd_entry) { err = ops->pmd_entry(pmd, addr, next, walk); if (err) break; /* No pte level walking? */ if (!ops->pte_entry) continue; /* No pte level at all? */ if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) continue; } else { if (!ops->pte_entry) continue; split_huge_pmd(walk->vma, pmd, addr); if (pmd_trans_unstable(pmd)) goto again; } err = walk_pte_range(pmd, addr, next, walk); and look at thew two different sides of the if-statement: if they get to "walk_pte_range()", both cases wil have verified that there actually _is_ a pte level. They will just have done it differently. - the "we didn't have a pmd function" will have split the pmd if it was a hugepmd, while the "we do have a pmd_entry" case will just check whether it's still a huge-pmd, and done a "continue" if it was and never even tried to walk the ptes. But I think the "change pmd_entry to have a sane return code" is a simpler and more flexible model, and then the pmd_entry code can just let the pte walker split the pmd if needed. So I liked that part of your patch. Linus