On 10/10/19 4:07 AM, Linus Torvalds wrote:
On Wed, Oct 9, 2019 at 6:10 PM Thomas Hellström (VMware)
<thomas_os@xxxxxxxxxxxx> wrote:
Your original patch does exactly the same!
Oh, no. You misread my original patch.
Look again.
The logic in my original patch was very different. It said that
- *if* we have a pmd_entry function, then we obviously call that one.
And if - after calling the pmd_entry function - we are still a
hugepage, then we skip the pte_entry case entirely.
And part of skipping is obviously "don't split" - but it never had
that "don't split and then call the pte walker" case.
- and if we *don't* have a pmd_entry function, but we do have a
pte_entry function, then we always split before calling it.
Notice the difference?
From what I can tell, my patch is doing the same. At least that always
was the intention. To determine whether to skip pte and skip split, your
patch uses
/* No pte level at all? */
if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)
|| pmd_devmap(*pmd))
continue;
whereas my patch does
if (pmd_trans_unstable(pmd))
goto again;
/* Fall through */
which is the same (pmd_trans_unstable() is the same test as you do, but
not racy). Yes, it's missing the test for pmd_devmap() but I think
that's an mm bug been discussed elsewhere, and we also rerun because a
huge / none pmd at this (FALLBACK) point is probably a race and unintended.
But I think the "change pmd_entry to have a sane return code" is a
simpler and more flexible model, and then the pmd_entry code can just
let the pte walker split the pmd if needed.
OK, let's aim for that then.
Thanks,
Thomas
So I liked that part of your patch.
Linus