On 28.07.23 22:50, Linus Torvalds wrote:
> On Fri, 28 Jul 2023 at 13:33, David Hildenbrand <david@xxxxxxxxxx> wrote:
>>
>> So would you rather favor a FOLL_NUMA that has to be passed from the
>> outside by selected callers or a FOLL_NUMA that is set on the GUP path
>> unconditionally (but left clear for follow_page())?
>
> I'd rather see the FOLL_NUMA that has to be set by odd cases, and that
> is never set by any sane user.

Thanks!

> And it should not be called FOLL_NUMA. It should be called something
> else. Because *not* having it doesn't disable following pages across
> NUMA boundaries, and the name is actively misleading.
>
> It sounds like what KVM actually wants is a "Do NOT follow NUMA pages,
> I'll force a page fault".
>
> And the fact that KVM wants a fault for NUMA pages shouldn't mean that
> others - who clearly cannot care - get that insane behavior by
> default.

For KVM it represents actual CPU access. To map these pages into the VM
MMU we have to look them up from the process -- in the context of the
faulting CPU. So it makes a lot of sense for KVM. (which is also where
autonuma gets heavily used)

> The name should reflect that, instead of being the misleading mess of
> FOLL_FORCE and bad naming that it is now.
>
> So maybe it can be called "FOLL_HONOR_NUMA_FAULT" or something, to
> make it clear that it's the *opposite* of FOLL_FORCE, and that it
> honors the NUMA faulting that nobody should care about.

Naming sounds much better to me.

> Then the KVM code can have a big comment about *why* it sets that bit.

Yes.

> Hmm? Can we please aim for something that is understandable and
> documented? No odd implicit rules. No "force NUMA fault even when it
> makes no sense". No tie-in with FOLL_FORCE.

I mean, I messed all that FOLL_NUMA handling up because I was very
confused. So I'm all for better documentation.

Can we get a simple revert in first (without that FOLL_FORCE special
casing and ideally with a better name) to handle stable backports, and
I'll follow-up with more documentation and letting GUP callers pass in
that flag instead?

That would help a lot. Then we also have more time to let that "move it
to GUP callers" mature a bit in -next, to see if we find any surprises?

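To make the semantics discussed above concrete, here is a rough userspace
sketch of an opt-in flag that is independent of FOLL_FORCE. It is only a
model, not kernel code: the bit values, struct toy_pte and
toy_can_follow() are made up for illustration and do not correspond to the
kernel's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define FOLL_FORCE            0x1u
#define FOLL_HONOR_NUMA_FAULT 0x2u

/* Toy stand-in for the state of a page table entry. */
struct toy_pte {
	bool numa_hinting;	/* entry was made inaccessible for NUMA hinting */
};

/*
 * Returns true if the lookup may use the page directly, false if the
 * caller has to go through a (NUMA hinting) fault first.
 *
 * Only callers that explicitly pass FOLL_HONOR_NUMA_FAULT get the fault;
 * everybody else follows the page. FOLL_FORCE plays no role here.
 */
static bool toy_can_follow(const struct toy_pte *pte, unsigned int flags)
{
	if (!pte->numa_hinting)
		return true;
	return !(flags & FOLL_HONOR_NUMA_FAULT);
}
```

In this model a KVM-like caller would pass FOLL_HONOR_NUMA_FAULT (with a
comment explaining why) and get false for a hinting entry, while a caller
passing 0 or only FOLL_FORCE would get true.
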
--
Cheers,
David / dhildenb