Re: [PATCH v6 07/11] mm/mremap: Use range flush that does TLB and page walk cache flush

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 25 May 2021 07:08:18 -1000

On Tue, May 25, 2021 at 3:28 AM Aneesh Kumar K.V
<aneesh.kumar@xxxxxxxxxxxxx> wrote:
>
> How about flush_tlb_and_page_table_cache() ?

Honestly, I'd prefer it to be a separate function.

So keep the existing

     flush_tlb()

as-is, and add a

     flush_tlb_walking_cache()

and document that any architecture that flushes the walker caches as
part of the regular tlb flush can just make that a no-op.

Would that work for powerpc?
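
A minimal sketch of that split, as a user-space model rather than kernel
code. Only the names flush_tlb() and flush_tlb_walking_cache() come from
the text above; the struct, its counters, the function signatures and
move_page_tables_sketch() are hypothetical and exist only to show the
no-op case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one architecture; not a kernel structure. */
struct arch_model {
	/* true if the regular TLB flush also drops the walker caches */
	bool tlb_flush_covers_walker;
	unsigned long tlb_flushes;    /* times the TLB was flushed */
	unsigned long walker_flushes; /* times the walker caches were flushed */
};

/* The existing flush, kept as-is. */
static void flush_tlb(struct arch_model *arch)
{
	assert(arch != NULL);
	arch->tlb_flushes++;
	if (arch->tlb_flush_covers_walker)
		arch->walker_flushes++;
}

/* The separate hook: a no-op where flush_tlb() already did the work. */
static void flush_tlb_walking_cache(struct arch_model *arch)
{
	assert(arch != NULL);
	if (arch->tlb_flush_covers_walker)
		return;
	arch->walker_flushes++;
}

/* Hypothetical caller that moved page tables and needs both flushed. */
static void move_page_tables_sketch(struct arch_model *arch)
{
	flush_tlb(arch);
	flush_tlb_walking_cache(arch);
}
```

In this model the caller ends up with exactly one walker cache flush on
either kind of architecture, which is the property the documentation
would have to promise.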

But:

> >  (b) is this even worth it as a public interface?
>
> But such a large range invalidate doesn't imply we are freeing page
> tables.

No. But it's what everybody else (ie x86 and ARM) does, and if you're
flushing megabytes of TLBs, what's the downside of flushing a few TLB
walker cache entries?

You already do that for internal powerpc errata anyway (ie
"mm_needs_flush_escalation()"), so I'm saying that you might as well
treat the page walker cache as a powerpc-internal implementation
thing.

Put another way: can you even _measure_ the difference between "just
make powerpc look like everybody else" and "add a new explicit page
table walker cache flush function interface"?

Now, from a quick look at the powerpc code, it looks like powerpc is a
bit mis-architected, and when you flush the walker cache, you flush
everything for that ASID. x86 and arm only flush the parts affected by
the TLB flush range (now, admittedly, that's what they do
_architecturally_ - for all I know the actual hardware just always
flushes all walker caches when you flush any TLB and so maybe they act
exactly like the powerpc RIC_FLUSH_PWC in practice).

So maybe it's measurable. But I kind of doubt it, and I'd like to know
that you've actually done some testing to see that "yes, this matters,
I can't just say 'if flushing more than a pmd, just flush the walker
cache too'".

Hmm?
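
For reference, the "if flushing more than a pmd, just flush the walker
cache too" rule could look roughly like the sketch below. The helper name
and SKETCH_PMD_SIZE are hypothetical; the 2 MiB value is an assumption
(the real PMD_SIZE depends on architecture and page size), and treating a
range of exactly one pmd as large enough is a guess at the cutoff:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration only: one pmd entry maps 2 MiB. */
#define SKETCH_PMD_SIZE (2UL << 20)

/*
 * Hypothetical helper: decide inside the arch's range flush whether the
 * walker cache should be flushed along with the TLB entries.
 */
static bool range_flush_should_flush_walker(unsigned long start,
					    unsigned long end)
{
	assert(start <= end);
	return (end - start) >= SKETCH_PMD_SIZE;
}
```

With this, a single-page flush leaves the walker cache alone, while a
pmd-sized or larger range flush escalates without any new public
interface.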

                Linus