Re: [PATCH v7 00/11] Speedup mremap on ppc64

Nicholas Piggin <npiggin@xxxxxxxxx> · Wed, 16 Jun 2021 11:44:39 +1000

Excerpts from Linus Torvalds's message of June 9, 2021 3:10 am:
> On Mon, Jun 7, 2021 at 3:10 AM Nick Piggin <npiggin@xxxxxxxxx> wrote:
>>
>> I'd really rather not do this; I'm not sure the microbenchmark captures everything.
> 
> I don't much care what powerpc code does _internally_ for this
> architecture-specific mis-design issue, but I really don't want to see
> more complex generic interfaces unless you have better hard numbers
> for them.
> 
> So far the numbers are: "no observable difference".
> 
> It would have to be not just observable, but actually meaningful for
> me to go "ok, we'll add this crazy flag that nobody else cares about".

Fair enough, I'll have to try to get more numbers then, I suppose.

> 
> And honestly, from everything I've seen on page table walker caches:
> they are great, but once you start remapping big ranges and
> invalidating megabytes of TLB's, the walker caches just aren't going
> to be your issue.

Remapping big ranges is going to have to invalidate intermediate caches
(aka PWC), and so is unmapping. So we're stuck with the big hammer PWC
invalidate there anyway.

It's mprotect and friends that would care here, possibly some THP thing...
but I guess those are probably down the list a little way.

I'm a bit less concerned about the PWCs that might be caching the regions
of the big mprotect() we just did, and more concerned about the effect
of flushing all unrelated caches, including on all other CPUs a threaded
program is running on. HANA and Java are threaded and do mremaps,
unfortunately.
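
For anyone wanting to collect numbers for this case, here is a minimal
userspace sketch, not taken from the patch series. It times a single forced
mremap() move of a populated anonymous mapping. The helper name
time_mremap_move_ns() is hypothetical, the mappings are not guaranteed to be
PMD/PUD aligned, and it only measures the calling thread. It does not capture
the cost that a full PWC flush imposes on other CPUs, which is the part I'm
worried about.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch series): move a populated
 * anonymous mapping of `len` bytes to a new address with mremap() and
 * return the elapsed time in nanoseconds, or -1 on failure.
 * `len` is assumed to be a multiple of the page size.
 */
int64_t time_mremap_move_ns(size_t len)
{
    struct timespec t0, t1;
    int64_t ns;

    void *src = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED)
        return -1;

    /* Reserve a destination range so the kernel has to move the mapping. */
    void *dst = mmap(NULL, len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED) {
        munmap(src, len);
        return -1;
    }

    /* Touch every page so page tables exist before the move. */
    memset(src, 0x5a, len);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    void *moved = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (moved == MAP_FAILED) {
        munmap(src, len);
        munmap(dst, len);
        return -1;
    }

    ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
       + (int64_t)(t1.tv_nsec - t0.tv_nsec);

    /* Sanity check: the data must have moved with the mapping. */
    if (*(volatile unsigned char *)moved != 0x5a)
        ns = -1;

    munmap(moved, len);
    return ns;
}
```

A meaningful measurement would call this in a loop while other threads of
the same process walk unrelated memory, and compare their throughput with
and without the remapping thread running.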

> 
> But: numbers talk.  I'd take the sane generic interfaces as a first
> cut. If somebody then has really compelling numbers, we can _then_
> look at that "optimize for odd page table walker cache situation"
> case.

Yep okay. It's not the end of the world (or if it is, we'd presumably be
able to get numbers).

> And in the meantime, maybe you can talk to the hardware people and
> tell them that you want the "flush range" capability to work right,
> and that if the walker cache is _so_ important they shouldn't
> have made it an all-or-nothing flush.

I have, more than once :(

Fixing that would fix the munmap etc. cases as well, so yeah.

Thanks,
Nick