On Tue, Jan 05, 2021 at 09:22:51PM +0000, Nadav Amit wrote:
> It is also about performance due to unwarranted TLB flushes.

If there is a problem, switching to the wait_flush_pending() model
suggested by Peter may not even require changes to the common code in
memory.c, since I'm thinking it may not need to take a failure path if
we plug it in at the same place as the TLB flush. So instead of the
flush we could always block there until we read zero in the atomic,
then smp_rmb() and we're ready to start the copy (sketch [1] at the
end of this mail).

So we either flush with an IPI if we didn't read zero, or we block
until we read zero; the difference is hidden from do_wp_page. All
do_wp_page cares about is that by the time the abstract call returns,
there's no stale TLB left for such a pte. Whether that is achieved by
blocking and waiting or by flushing the TLB doesn't matter too much.

As for how badly the IPI will perform: with the improved arm64 TLB
flushing code in production, we keep track of how many simultaneous mm
contexts there are, specifically to skip the SMP-unscalable TLBI
broadcast on arm64, just like we already avoid IPIs for lazy TLBs on
x86 (see tlb_is_not_lazy in native_flush_tlb_others, sketch [2]
below). In other words the IPI will materialize only if there's more
than one thread running while clear_refs runs. Lazy TLBs won't get
IPIs on either x86 upstream or arm64 enterprise. This won't help
multithreaded processes that compute from all CPUs at all times, but
even multiple vcpu threads aren't guaranteed to be running at all
times.

My main concern would be an IPI flood that slows down clear_refs and
UFFDIO_WRITEPROTECT themselves, but an incremental optimization (not
required for correctness) is to have UFFDIO_WRITEPROTECT and
clear_refs switch to lazy TLB mode before they call
inc_tlb_flush_pending() and unlazy only after dec_tlb_flush_pending()
(sketch [3] below). So it's possible to at least guarantee the IPIs
won't slow them down.

> In addition, as I stated before, having some clean interfaces that tell
> whether a TLB flush is needed or not would be helpful and simpler to follow.
> For instance, we can have is_pte_prot_demotion(oldprot, newprot) to figure
> out whether a TLB flush is needed in change_pte_range() and avoid
> unnecessary flushes when unprotecting pages with either mprotect() or
> userfaultfd.

When you mentioned this earlier I was thinking what happens then with
flush_tlb_fix_spurious_fault(). The fact it's safe doesn't guarantee
it's a performance win if there's a stream of spurious faults as a
result. So it'd need to be checked, especially in the case of
mprotect, where the flush can be deferred and coalesced into a single
IPI at the end, so there's not much to gain from skipping it anyway
(sketch [4] below). If you can guarantee there won't be a flood of
spurious wrprotect faults, then it'll be a nice optimization.

Thanks,
Andrea
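
[1] A minimal sketch of the wait model, assuming mm->tlb_flush_pending
is the atomic_t maintained by inc_tlb_flush_pending() and
dec_tlb_flush_pending(); the helper name is hypothetical and the spin
is only illustrative (a real version would presumably have to sleep
rather than burn the CPU):

	/*
	 * Sketch only: block until all deferred TLB flushes pending
	 * on this mm have completed, instead of issuing our own
	 * flush from the do_wp_page() path.
	 */
	static void wait_flush_pending(struct mm_struct *mm)
	{
		while (atomic_read(&mm->tlb_flush_pending))
			cpu_relax();
		/*
		 * Only after observing the count drop to zero is it
		 * safe to start the copy; the smp_rmb() keeps the
		 * copy in wp_page_copy() from being reordered before
		 * the read of the pending count.
		 */
		smp_rmb();
	}

do_wp_page would call this in the same place it would otherwise
flush, and then proceed with the copy unconditionally.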
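
[2] For reference, the x86 lazy-TLB filtering I'm referring to looks
roughly like this in arch/x86/mm/tlb.c (simplified; the exact form
varies between kernel versions):

	/*
	 * A CPU in lazy TLB mode will flush or switch away from the
	 * stale mm before it returns to userspace, so it can be
	 * skipped entirely when sending the flush IPIs.
	 */
	static bool tlb_is_not_lazy(int cpu, void *data)
	{
		return !per_cpu(cpu_tlbstate.is_lazy, cpu);
	}

native_flush_tlb_others() passes that as the condition callback when
building the mask of CPUs to IPI, which is why lazy CPUs never see
the interrupt. The arm64 enterprise code achieves the equivalent by
counting how many CPUs are running the mm context before deciding
between local invalidation and the TLBI broadcast.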
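
[3] The incremental optimization in pseudocode; how exactly the
caller enters and leaves lazy TLB mode from process context is
hand-waved here (those two helpers are hypothetical), the point is
only the ordering around the pending counter:

	static void wrprotect_with_deferred_flush(struct mm_struct *mm)
	{
		enter_lazy_tlb_mode();		/* hypothetical */
		inc_tlb_flush_pending(mm);
		/*
		 * Wrprotect the ptes and send the flush IPIs: the
		 * IPIs skip lazy CPUs, including the one we run on.
		 */
		dec_tlb_flush_pending(mm);
		leave_lazy_tlb_mode();		/* hypothetical */
	}

That way clear_refs and UFFDIO_WRITEPROTECT can never themselves be a
target of the IPI flood they generate.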
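
[4] My reading of the interface you're proposing, as a hypothetical
x86-flavored sketch (the _PAGE_RW bit is only illustrative for the
wrprotect case this thread is about; a real version would have to be
arch-agnostic):

	/*
	 * Hypothetical helper: a flush is only mandatory when the
	 * new protection takes write permission away. A pure
	 * permission promotion (unprotect) could skip the flush and
	 * let flush_tlb_fix_spurious_fault() mop up any stale
	 * read-only TLB entry that still faults on write.
	 */
	static inline bool is_pte_prot_demotion(pgprot_t oldprot,
						pgprot_t newprot)
	{
		return !!(pgprot_val(oldprot) & ~pgprot_val(newprot) &
			  _PAGE_RW);
	}

change_pte_range() could then skip the deferred flush whenever
is_pte_prot_demotion() returns false, and those skipped flushes are
exactly where the stream of spurious wrprotect faults mentioned above
would come from.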