Re: [patch 01/15] mm/memory.c: avoid access flag update TLB flush for retried page fault

Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> · Mon, 27 Jul 2020 15:43:11 -0700

On Mon, Jul 27, 2020 at 11:04 AM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
It looks like Linus's patch has better data. That seems sane to me, since
Catalin's patch still needs to flush the TLB in the shared domain.
Well, my patch as posted never built at all, I think.

Looking back at that patch, I used FAULT_FLAG_RETRY. But that's not
the correct name for any of the bits.

So you must have fixed it. Did you make it use "FAULT_FLAG_TRIED"?
Because that's the right bit - don't flush if this is actually the
second (or more) attempt.

Actually, I didn't have access to that test machine and didn't try to
build your patch myself; Yu Xu helped me test it. I will double-check with
him once he is back online. However, the data looks sane, since my patch
(which skips the PTE update) achieved a similar result.

But I'm a bit worried that you would have used one of the other bits
(FAULT_FLAG_ALLOW_RETRY or FAULT_FLAG_RETRY_NOWAIT), and that would be
wrong. Those get set on the first attempt to say "you _may_ retry",
but they get set on the first one.

That just shows how much I tested the patch I sent out. It was
whitespace-damaged on purpose, but I still want to check.

The "FAULT_FLAG_TRIED" bit I believe is reasonable to test. That one
literally says "I've gone through this once already, don't bother with
spurious faults".  But I don't think it triggers much in practice. We
seldom actually retry faults, it needs a page that we actually start
IO on (and dropped the mmap lock for) to happen. It wouldn't happen on
the "turn existing page dirty" case, for example.

With the commit ("mm: drop mmap_sem before calling balance_dirty_pages()
in write fault"), retried faults may happen much more frequently than
before, since the write fault drops the mmap lock whenever dirty
throttling happens.

The "FAULT_FLAG_WRITE" bit is what we test right now. I think it's
wrong. I think it is a "this happens to work" bit, and cuts down on a
lot of common cases, by simply skipping something that might be needed
but basically never is.

So I think a lot of this is dodgy. It doesn't matter on x86, and
nobody cared. Because x86 will always re-walk the page tables before
taking an architectural fault (the same way it walks them for
dirty/accessed bit updates - you could think of x86 as doing all the
things everybody else does in software, they just do in the hw walker
micro-fault logic instead).

A local TLB invalidate of a single virtual address should be basically
free. We're talking single cycles kind of free. The problem here isn't
the flush_tlb_fix_spurious_fault() itself, the problem here is that
arm64 (and pretty much everybody else who uses the default fallback)
does something horribly horribly wrong, and doesn't do the free
version.

Yes, I agree that a global TLB flush seems like overkill for some architectures.

                Linus