On 7/27/20 11:04 AM, Linus Torvalds wrote:
On Mon, Jul 27, 2020 at 10:52 AM Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
It looks like a normal page is skipped too unless it is a write fault. The
comment might be a little bit misleading.
No, the comment is fine - in that it matches the code.
It's the code _and_ the comment that I find to be garbage.
A read fault should just change the young bit, and typically the TLB won't
get flushed if only the young bit is changed. The TLB flush can be deferred
to the write fault, which may change the access permission and/or dirty bit.
This is the part I disagree with.
A read fault could easily cause the exact same issue, exactly because
people do young bits in software too.
It's just harder to trigger, because the young bit is typically set
initially - in ways that the dirty bit easily isn't.
So to get to the "oh, young bit wasn't set, the TLB has the 'fault on
access' bit set, *and* we raced on two different CPUs at the same
time" condition is much *much* harder than the write bit is.
Yes, it seems so. It may just trigger "fault on access" again and again
until someone else has flushed the TLB.
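
To make the refault loop concrete, here is a rough userspace sketch. It is a toy model, not kernel code: the struct and function names (model_pte, model_tlb_entry, model_read_faults) are made up for illustration, and it assumes the fault handler never flushes the local TLB entry for a spurious read fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one PTE and one CPU's cached TLB entry. */
struct model_pte { bool young; };
struct model_tlb_entry { bool valid; bool fault_on_access; };

/* Reload the local TLB entry from the current PTE contents. */
static void model_tlb_refill(struct model_tlb_entry *tlb,
                             const struct model_pte *pte)
{
    tlb->valid = true;
    tlb->fault_on_access = !pte->young;
}

/*
 * Count the faults a read takes before it succeeds. The handler never
 * flushes the local entry itself; "someone else" invalidates it only
 * once flush_after faults have been taken. Gives up at max_faults.
 */
static int model_read_faults(struct model_pte *pte,
                             struct model_tlb_entry *tlb,
                             int flush_after, int max_faults)
{
    int faults = 0;

    assert(flush_after >= 1 && max_faults >= 0);
    for (;;) {
        if (!tlb->valid)
            model_tlb_refill(tlb, pte);
        if (!tlb->fault_on_access)
            return faults;          /* the access goes through */
        if (faults == max_faults)
            return faults;          /* still stuck, stop counting */
        faults++;
        pte->young = true;          /* software young bit; may already be set */
        if (faults >= flush_after)
            tlb->valid = false;     /* flush done by someone else */
    }
}
```

Starting from a PTE that another CPU already marked young and a stale local entry that still has "fault on access" set, the read keeps faulting until the outside flush arrives. With a fresh entry it takes no fault at all.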
It sounds better to do a local TLB flush (this may depend on the
architecture; some may need a global flush) unconditionally for a spurious
fault, except in the VM_FAULT_TRIED case.
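
As a sketch of the difference between the two policies, assuming the decision can be reduced to a function of the fault flags alone: the MODEL_FLAG_* constants and both function names below are hypothetical stand-ins, not the kernel's definitions, and the real decision may well be architecture specific.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fault flags; the values are arbitrary. */
enum {
    MODEL_FLAG_WRITE = 1 << 0,  /* the fault was a write access */
    MODEL_FLAG_TRIED = 1 << 1,  /* the fault is already a retry */
};

/* Policy under discussion: flush only for a spurious write fault. */
static bool flush_write_only(unsigned int flags)
{
    return (flags & MODEL_FLAG_WRITE) != 0;
}

/* Suggested policy: flush for any spurious fault unless it is a retry. */
static bool flush_unless_tried(unsigned int flags)
{
    return (flags & MODEL_FLAG_TRIED) == 0;
}
```

The two only disagree in two cases: a first-time spurious read fault (flags == 0), where the second policy flushes and the first does not, and a retried write fault, where the first flushes and the second does not.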
But it seems to be no different in theory.
So I think the whole "treat write/dirty specially" thing is complete
garbage. Sure, it speeds things up. But it speeds things up by being
wrong.
Linus