On Wed, Apr 07, 2021 at 04:47:34PM +0200, Peter Zijlstra wrote: > On Tue, Apr 06, 2021 at 06:44:34PM -0700, Michel Lespinasse wrote: > > The counter's write side is hooked into the existing mmap locking API: > > mmap_write_lock() increments the counter to the next (odd) value, and > > mmap_write_unlock() increments it again to the next (even) value. > > > > The counter's speculative read side is supposed to be used as follows: > > > > seq = mmap_seq_read_start(mm); > > if (seq & 1) > > goto fail; > > .... speculative handling here .... > > if (!mmap_seq_read_check(mm, seq) > > goto fail; > > > > This API guarantees that, if none of the "fail" tests abort > > speculative execution, the speculative code section did not run > > concurrently with any mmap writer. > > So this is obviously safe, but it's also super excessive. Any change, > anywhere, will invalidate and abort a SPF. > > Since you make a complete copy of the vma, you could memcmp it in its > entirety instead of this. Yeah, there is a deliberate choice here to start with the simplest possible approach, but this could lead to more SPF aborts than strictly necessary. It's not clear to me that just comparing original vs current vma attributes would always be sufficient - I think in some cases, we may also need to worry about attributes being changed back and forth concurrently with the fault. However, the comparison you suggest would be safe at least in the case where the original pte was pte_none. At some point, I did try implementing the optimization you suggest - if the global counter has changed, re-fetch the current vma and compare it against the old - but this was basically no help for the (Android) workloads I looked at. There were just too few aborts that were caused by the global counter in the first place. Note that my patchset only handles anon and page cache faults speculatively, so generally there wasn't very much time for the counter to change. But yes, this may not hold across all workloads, and maybe we'd want to do something smarter once/if a problem workload is identified.