On Fri, Jan 28, 2022 at 05:09:41AM -0800, Michel Lespinasse wrote:
> The counter's write side is hooked into the existing mmap locking API:
> mmap_write_lock() increments the counter to the next (odd) value, and
> mmap_write_unlock() increments it again to the next (even) value.
>
> The counter's speculative read side is supposed to be used as follows:
>
> 	seq = mmap_seq_read_start(mm);
> 	if (seq & 1)
> 		goto fail;
> 	.... speculative handling here ....
> 	if (!mmap_seq_read_check(mm, seq))
> 		goto fail;
>
> This API guarantees that, if none of the "fail" tests abort
> speculative execution, the speculative code section did not run
> concurrently with any mmap writer.
>
> This is very similar to a seqlock, but both the writer and speculative
> readers are allowed to block. In the fail case, the speculative reader
> does not spin on the sequence counter; instead it should fall back to
> a different mechanism such as grabbing the mmap lock read side.
>
> Signed-off-by: Michel Lespinasse <michel@xxxxxxxxxxxxxx>
> ---
>  include/linux/mm_types.h  |  4 +++
>  include/linux/mmap_lock.h | 58 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 0ae3bf854aad..e4965a6f34f2 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -523,6 +523,10 @@ struct mm_struct {
>  		 * cacheline.
>  		 */
>  		struct rw_semaphore mmap_lock;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +		unsigned long mmap_seq;
> +#endif
> +
>

The previous version of the patches [1] maintained this sequence counter
per VMA, which is more granular. Can you please share the rationale behind
this change? I guess it is more maintainable, since the write-side changes
are not scattered around but are neatly hooked into the mmap write lock API.

I have run the ebizzy test with per-mm and per-VMA sequence counters on
x86 QEMU and aarch64 platforms.
The results indicate that we take the classic page fault route about 5%
more often with the per-mm sequence counter, but this did not show up in
the end results (the time taken to complete a fixed number of operations).
So I am asking only to understand the reasoning behind this change.

[1] https://lore.kernel.org/lkml/1523975611-15978-9-git-send-email-ldufour@xxxxxxxxxxxxxxxxxx/