On Wed, May 04, 2022 at 01:45:11AM +0200, Michal Hocko wrote:
> On Tue 03-05-22 16:15:46, Suren Baghdasaryan wrote:
> > On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> [...]
> > > rcu_read_lock();
> > > vma = vma_lookup();
> > > if (down_read_trylock(&vma->sem)) {
> > > 	rcu_read_unlock();
> > > } else {
> > > 	rcu_read_unlock();
> > > 	mmap_read_lock(mm);
> > > 	vma = vma_lookup();
> > > 	down_read(&vma->sem);
> > > }
> > >
> > > ... and we then execute the page table allocation under the protection of
> > > the vma->sem.
> > >
> > > At least, that's what I think we agreed to yesterday.
> >
> > Honestly, I don't remember discussing vma->sem at all.
>
> This is the rangelocking approach that is effectively per-VMA. So that
> should help with the most simplistic case where the mmap contention is
> not on the same VMAs which should be the most common case (e.g. faulting
> from several threads while there is mmap happening in the background).
>
> There are cases where this could be too coarse of course and RCU would
> be a long term plan. The above seems easy enough and still probably good
> enough for most cases so a good first step.

It also fixes the low-pri monitoring daemon problem as page faults will
not be blocked by a writer (unless the read_trylock fails).

I see three potential outcomes here from the vma rwsem approach:

 - No particular improvement on any workloads.  Result: we try something
   else.
 - Minor gains (5-10%).  We benchmark it and discover there's still
   significant contention on the vma_sem.  Result: we take those wins
   and keep going towards a full RCU solution.
 - Major gains (20-50%).  Result: We're done, break out the champagne.