On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Just following up from off-list discussions yesterday.
> > > >
> > > > The requirements to allocate on an RCU-protected speculative fastpath
> > > > seem to be as follows:
> > > >
> > > > 1. Never sleep.
> > > > 2. Never reclaim.
> > > > 3. Leave emergency pools alone.
> > > >
> > > > Any others?
> > > >
> > > > If those rules suffice, and if my understanding of the GFP flags is
> > > > correct (ha!!!), then the following GFP flags should cover this:
> > > >
> > > > __GFP_NOMEMALLOC | __GFP_NOWARN
> > >
> > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> >
> > Ah, good point on GFP_NOWAIT, thank you!
>
> Johannes (I think it was?) made the point to me that if we have another
> task very slowly freeing memory, a task in this path can take advantage
> of that other task's hard work and never go into reclaim. So the
> approach we should take is:
>
> p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>
> if (failure) {
>         rcu_read_unlock();
>         do_reclaim();
>         return FAULT_FLAG_RETRY;
> }
>
> ... but all this is now moot since the approach we agreed to yesterday
> is:

I think the discussion was about the above approach, and Johannes
suggested falling back to the normal pagefault handling with mmap_lock
held if the PMD does not exist. Please correct me if I misunderstood
here.

>
> rcu_read_lock();
> vma = vma_lookup();
> if (down_read_trylock(&vma->sem)) {
>         rcu_read_unlock();
> } else {
>         rcu_read_unlock();
>         mmap_read_lock(mm);
>         vma = vma_lookup();
>         down_read(&vma->sem);
> }
>
> ... and we then execute the page table allocation under the protection of
> the vma->sem.
>
> At least, that's what I think we agreed to yesterday.

Honestly, I don't remember discussing vma->sem at all. My understanding
is that the two approaches differed in the section covered by
rcu_read_lock/rcu_read_unlock. The solution that you suggested would
handle the pagefault completely under RCU for as long as that is
possible and would fall back to the mmap_lock when it is not, while
Michel's implementation took rcu_read_lock/rcu_read_unlock for smaller
sections and used vma->seq_number to detect any vma changes between
these sections. Your suggested approach sounds simpler, and the way I
understood the comments is that we should give it a try. Did I miss
anything?

Thanks,
Suren.