On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote:
> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> > > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > > > > Hello!
> > > > >
> > > > > Just following up from off-list discussions yesterday.
> > > > >
> > > > > The requirements to allocate on an RCU-protected speculative fastpath
> > > > > seem to be as follows:
> > > > >
> > > > > 1. Never sleep.
> > > > > 2. Never reclaim.
> > > > > 3. Leave emergency pools alone.
> > > > >
> > > > > Any others?
> > > > >
> > > > > If those rules suffice, and if my understanding of the GFP flags is
> > > > > correct (ha!!!), then the following GFP flags should cover this:
> > > > >
> > > > >     __GFP_NOMEMALLOC | __GFP_NOWARN
> > > >
> > > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> > >
> > > Ah, good point on GFP_NOWAIT, thank you!
> >
> > Johannes (I think it was?) made the point to me that if we have another
> > task very slowly freeing memory, a task in this path can take advantage
> > of that other task's hard work and never go into reclaim. So the
> > approach we should take is:

Right, GFP_NOWAIT can starve out other allocations. It can clear out
the freelists without the burden of having to do reclaim like
everybody else wanting memory during a shortage. Including GFP_KERNEL.

In smaller doses and/or for privileged purposes (e.g. single-argument
kfree_rcu ;)), those allocations are fine. But because the context is
page tables specifically, it would mean that userspace could trigger a
large number of those and DOS other applications and the kernel.

> > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> >
> > if (failure) {
> >         rcu_read_unlock();
> >         do_reclaim();
> >         return FAULT_FLAG_RETRY;
> > }
> >
> > ... but all this is now moot since the approach we agreed to yesterday
> > is:
>
> I think the discussion was about the above approach and Johannes
> suggested to fallback to the normal pagefault handling with mmap_lock
> locked if PMD does not exist. Please correct me if I misunderstood
> here.

Yeah. Either way works, as long as the task is held accountable.
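
To make "held accountable" concrete, here is a rough userspace model of
the quoted pseudocode. It is only a sketch under stated assumptions:
struct pool, alloc_nowait(), do_reclaim() and speculative_fault() are
hypothetical stand-ins, not kernel APIs, and the "pool" is just a pair
of counters. The point it illustrates is that a task whose non-blocking
allocation fails does the reclaim work itself before retrying, rather
than living off memory that other tasks freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

struct pool {
	size_t free_objs;        /* objects available without reclaim */
	size_t reclaimable;      /* objects that a reclaim pass could free */
	size_t reclaims_by_task; /* reclaim passes done by the faulting task */
};

/* Models a GFP_NOWAIT-style attempt: succeeds only from the free list. */
static bool alloc_nowait(struct pool *p)
{
	assert(p != NULL);
	if (p->free_objs == 0)
		return false;
	p->free_objs--;
	return true;
}

/* Models reclaim: the calling task pays for refilling the free list. */
static void do_reclaim(struct pool *p)
{
	p->free_objs += p->reclaimable;
	p->reclaimable = 0;
	p->reclaims_by_task++;
}

/* Three allocations stand in for the p4d, pud and pmd levels. */
static enum fault_result speculative_fault(struct pool *p)
{
	const int levels = 3;
	int got = 0;

	/* The real fast path would run this under rcu_read_lock(). */
	while (got < levels && alloc_nowait(p))
		got++;

	if (got < levels) {
		/* Model simplification: hand partial allocations back. */
		p->free_objs += (size_t)got;
		/* The read-side section would be dropped before this. */
		do_reclaim(p);
		return FAULT_RETRY;
	}
	return FAULT_DONE;
}
```

Starting from a pool with 2 free and 4 reclaimable objects, the first
call returns FAULT_RETRY after one reclaim pass charged to the caller,
and the second call completes. The alternative mentioned above (falling
back to the normal fault path under mmap_lock) would replace the
do_reclaim() call with a return to the slow path.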