On 5/4/22 18:23, Johannes Weiner wrote:
> On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote:
>> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
>>>> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
>>>>> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
>>>>>> Hello!
>>>>>>
>>>>>> Just following up from off-list discussions yesterday.
>>>>>>
>>>>>> The requirements to allocate on an RCU-protected speculative fastpath
>>>>>> seem to be as follows:
>>>>>>
>>>>>> 1. Never sleep.
>>>>>> 2. Never reclaim.
>>>>>> 3. Leave emergency pools alone.
>>>>>>
>>>>>> Any others?
>>>>>>
>>>>>> If those rules suffice, and if my understanding of the GFP flags is
>>>>>> correct (ha!!!), then the following GFP flags should cover this:
>>>>>>
>>>>>> 	__GFP_NOMEMALLOC | __GFP_NOWARN
>>>>>
>>>>> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
>>>>
>>>> Ah, good point on GFP_NOWAIT, thank you!
>>>
>>> Johannes (I think it was?) made the point to me that if we have another
>>> task very slowly freeing memory, a task in this path can take advantage
>>> of that other task's hard work and never go into reclaim. So the
>>> approach we should take is:
>
> Right, GFP_NOWAIT can starve out other allocations. It can clear out
> the freelists without the burden of having to do reclaim like
> everybody else wanting memory during a shortage. Including GFP_KERNEL.

FTR, I wonder if this is really true, given the suggested fallback. With
GFP_NOWAIT, you can either see memory (in all applicable zones) as

a) above low_watermark, just go ahead and allocate, as GFP_KERNEL would
b) between min and low watermark, wake up kswapd and allocate, as
   GFP_KERNEL would
c) below min watermark, the most interesting. GFP_KERNEL falls back to
   reclaim. If the GFP_NOWAIT path's fallback also includes reclaim, as
   suggested in this thread, how is it really different from GFP_KERNEL?
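
The three cases can be sketched as a toy userspace model. This is only an
illustration of the simplified picture above, not kernel code: struct
toy_zone, toy_alloc() and every other toy_* name are made up here, and the
real allocator looks at far more than one zone and two watermarks.

```c
#include <assert.h>

/* Hypothetical names throughout; nothing here is a kernel API. */
enum toy_gfp { TOY_GFP_NOWAIT, TOY_GFP_KERNEL };

enum toy_result {
	TOY_ALLOC_OK,            /* a) allocated straight away */
	TOY_ALLOC_OK_KSWAPD,     /* b) allocated, kswapd woken up */
	TOY_ALLOC_AFTER_RECLAIM, /* c) caller went through reclaim first */
	TOY_ALLOC_FAIL,          /* c) gave up without reclaiming */
};

struct toy_zone {
	unsigned long free_pages;
	unsigned long wmark_min;
	unsigned long wmark_low;
};

/* One allocation attempt, following cases a), b) and c). */
static enum toy_result toy_alloc(const struct toy_zone *z, enum toy_gfp gfp)
{
	if (z->free_pages > z->wmark_low)
		return TOY_ALLOC_OK;
	if (z->free_pages > z->wmark_min)
		return TOY_ALLOC_OK_KSWAPD;
	/* Below min: only the GFP_KERNEL-like caller reclaims. */
	return gfp == TOY_GFP_KERNEL ? TOY_ALLOC_AFTER_RECLAIM : TOY_ALLOC_FAIL;
}

/* NOWAIT-like attempt whose failure path immediately reclaims and retries. */
static enum toy_result toy_fastpath_with_fallback(const struct toy_zone *z)
{
	enum toy_result r = toy_alloc(z, TOY_GFP_NOWAIT);

	if (r == TOY_ALLOC_FAIL)
		r = TOY_ALLOC_AFTER_RECLAIM; /* stands in for do_reclaim() + retry */
	return r;
}

/* In this model the fastpath plus fallback matches GFP_KERNEL in all cases. */
void toy_self_check(void)
{
	const struct toy_zone zones[] = {
		{ .free_pages = 100, .wmark_min = 10, .wmark_low = 20 }, /* a) */
		{ .free_pages = 15,  .wmark_min = 10, .wmark_low = 20 }, /* b) */
		{ .free_pages = 5,   .wmark_min = 10, .wmark_low = 20 }, /* c) */
	};

	for (unsigned int i = 0; i < sizeof(zones) / sizeof(zones[0]); i++)
		assert(toy_fastpath_with_fallback(&zones[i]) ==
		       toy_alloc(&zones[i], TOY_GFP_KERNEL));
}
```

Within this model the task ends up doing the reclaim work in case c)
either way, which is the point of the question; whether that carries over
to the real allocator is exactly what is being asked.
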

So am I missing something, or is a GFP_NOWAIT fastpath with an immediate
fallback that includes reclaim (and not just a retry loop) fundamentally
not different from GFP_KERNEL, regardless of how often we attempt it?

> In smaller doses and/or for privileged purposes (e.g. single-argument
> kfree_rcu ;)), those allocations are fine. But because the context is
> page tables specifically, it would mean that userspace could trigger a
> large number of those and DOS other applications and the kernel.
>
>>> 	p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>> 	pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>> 	pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>>
>>> 	if (failure) {
>>> 		rcu_read_unlock();
>>> 		do_reclaim();
>>> 		return FAULT_FLAG_RETRY;
>>> 	}
>>>
>>> ... but all this is now moot since the approach we agreed to yesterday
>>> is:
>>
>> I think the discussion was about the above approach and Johannes
>> suggested to fallback to the normal pagefault handling with mmap_lock
>> locked if PMD does not exist. Please correct me if I misunderstood
>> here.
>
> Yeah. Either way works, as long as the task is held accountable.
>