Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag

Uladzislau Rezki <urezki@xxxxxxxxx> · Mon, 17 Aug 2020 00:56:55 +0200

On Fri, Aug 14, 2020 at 11:52:06PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 14, 2020 at 01:41:40PM -0700, Paul E. McKenney wrote:
> > > And that enforces the GFP_NOLOCK allocation mode or some other solution
> > > unless you make a new rule that calling call_rcu() is forbidden while
> > > holding zone lock or any other lock which might be nested inside the
> > > GFP_NOWAIT zone::lock held region.
> > 
> > Again, you are correct.  Maybe the forecasted weekend heat will cause
> > my brain to hallucinate a better solution, but in the meantime, the
> > GFP_NOLOCK approach looks good from this end.
> 
> So I hate __GFP_NO_LOCKS for a whole number of reasons:
> 
>  - it should be called __GFP_LOCKLESS if anything
>  - it sprinkles a bunch of ugly branches around the allocator fast path
>  - it only works for order==0
> 
I had a look at your proposal, which is below. The underlying logic stays
almost the same as what this RFC proposes. I mean I do not see any
difference in your approach: it does exactly the same thing, providing
lock-less access to the per-cpu lists. I am not talking about implementation
details and further improvements, such as also searching over
zonelist -> ZONE_NORMAL.

Also, please note that the patch was tagged as RFC.

>
> Combined I really don't think this should be a GFP flag. How about a
> special purpose allocation function, something like so..
> 
I agree with you. Also, I think Michal does not like the GFP flag either,
since it introduces more complexity to the page allocator. So providing
lock-less access as a separate function is a better approach, IMHO.
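
To make clear what I mean by "lock-less access to the per-cpu lists", here
is a user-space toy model. It is only a sketch, and it is neither the RFC
code nor your proposal: struct pcp_cache, struct lockless_stats,
pcp_cache_put(), pcp_try_get() and PCP_CAPACITY are all hypothetical names
made up for illustration. The point is that the lock-less path only pops
from the per-CPU cache and reports failure when it is empty, instead of
falling back to a refill that would need the zone lock.

```c
#include <assert.h>
#include <stddef.h>

#define PCP_CAPACITY 8	/* hypothetical size of the per-CPU cache */

/* Toy stand-in for one CPU's list of cached free pages. */
struct pcp_cache {
	void *pages[PCP_CAPACITY];
	size_t count;
};

/* Counters for successful vs. failed lock-less attempts. */
struct lockless_stats {
	unsigned long hits;
	unsigned long misses;
};

/* Put a page back into the cache; returns 0 on success, -1 if full. */
static int pcp_cache_put(struct pcp_cache *pcp, void *page)
{
	if (pcp->count == PCP_CAPACITY)
		return -1;

	pcp->pages[pcp->count++] = page;
	return 0;
}

/*
 * Lock-less attempt: pop from the cache only. There is deliberately no
 * fallback to a locked refill path, so an empty cache means failure.
 */
static void *pcp_try_get(struct pcp_cache *pcp, struct lockless_stats *st)
{
	assert(pcp->count <= PCP_CAPACITY);

	if (pcp->count == 0) {
		st->misses++;
		return NULL;
	}

	st->hits++;
	return pcp->pages[--pcp->count];
}
```

A caller that gets NULL back has to cope with it, for example by falling
back to a slower path that does not need a page at that moment. In this
model the hits/misses counters are the success vs. failure numbers.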

Michal asked for some data on how many pages we need and how "lockless
allocation" behaves in terms of successful vs. failed allocations.

Please see some results below. The test case is a tight loop of 1 000 000
iterations, each doing kmalloc() followed by kfree_rcu():

sudo ./test_vmalloc.sh run_test_mask=2048 single_cpu_test=1

<snip>
 for (i = 0; i < 1000000; i++) {
  p = kmalloc(sizeof(*p), GFP_KERNEL);
  if (!p)
   return -1;

  p->array[0] = 'a';
  kvfree_rcu(p, rcu);
 }
<snip>

wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_0.png
wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_8.png

Also, I would like to underline that the kfree_rcu() reclaim logic can be
improved further by making the drain logic more time-efficient, which would
reduce the footprint and, as a result, the number of required pages.

--
Vlad Rezki