Re: [PATCH -v2 -mm] add extra free kbytes tunable

David Rientjes <rientjes@xxxxxxxxxx> · Wed, 12 Oct 2011 22:22:33 -0700 (PDT)

On Thu, 13 Oct 2011, Rik van Riel wrote:

> > I suggested a patch from BFS that would raise kswapd to the same priority
> > of the task that triggered it (not completely up to rt, but the highest
> > possible in that case) and I'm waiting to hear if that helps for Satoru's
> > test case before looking at alternatives.  We could also extend the patch
> > to raise the priority of an already running kswapd if a higher priority
> > task calls into the page allocator's slowpath.
> 
> This has the distinct benefit of making kswapd most active right
> at the same time the application is most active, which returns
> us to your first objection to the extra free kbytes patch (apps
> will suffer from kswapd cpu use).
> 

Not necessarily.  It only raises the priority of kswapd to match that of 
the application that kicks it in the page allocator's slowpath, and it 
never raises it to realtime.  If the application has a nice level of 0, 
it's a no-op.  That's very different from extra_free_kbytes, which causes 
kswapd to do extra work regardless of the priority of the application that 
is allocating memory.  Raising the priority of kswapd for rt threads makes 
sense if they are going to deplete all memory; it makes no sense to allow 
an rt thread to allocate tons of memory and not even give kswapd a chance 
to compete.
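
To make the intended behavior concrete, here is a rough userspace sketch of 
the priority rule described above.  It is not the BFS patch and not kernel 
code: struct task_model, MODEL_MIN_NICE and kswapd_boosted_nice() are 
hypothetical names, and it assumes kswapd is only ever boosted, never 
demoted, with an rt waker mapped to the highest non-rt priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a task; not a kernel structure. */
struct task_model {
	int nice;	/* -20 (highest priority) .. 19 (lowest) */
	bool realtime;	/* true for an rt task */
};

/* Highest non-rt priority in this model. */
#define MODEL_MIN_NICE	(-20)

/*
 * Return the nice level kswapd would run at after being kicked by @waker
 * in the allocator slowpath.  Lower nice means higher priority.
 */
static int kswapd_boosted_nice(int kswapd_nice, const struct task_model *waker)
{
	int target;

	assert(waker != NULL);

	/* Never go all the way to rt: cap at the highest non-rt level. */
	target = waker->realtime ? MODEL_MIN_NICE : waker->nice;

	/* Only ever raise priority; a lower priority waker is a no-op. */
	return target < kswapd_nice ? target : kswapd_nice;
}
```

With kswapd at nice 0, a waker at nice 0 or above leaves it untouched, a 
waker at nice -5 brings it to -5, and an rt waker brings it to -20.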

> Furthermore, I am not sure that giving kswapd more CPU time is
> going to help, because kswapd could be stuck on some lock, held
> by a lower priority (or sleeping) context.
> 
> I agree that the BFS patch would be worth a try, and would be
> very pleasantly surprised if it worked, but I am not very
> optimistic about it...
> 

It may require a combination of Con's patch, increasing the priority of 
kswapd if a higher priority task kicks it in the page allocator, and an 
extra bonus on top of the high watermark if it was triggered by an 
rt thread -- similar to ALLOC_HARDER, but reclaiming to (high * 1.25) 
instead.
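
As a rough illustration of that bonus, the reclaim target could be computed 
along these lines.  This is only a sketch: kswapd_reclaim_target() and its 
parameters are hypothetical names, the watermark is assumed to be counted in 
pages, and the 25% is done in integer arithmetic as high + high / 4.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Return the number of free pages kswapd should reclaim up to.
 * Normally that is the high watermark; if an rt thread triggered the
 * reclaim, add a 25% bonus on top, i.e. (high * 1.25).
 */
static unsigned long kswapd_reclaim_target(unsigned long high_wmark_pages,
					   bool rt_waker)
{
	unsigned long bonus = high_wmark_pages / 4;

	if (!rt_waker)
		return high_wmark_pages;

	/* Guard against overflow for absurdly large watermarks. */
	assert(bonus <= ULONG_MAX - high_wmark_pages);

	return high_wmark_pages + bonus;
}
```

For a high watermark of 1000 pages this gives 1000 for a normal waker and 
1250 for an rt waker.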

If we're going to go with extra_free_kbytes, then I'd like to see the test 
case posted with a mathematical formula to show me what I should tune it 
to be depending on my machine's memory capacity and amount of free RAM 
when started (and I can use mem= to test it for various capacities).  For 
this to be merged, there should be a clear expression that shows what the 
ideal setting of the tunable should be rather than asking for trial-and-
error to see what works and what doesn't.  If such an expression doesn't 
exist, then it's clear that the necessary setting will vary significantly 
as the implementation changes from kernel to kernel.
