Re: [RFC] Add mempressure cgroup

Luiz Capitulino <lcapitulino@xxxxxxxxxx> · Fri, 30 Nov 2012 15:47:25 -0200

On Wed, 28 Nov 2012 17:27:51 -0800
Anton Vorontsov <anton.vorontsov@xxxxxxxxxx> wrote:

> On Wed, Nov 28, 2012 at 03:14:32PM -0800, Andrew Morton wrote:
> [...]
> > Compare this with the shrink_slab() shrinkers.  With these, the VM can
> > query and then control the clients.  If something goes wrong or is out
> > of balance, it's the VM's problem to solve.
> > 
> > So I'm thinking that a better design would be one which puts the kernel
> > VM in control of userspace scanning and freeing.  Presumably with a
> > query-and-control interface similar to the slab shrinkers.
> 
> Thanks for the ideas, Andrew.
> 
> Query-and-control scheme looks very attractive, and that's actually
> resembles my "balance" level idea, when userland tells the kernel how much
> reclaimable memory it has. Except the your scheme works in the reverse
> direction, i.e. the kernel becomes in charge.
> 
> But there is one, rather major issue: we're crossing kernel-userspace
> boundary. And with the scheme we'll have to cross the boundary four times:
> query / reply-available / control / reply-shrunk / (and repeat if
> necessary, every SHRINK_BATCH pages). Plus, it has to be done somewhat
> synchronously (all the four stages), and/or we have to make a "userspace
> shrinker" thread working in parallel with the normal shrinker, and here,
> I'm afraid, we'll see more strange interactions. :)

Wouldn't this be just like kswapd?

> But there is a good news: for these kind of fine-grained control we have a
> better interface, where we don't have to communicate [very often] w/ the
> kernel. These are "volatile ranges", where userland itself marks chunks of
> data as "I might need it, but I won't cry if you recycle it; but when I
> access it next time, let me know if you actually recycled it". Yes,
> userland no longer able to decide which exact page it permits to recycle,
> but we don't have use-cases when we actually care that much. And if we do,
> we'd rather introduce volatile LRUs with different priorities, or
> something alike.

I'm new to this stuff so please take this with a grain of salt, but I'm
not sure volatile ranges would be a good fit for our use case: we want to
make (kvm) guests reduce their memory when the host is getting memory
pressure.

Having a notification seems just fine for this purpose, but I'm not sure
how this would work with volatile ranges, as we'd have to mark pages volatile
in advance.

Andrew's idea seems to give a lot more freedom to apps, IMHO.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>