Re: [RFC] Add mempressure cgroup

On Wed, Nov 28, 2012 at 03:14:32PM -0800, Andrew Morton wrote:
[...]
> Compare this with the shrink_slab() shrinkers.  With these, the VM can
> query and then control the clients.  If something goes wrong or is out
> of balance, it's the VM's problem to solve.
> 
> So I'm thinking that a better design would be one which puts the kernel
> VM in control of userspace scanning and freeing.  Presumably with a
> query-and-control interface similar to the slab shrinkers.

Thanks for the ideas, Andrew.

The query-and-control scheme looks very attractive, and it actually
resembles my "balance" level idea, where userland tells the kernel how much
reclaimable memory it has. Except that your scheme works in the reverse
direction, i.e. the kernel is in charge.

But there is one rather major issue: we're crossing the kernel-userspace
boundary. And with this scheme we'll have to cross the boundary four times:
query / reply-available / control / reply-shrunk (and repeat if
necessary, every SHRINK_BATCH pages). Plus, it has to be done somewhat
synchronously (all four stages), and/or we have to make a "userspace
shrinker" thread work in parallel with the normal shrinker, and here,
I'm afraid, we'll see more strange interactions. :)
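
A rough sketch of the arithmetic behind that concern, assuming one
four-stage round per SHRINK_BATCH pages. The helper names are hypothetical,
and the value 128 is an assumption taken from the kernel's SHRINK_BATCH
define rather than from this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed batch size; the kernel's mm code defines SHRINK_BATCH as 128. */
#define SHRINK_BATCH 128

/* One round = query, reply-available, control, reply-shrunk. */
#define CROSSINGS_PER_ROUND 4

/* Hypothetical helper: rounds needed to cover nr_pages, one batch each. */
size_t shrink_rounds(size_t nr_pages, size_t batch)
{
	assert(batch > 0);
	return (nr_pages + batch - 1) / batch;	/* round up */
}

/* Hypothetical helper: kernel-userspace crossings for nr_pages. */
size_t boundary_crossings(size_t nr_pages, size_t batch)
{
	return shrink_rounds(nr_pages, batch) * CROSSINGS_PER_ROUND;
}
```

Under these assumptions, asking userland to give back 1024 pages costs
8 rounds, i.e. 32 boundary crossings, all of them more or less synchronous.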

But there is good news: for this kind of fine-grained control we have a
better interface, where we don't have to communicate [very often] w/ the
kernel. These are "volatile ranges", where userland itself marks chunks of
data as "I might need it, but I won't cry if you recycle it; but when I
access it next time, let me know if you actually recycled it". Yes,
userland is no longer able to decide which exact page it permits to
recycle, but we don't have use-cases where we actually care that much. And
if we do, we'd rather introduce volatile LRUs with different priorities, or
something similar.
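
The volatile-range contract described above can be modelled as a small
state machine. This is only a toy userspace simulation, not a kernel
interface: struct vrange and all three functions are hypothetical names
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one volatile range; not a real kernel API. */
struct vrange {
	size_t len;		/* size of the cached data, in bytes */
	bool is_volatile;	/* userland allowed the range to be recycled */
	bool purged;		/* the simulated kernel actually recycled it */
};

/* Userland: "I might need it, but I won't cry if you recycle it." */
void vrange_mark_volatile(struct vrange *r)
{
	assert(r != NULL);
	r->is_volatile = true;
}

/*
 * Simulated in-kernel reclaim: only volatile, not yet purged ranges may
 * be recycled. No round trip to userland is involved. Returns bytes freed.
 */
size_t vrange_reclaim(struct vrange *r)
{
	assert(r != NULL);
	if (!r->is_volatile || r->purged)
		return 0;
	r->purged = true;
	return r->len;
}

/*
 * Userland wants the data back: un-mark the range. Returns true if the
 * contents were recycled in the meantime and have to be regenerated.
 */
bool vrange_unmark(struct vrange *r)
{
	bool was_purged;

	assert(r != NULL);
	was_purged = r->purged;
	r->is_volatile = false;
	r->purged = false;
	return was_purged;
}
```

In this model userland talks to the kernel only twice per range (mark and
un-mark), however many times reclaim runs in between.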

So, we really don't need a full-fledged userland shrinker, since we can
just let the in-kernel shrinker do its job. If we work at bytes/pages
granularity, it is just easier (and more efficient in terms of
communication) to use volatile ranges.

For the pressure notification use-cases, we don't even have the bytes/pages
information: "activity managers" are separate processes looking after
overall system performance.

So, we're not trying to make userland too smart, quite the contrary: we
realized that for this interface we don't want to mess with the bytes and
pages, and that's why we cut this stuff down to only three levels. Before
this, we were actually trying to count bytes, we did not like it and we
ran away screaming.
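
To illustrate what "only three levels" means for a listener, here is a
minimal sketch. The level names, the percentage input and both thresholds
are hypothetical and chosen purely for illustration; the thread does not
specify how the levels are derived.

```c
#include <assert.h>

/* Hypothetical level names; the thread only says there are three. */
enum pressure_level {
	PRESSURE_LOW,
	PRESSURE_MEDIUM,
	PRESSURE_HIGH,
};

/* Hypothetical thresholds in percent, for illustration only. */
#define MEDIUM_THRESHOLD 60
#define HIGH_THRESHOLD   90

/*
 * Collapse some internal pressure estimate (0..100) into a coarse level.
 * The listener never sees bytes or pages, only the level.
 */
enum pressure_level pressure_to_level(unsigned int pressure_pct)
{
	assert(pressure_pct <= 100);
	if (pressure_pct >= HIGH_THRESHOLD)
		return PRESSURE_HIGH;
	if (pressure_pct >= MEDIUM_THRESHOLD)
		return PRESSURE_MEDIUM;
	return PRESSURE_LOW;
}
```

An activity manager would then react to the level alone (say, start
trimming caches at medium and killing background tasks at high) without
doing any byte accounting of its own.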

OTOH, your scheme makes volatile ranges unnecessary, since a thread might
register a shrinker hook and free stuff by itself. But again, I believe
this involves more communication with the kernel.

Thanks,
Anton.
