Re: [RFC 0/3] soft reclaim rework

On Tue, Apr 9, 2013 at 5:13 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
Hi all,
It's been a long time since I promised my take on the $subject but I
kept getting preempted by other tasks. I finally got to it, fortunately.

Hi Michal,

This has been on my list for a while and I never got the chance to get to it.  The per-memcg softlimit reclaim is one of the key features Google uses today, and thank you for putting in the effort to move this forward.
 
I haven't read the patches in detail yet, but since we have chatted about this for a few iterations they should look familiar.


This is just a first attempt. There are still some todos but I wanted to
post it soon to get feedback.

The basic idea is quite simple. Pull soft reclaim into shrink_zone in
the first step and get rid of the previous soft reclaim infrastructure.
shrink_zone is done in two passes now. First it tries to do the soft
limit reclaim and it falls back to reclaim-all-mode if no group is over
the limit or no pages have been scanned. The second pass happens at the
same priority so the only time we waste is the memcg tree walk which
shouldn't be a big deal. There is certainly room for improvements in
that direction. But let's keep it simple for now.
As a bonus we will get rid of a _lot_ of code by this and soft reclaim
will not stand out like before.
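
The two-pass scheme described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
code from the patches: struct toy_group, toy_shrink_group(), toy_walk()
and toy_shrink_zone() are hypothetical names, every scanned page is
assumed to be reclaimed, and the scan target is simply lru_pages >> priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct toy_group {
	unsigned long usage;      /* pages currently charged */
	unsigned long soft_limit; /* soft limit in pages */
	unsigned long lru_pages;  /* pages available for scanning */
};

/* Scan one group at the given priority; returns the pages scanned.
 * Simplifying assumption: every scanned page is also reclaimed. */
static unsigned long toy_shrink_group(struct toy_group *g, int priority)
{
	unsigned long nr;

	assert(priority >= 0 && priority < 32);
	nr = g->lru_pages >> priority;
	g->lru_pages -= nr;
	g->usage -= (nr < g->usage) ? nr : g->usage;
	return nr;
}

/* One walk over all groups. With soft_only set, groups at or under
 * their soft limit are skipped. */
static unsigned long toy_walk(struct toy_group *groups, size_t n,
			      int priority, bool soft_only)
{
	unsigned long scanned = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (soft_only && groups[i].usage <= groups[i].soft_limit)
			continue;
		scanned += toy_shrink_group(&groups[i], priority);
	}
	return scanned;
}

/* Pass 1: only groups over their soft limit. Pass 2, at the same
 * priority, reclaims from everybody if pass 1 scanned nothing. */
static unsigned long toy_shrink_zone(struct toy_group *groups, size_t n,
				     int priority)
{
	unsigned long scanned = toy_walk(groups, n, priority, true);

	if (scanned == 0)
		scanned = toy_walk(groups, n, priority, false);
	return scanned;
}
```

With one group over its limit, only that group is scanned. Once no group
is over, the same call falls through to the second walk and scans all of
them, so the only extra cost in this model is the first, empty walk.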
 
Yes, that is the part that should have given us enough motivation to merge this effort a long time ago. However, we had difficulty agreeing on 5% of the code (mainly the softlimit policy), which prevented us from cleaning up the other 95%. I take the blame.

The second step is somewhat more controversial. I am redefining the meaning
of the default soft limit value. I've not chosen 0 as we discussed
previously because I want to preserve hierarchical property of the soft
limit (if a parent up the hierarchy is over its limit then children are
over as well)

This is the 5% we keep disagreeing on. The internal patch I am carrying has a different interpretation of "hierarchical softlimit reclaim".

However, I am more inclined to accept that difference this time. At least that will get us moving forward to clean up the code first. Then we can revisit the exact policy of that 5% if it doesn't fit other use cases (besides Google). I am happy to backport this part into our kernel later and then carry only that 5% of change internally.

To give more background on what I mean by a different interpretation of "hierarchical", I did a write-up some time back, which is attached to this thread. This is purely a note for later and, as I mentioned, I will go ahead and review the patches and set that difference aside at this step.

so I have kept the default untouched - unlimited - but I
have slightly changed the meaning of this value. I interpret it as "user
doesn't care about soft limit". More precisely the value is ignored
unless it has been specified by user so such groups are eligible for
soft reclaim even though they do not reach the limit. Such groups
do not force their children to be reclaimed of course.
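
The eligibility rule described above can be sketched as a small
predicate. This is an illustration derived only from the description in
this mail, not the function from the patch: struct toy_memcg, the
soft_limit_set flag and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg in a hierarchy. */
struct toy_memcg {
	struct toy_memcg *parent; /* NULL for the root */
	unsigned long usage;
	unsigned long soft_limit;
	bool soft_limit_set;      /* assumption: set once the user writes a limit */
};

/* A group that never set a limit is never "over" it, so it cannot
 * force its children into soft reclaim. */
static bool toy_over_soft_limit(const struct toy_memcg *g)
{
	return g->soft_limit_set && g->usage > g->soft_limit;
}

static bool toy_soft_reclaim_eligible(const struct toy_memcg *g)
{
	/* Default value: the user doesn't care about the soft limit. */
	if (!g->soft_limit_set)
		return true;

	/* Hierarchical property: the group itself or any ancestor being
	 * over its explicitly set limit makes the group eligible. */
	for (; g != NULL; g = g->parent)
		if (toy_over_soft_limit(g))
			return true;
	return false;
}
```

In this model a group with an explicit limit that it stays under is
protected as long as no ancestor with an explicit limit is over it, while
a group left at the default is always a candidate.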
 
 
I guess the only possible use case where this wouldn't work as
expected is when somebody creates a group and sets its soft limit to
a small value (e.g. 0) just to protect all other groups from being
reclaimed. With the new scheme all groups would be reclaimed while the
previous implementation could end up reclaiming only the "special"
group. This configuration can be achieved by the new scheme trivially
so I think we should be safe. Or does this sound like a big problem?

Finally the third step is soft limit reclaim integration into targeted
reclaim. The patch is a trivial one-liner.

Will go through the patches in detail in the next day or so.

Thanks

--Ying

I haven't gotten to test it properly yet. I've tested only 2 workloads:
1) 1GB RAM + 128MB swap in a kvm (host 4 GB RAM)
   - 2 memcgs (directly under root)
        - A has soft limit 500MB and hard unlimited
        - B both hard and soft unlimited (default values)
   - One dd if=/dev/zero of=storage/$file bs=1024 count=1228800 per group
2) same setup
   - tar -xf linux source tree + make -j2 vmlinux

Results
1) I've checked memory.usage_in_bytes
Base (-mm tree)
        Group A         Group B
median  446498816       448659456

Patches applied
median  524314624       377921536

So as expected, A got more room at the expense of B and it is nicely over its
soft limit. I wanted to compare the reclaim performance as well but we
do not account scanned and reclaimed pages during the old soft reclaim
(global_reclaim prevents that). But I am planning to look at it.
Anyway it doesn't look like we are scanning/reclaiming more with the
patched kernel:
Base:    pgscan_kswapd_dma32 394382     pgsteal_kswapd_dma32 394372
Patched: pgscan_kswapd_dma32 394501     pgsteal_kswapd_dma32 394491

So I would assume that the soft limit reclaim scanned more in the end.

Total runtime was slightly shorter for the patched version:
Base
                Group A         Group B
total time      480.087 s       480.067 s

Patches applied
total time      474.853 s       474.736 s

But this could be an artifact of guest scheduling or of host activity,
so I wouldn't draw any conclusions from it.

2) kbuild test showed more or less the same results
usage_in_bytes
Base
                Group A         Group B
median          394817536       395634688

Patches applied
median          483481600       302131200

A is kept closer to the soft limit again. There is some fluctuation
around the limit because kbuild creates a lot of short lived processes.
Base:    pgscan_kswapd_dma32 1648718    pgsteal_kswapd_dma32 1510749
Patched: pgscan_kswapd_dma32 2042065    pgsteal_kswapd_dma32 1667745

The differences are much bigger now so it would be interesting to see how
much has been scanned/reclaimed during soft reclaim in the base kernel.
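
One way to read the kswapd counters above is as a steal/scan ratio plus
the relative growth between kernels. The helpers below are a hypothetical
sketch for that arithmetic, and they only see what these two counters
report (kswapd on the DMA32 zone), not the old soft reclaim path.

```c
#include <assert.h>

/* One pgscan/pgsteal counter pair. */
struct reclaim_stats {
	unsigned long scanned;
	unsigned long stolen;
};

/* Fraction of scanned pages that were actually reclaimed. */
static double steal_ratio(struct reclaim_stats s)
{
	assert(s.scanned > 0);
	return (double)s.stolen / (double)s.scanned;
}

/* Relative change of a counter from the base to the patched kernel. */
static double relative_growth(unsigned long base, unsigned long patched)
{
	assert(base > 0);
	return ((double)patched - (double)base) / (double)base;
}
```

For the kbuild numbers this gives a steal ratio of roughly 92% on the
base kernel against roughly 82% with the patches, with scanning up by
about 24% and reclaim up by about 10%. For the dd test both kernels stay
above 99.9%.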

I haven't included total runtime statistics here because they seemed
even more random due to guest/host interaction.

Any comments are welcome, of course.

Michal Hocko (3):
      memcg: integrate soft reclaim tighter with zone shrinking code
      memcg: Ignore soft limit until it is explicitly specified
      vmscan, memcg: Do softlimit reclaim also for targeted reclaim

Incomplete diffstat (without node-zone soft limit tree removal etc...)
so more deletions to come.
 include/linux/memcontrol.h |   10 +--
 mm/memcontrol.c            |  175 +++++++++-----------------------------------
 mm/vmscan.c                |   67 ++++++++++-------
 3 files changed, 78 insertions(+), 174 deletions(-)

Attachment: SoftlimitReclaimInMemcg.pdf
Description: Adobe PDF document

