[PATCH 0/5] memcg: per cgroup background reclaim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The current implementation of memcg only supports direct reclaim and this
patchset adds the support for background reclaim. Per cgroup background
reclaim is needed which spreads out the memory pressure over longer period
of time and smoothes out the system performance.

I run through some simple tests which reads/writes a large file and makes sure
it triggers per cgroup kswapd on the low_wmark. I compared at pg_steal/pg_scan
ratio w/o background reclaim. Also the running time is measured in this
patchset.

Step1: Create a cgroup with 500M memory_limit.
$ mount -t cgroup -o cpuset,memory cpuset /dev/cgroup
$ mkdir /dev/cgroup/A
$ echo 0 >/dev/cgroup/A/cpuset.cpus
$ echo 0 >/dev/cgroup/A/cpuset.mems
$ echo 500m >/dev/cgroup/A/memory.limit_in_bytes
$ echo $$ >/dev/cgroup/A/tasks

Step2: Check the wmarks.
$ cat /dev/cgroup/A/memory.reclaim_wmarks
low_wmark 3663360
high_wmark 4396032

Step3: Dirty the pages by creating a 20g file on hard drive.
$ ddtest -D /export/hdc3/dd -b 1024 -n 20971520 -t 1

Checked the memory.stat w/o background reclaim. It used to be all the pages are
reclaimed from direct reclaim, and now most of the pages  are reclaimed at
background. (note: writing '0' to min_free_kbytes disables per cgroup kswapd)

Only direct reclaim                       With background reclaim:
pgpgin 5184374                            pgpgin 5248437
pgpgout 5056385                           pgpgout 5121659
kswapd_steal 0                            kswapd_steal 5121516
pg_pgsteal 5056363                        pg_pgsteal 32
kswapd_pgscan 0                           kswapd_pgscan 5121569
pg_scan 5056416                           pg_scan 32
pgrefill 297632                           pgrefill 312512
pgoutrun 0                                pgoutrun 107525
allocstall 158009                         allocstall 1

real 21m6.864s                            real 24m56.735s
user 0m2.047s                             user 0m2.331s
sys 6m2.572s                              sys  7m29.048s

Step4: Cleanup
$ echo $$ >/dev/cgroup/tasks
$ echo 1 > /dev/cgroup/A/memory.force_empty

Step5: Read the 20g file into the pagecache.
$ cat /export/hdc3/dd/tf0 > /dev/zero;

Checked the memory.stat w/o background reclaim. Most of clean pages are
reclaimed at background instead of direct reclaim.

Only direct reclaim                       With background reclaim:
pgpgin 5184066                            pgpgin 5184081
pgpgout 5056093                           pgpgout 5057185
kswapd_steal 0                            kswapd_steal 4960805
pg_pgsteal 5056063                        pg_pgsteal 96348
kswapd_pgscan 0                           kswapd_pgscan 4960809
pg_scan 5056064                           pg_scan 96348
pgrefill 0                                pgrefill 0
pgoutrun 0                                pgoutrun 54904
allocstall 158001                         allocstall 3010

real 3m13.034s                            real 3m13.074s
user 0m0.221s                             user 0m0.158s
sys  0m22.793s                            sys  0m38.603s

TODO:
1. Keep debugging the crash isse on NUMA machine.
2. Generate more test cases and look into reducing the lock contention.

Ying Han (5):
  Add kswapd descriptor.
  Add per cgroup reclaim watermarks.
  New APIs to adjust per cgroup wmarks.
  Per cgroup background reclaim.
  Add more per memcg stats.

 Documentation/cgroups/memory.txt |   14 ++
 include/linux/memcontrol.h       |  102 +++++++++
 include/linux/mmzone.h           |    3 +-
 include/linux/res_counter.h      |   83 ++++++++
 include/linux/swap.h             |   12 +-
 kernel/res_counter.c             |    6 +
 mm/memcontrol.c                  |  390 +++++++++++++++++++++++++++++++++++-
 mm/page_alloc.c                  |    1 -
 mm/vmscan.c                      |  418 ++++++++++++++++++++++++++++++++++----
 9 files changed, 979 insertions(+), 50 deletions(-)

-- 
1.7.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]