On 6/25/19 3:00 PM, Andrew Morton wrote:
On Thu, 13 Jun 2019 05:56:49 +0800 Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> wrote:
Currently THP deferred split shrinker is not memcg aware, this may cause
premature OOM with some configuration. For example the below test would
run into premature OOM easily:
$ cgcreate -g memory:thp
$ echo 4G > /sys/fs/cgroup/memory/thp/memory/limit_in_bytes
$ cgexec -g memory:thp transhuge-stress 4000
transhuge-stress comes from kernel selftest.
It is easy to hit OOM, but there are still a lot THP on the deferred
split queue, memcg direct reclaim can't touch them since the deferred
split shrinker is not memcg aware.
Convert deferred split shrinker memcg aware by introducing per memcg
deferred split queue. The THP should be on either per node or per memcg
deferred split queue if it belongs to a memcg. When the page is
immigrated to the other memcg, it will be immigrated to the target
memcg's deferred split queue too.
Reuse the second tail page's deferred_list for per memcg list since the
same THP can't be on multiple deferred split queues.
...
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4579,6 +4579,11 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
#ifdef CONFIG_CGROUP_WRITEBACK
INIT_LIST_HEAD(&memcg->cgwb_list);
#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ spin_lock_init(&memcg->deferred_split_queue.split_queue_lock);
+ INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
+ memcg->deferred_split_queue.split_queue_len = 0;
+#endif
idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
return memcg;
fail:
@@ -4949,6 +4954,14 @@ static int mem_cgroup_move_account(struct page *page,
__mod_memcg_state(to, NR_WRITEBACK, nr_pages);
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ if (compound && !list_empty(page_deferred_list(page))) {
+ spin_lock(&from->deferred_split_queue.split_queue_lock);
+ list_del(page_deferred_list(page));
It's worrisome that this page still appears to be on the deferred_list
and that the above if() would still succeed. Should this be
list_del_init()?
list_del_init() sounds safe although I'm not quite sure this is
possible. Will update this with fixing build issue together.
+ from->deferred_split_queue.split_queue_len--;
+ spin_unlock(&from->deferred_split_queue.split_queue_lock);
+ }
+#endif