> OK, this would suggest that some charges were accounted to a different
> group than the corresponding pages group's LRUs or that the charge cache (stock)
> is b0rked (the latter can be checked easily by making refill_stock a noop
> - see the patch below - I am skeptical that would help though).

You were right, still no change.

> Let's rule out some usual suspects while I am staring at the
> code. Are the tasks migrated between groups? What is the value of
> memory.move_charge_at_immigrate? Have you seen any memcg oom messages
> in the log?

- I don't see anything about migration, but there is a part that sets
  "memory.force_empty". I did not see the corresponding trace output,
  but I will recheck this. (See
  https://github.com/SchedMD/slurm/blob/master/src/plugins/jobacct_gather/cgroup/jobacct_gather_cgroup_memory.c)

- The only interesting part of the release_agent is the removal of the
  cgroup hierarchy (mountdir is /sys/fs/cgroup/memory):

      flock -x ${mountdir} -c "rmdir ${rmcg}"

- memory.move_charge_at_immigrate is "0".

- I never saw any OOM messages related to this problem. Before reporting
  the first time, I explicitly checked whether this might somehow be
  OOM-related.

--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
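[Editor's note: the mismatch hypothesized in the quoted text (charges accounted to a different group than the pages' LRUs) can be spotted by comparing memory.usage_in_bytes against the summed LRU counters in memory.stat. A minimal sketch; the counter names are real memcg v1 fields, but the sample values and the usage number below are invented for illustration:]

```shell
# Hypothetical memory.stat excerpt; field names match the memcg v1 file,
# the numbers are made up for this example.
sample_stat='cache 4096
rss 8192
inactive_anon 0
active_anon 8192
inactive_file 4096
active_file 0
unevictable 0'

# Sum the LRU counters (anon + file + unevictable) from a memory.stat dump.
lru_bytes() {
    printf '%s\n' "$1" | awk '
        $1 ~ /^(in)?active_(anon|file)$/ || $1 == "unevictable" { sum += $2 }
        END { print sum }'
}

usage=12288   # pretend memory.usage_in_bytes reported this
lru=$(lru_bytes "$sample_stat")
echo "usage=$usage lru=$lru delta=$((usage - lru))"
# prints: usage=12288 lru=12288 delta=0
```

A persistently nonzero delta on a quiesced cgroup would point at exactly the accounting mismatch the quote describes; on a live group some skew is expected from the per-CPU charge cache.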
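[Editor's note: the release_agent's flock+rmdir line can be exercised safely on a scratch directory. This sketch only demonstrates the locking pattern; the cgroup names are replaced by hypothetical paths under mktemp, and the exclusive lock on the "mount point" stands in for serializing concurrent removals on the shared hierarchy:]

```shell
# Demonstrate the flock -x ... -c "rmdir ..." pattern outside a live
# cgroup hierarchy (all paths here are hypothetical scratch dirs).
mountdir=$(mktemp -d)
rmcg="$mountdir/uid_1000_job_42"
mkdir -p "$rmcg"

# Exclusive lock on the parent directory, then remove the child,
# mirroring: flock -x ${mountdir} -c "rmdir ${rmcg}"
flock -x "$mountdir" -c "rmdir '$rmcg'"

[ -d "$rmcg" ] && echo "still there" || echo "removed"
rmdir "$mountdir"
```

The lock matters because several Slurm steps may try to tear down sibling cgroups at once; rmdir itself is atomic, but the surrounding bookkeeping is not.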