On 11/24/2016 02:12 PM, Michal Hocko wrote:
> On Thu 24-11-16 13:45:03, Nikolay Borisov wrote:
> [...]
>> Ok, I think I know what has happened. Inspecting the data structures of
>> the respective cgroup here is what the mem_cgroup_per_zone looks like:
>>
>> zoneinfo[2] = {
>>   lruvec = {{
>>     lists = {
>>       {
>>         next = 0xffffea004f98c660,
>>         prev = 0xffffea0063f6b1a0
>>       },
>>       {
>>         next = 0xffffea0004123120,
>>         prev = 0xffffea002c2e2260
>>       },
>>       {
>>         next = 0xffff8818c37bb360,
>>         prev = 0xffff8818c37bb360
>>       },
>>       {
>>         next = 0xffff8818c37bb370,
>>         prev = 0xffff8818c37bb370
>>       },
>>       {
>>         next = 0xffff8818c37bb380,
>>         prev = 0xffff8818c37bb380
>>       }
>>     },
>>     reclaim_stat = {
>>       recent_rotated = {172969085, 43319509},
>>       recent_scanned = {173112994, 185446658}
>>     },
>>     zone = 0xffff88207fffcf00
>>   }},
>>   lru_size = {159722, 158714, 0, 0, 0},
>> }
>>
>> So this means that there are inactive_anon and active_anon only -
>> correct?
>
> yes. at least in this particular zone.
>
>> Since the machine doesn't have any swap this means anon memory
>> has nowhere to go. If I'm interpreting the data correctly then this
>> explains why reclaim makes no progress. If that's the case then I have
>> the following questions:
>>
>> 1. Shouldn't reclaim exit at some point rather than being stuck in
>> reclaim without making further progress.
>
> Reclaim (try_to_free_mem_cgroup_pages) has to go down through all
> priorities before it can get out. We are not doing any pro-active checks
> whether there is anything reclaimable but that alone shouldn't be such a
> big deal because shrink_node_memcg should simply do nothing because
> get_scan_count will find no pages to scan. So it shouldn't take much
> time to realize there is nothing to reclaim and get back to try_charge
> which retries a few more times and eventually goes OOM. I do not see how
> we could trigger rcu stalls here. There shouldn't be any long RCU
> critical section on the way, and there are preemption points on the way.
>
>> 2. It seems rather strange that there are no (INACTIVE|ACTIVE)_FILE
>> pages - is this possible?
>
> All of them might be reclaimed already as a result of the memory
> pressure in the memcg. So not all that surprising. But the fact that
> you are hitting the limit means that the anonymous pages saturate your
> hard limit so your memcg seems underprovisioned.
>
>> 3. Why hasn't OOM been activated in order to free up some anonymous memory ?
>
> It should eventually. Maybe there still were some reclaimable pages in
> other zones for this memcg.

I just checked all the zones for both nodes (the machines have 2 NUMA
nodes), and essentially there are no reclaimable pages - all are
anonymous. So the pertinent question is why processes are sleeping in
the reclaim path when there are no pages to free. I also observed the
same behavior on a different node; this time the priority was 0 and the
code still hadn't resorted to OOM. This all seems too strange.
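
As a rough illustration of the reading of the dump above, here is a small Python model. It is not kernel code. It assumes that lru_size is indexed in the order of the kernel's enum lru_list (inactive_anon, active_anon, inactive_file, active_file, unevictable), and that the per-list scan target is roughly the list size shifted right by the reclaim priority, starting from DEF_PRIORITY = 12, with anon lists skipped entirely when there is no swap. The helper names are hypothetical, and the real get_scan_count has more cases than this.

```python
# Simplified model of the discussion above; NOT the kernel implementation.
# Assumption: lru_size follows the order of enum lru_list.
LRU_NAMES = ("inactive_anon", "active_anon",
             "inactive_file", "active_file", "unevictable")

DEF_PRIORITY = 12  # reclaim starts here and walks down to 0


def decode_lru_size(lru_size):
    """Map the raw lru_size array from the dump to named LRU lists."""
    return dict(zip(LRU_NAMES, lru_size))


def reclaimable_pages(lru_size, swap_available):
    """Pages reclaim could target: file pages always, anon only with swap."""
    sizes = decode_lru_size(lru_size)
    total = sizes["inactive_file"] + sizes["active_file"]
    if swap_available:
        total += sizes["inactive_anon"] + sizes["active_anon"]
    return total


def scan_targets(lru_size, priority, swap_available):
    """Hypothetical per-list scan targets at one priority (size >> priority)."""
    targets = {}
    for name, size in decode_lru_size(lru_size).items():
        if name == "unevictable":
            continue  # never scanned by reclaim
        if name.endswith("_anon") and not swap_available:
            targets[name] = 0  # anon has nowhere to go without swap
        else:
            targets[name] = size >> priority
    return targets


lru_size = (159722, 158714, 0, 0, 0)  # values from the dump above

print(decode_lru_size(lru_size))
print("reclaimable without swap:", reclaimable_pages(lru_size, False))

# Walk all priorities as reclaim would; without swap every target stays 0.
for priority in range(DEF_PRIORITY, -1, -1):
    total = sum(scan_targets(lru_size, priority, False).values())
    print("priority", priority, "pages to scan:", total)
```

Under these assumptions every priority level finds nothing to scan for this zone, which matches the expectation in the thread that reclaim should fall through quickly and return to try_charge rather than sleep.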