Hello Kame, >> diff --git a/mm/slab.c b/mm/slab.c >> index a98295f..7e82f99 100644 >> --- a/mm/slab.c >> +++ b/mm/slab.c >> @@ -3535,6 +3535,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, >> cache_alloc_debugcheck_before(cachep, flags); >> local_irq_save(save_flags); >> objp = __do_slab_alloc_node(cachep, flags, nodeid); >> + if (slab_should_retry(objp, flags)) >> + objp = __do_slab_alloc_node(cachep, flags, nodeid); > > 3 questions. > > 1. why can't we do retry in memcg's code (or kmem/memcg code) rather than slab.c ? Due to two main reasons: a. this is not memcg/kmemcg specific. I used kmemcg to make the container very small, therefore, more likely. But it can also happen in non-constrained systems. b. memcg hooks into the page allocation. This patchset deals with cases in which we can't, really, allocate a new page. However, we are confident that we could allocate a new *object* should we retry. > 2. It should be retries even if memory allocator returns NULL page ? Yes, this is the whole point of this exercise. When we return a NULL page, we are almost certain to have called reclaim. Reclaim will call shrink_slab(), that may free objects within a page. So if we retry, we may now find space within the page, even if we can't have a full page. > 3. What's relationship with oom-killer ? The first __do_slab_alloc() will not > invoke oom-killer and returns NULL ? > Good question. In all my testing, I've never seen the oom killer be invoked for failed slab allocations, for either slab or slub. What I usually see is just the allocator giving up and flooding the log with failure messages. It seemed logical to me, so I never really asked myself why wasn't the oom killer invoked. (It usually is invoked right after if I fire a user memory hog). Perhaps someone can shed a light on the subject? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html