On Mon, Dec 05, 2016 at 06:23:26AM +0100, Michal Hocko wrote: > On Fri 02-12-16 09:54:17, Anatoly Stepanov wrote: > > Alex, Vlasimil, Michal, thanks for your responses! > > > > On Fri, Dec 02, 2016 at 10:19:33AM +0100, Michal Hocko wrote: > > > Thanks for CCing me Vlastimil > > > > > > On Fri 02-12-16 09:44:23, Vlastimil Babka wrote: > > > > On 12/01/2016 02:16 AM, Anatoly Stepanov wrote: > > > > > As memcg array size can be up to: > > > > > sizeof(struct memcg_cache_array) + kmemcg_id * sizeof(void *); > > > > > > > > > > where kmemcg_id can be up to MEMCG_CACHES_MAX_SIZE. > > > > > > > > > > When a memcg instance count is large enough it can lead > > > > > to high order allocations up to order 7. > > > > > > This is definitely not nice and worth fixing! I am just wondering > > > whether this is something you have encountered in the real life. Having > > > thousands of memcgs sounds quite crazy^Wscary to me. I am not at all > > > sure we are prepared for that and some controllers would have real > > > issues with it AFAIR. > > > > In our company we use custom-made lightweight container technology, the thing is > > we can have up to several thousands of them on a server. > > So those high-order allocations were observed on a real production workload. > > OK, this is interesting. Definitely worth mentioning in the changelog! > > [...] > > > /* > > > * Do not invoke OOM killer for larger requests as we can fall > > > * back to the vmalloc > > > */ > > > if (size > PAGE_SIZE) > > > gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; > > > > I think we should check against PAGE_ALLOC_COSTLY_ORDER anyway, as > > there's no big need to allocate large contiguous chunks here, at the > > same time someone in the kernel might really need them. > > PAGE_ALLOC_COSTLY_ORDER is and should remain the page allocator internal > implementation detail and shouldn't spread out much outside. GFP_NORETRY > will already make sure we do not push hard here. May be i didn't put my thoughts well, so let's discuss in more detail: 1. Yes, we don't try that hard to allocate high-order blocks with __GFP_NORETRY, but we still can do compaction and direct reclaim, which can be heavy for large chunk. In the worst case we can even fail to find the chunk, after all reclaim/compaction steps were made. 2. The second point is, even if we got the desired chunk quickly, we end up wasting large contiguous chunks, which might be needed for CMA or some h/w driver (DMA for inst.), when they can't use non-contiguous chunks. BTW, in the kernel there are few examples like alloc_fdmem() for inst., which use that "costly order" idea of the fallback. static void *alloc_fdmem(size_t size) { /* * Very large allocations can stress page reclaim, so fall back to * vmalloc() if the allocation size will be considered "large" by the VM. */ if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { void *data = kmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_NOWARN | __GFP_NORETRY); if (data != NULL) return data; } return __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_HIGHMEM, PAGE_KERNEL); } Also in netfilter,ceph and some other places. Please, correct me if i'am somewhere mistaken with this thoughts. And don't get me wrong, i'm just trying to get deeper into the problem, especially as we're going to introduce something more or less generic. > > > > > > > > > ret = kzalloc(size, gfp_mask); > > > if (ret) > > > return ret; > > > return vzalloc(size); > > > > > > > > I also do not like memcg_alloc helper name. It suggests we are > > > allocating a memcg while it is used for cache arrays and slab LRUS. > > > Anyway this pattern is quite widespread in the kernel so I would simply > > > suggest adding kvmalloc function instead. > > > > Agreed, it would be nice to have a generic call. > > I would suggest an impl. like this: > > > > void *kvmalloc(size_t size) > > gfp_t gfp_mask should be a parameter as this should be a generic helper. > > > { > > gfp_t gfp_mask = GFP_KERNEL; > > > > void *ret; > > > > if (size > PAGE_SIZE) > > gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; > > > > > > if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { > > ret = kzalloc(size, gfp_mask); > > if (ret) > > return ret; > > } > > No, please just do as suggested above. Tweak the gfp_mask for higher > order requests and do kmalloc first with vmalloc as a fallback. > > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>