On Sat 03-12-16 18:55:22, Anatoly Stepanov wrote:
> On Tue, Dec 06, 2016 at 09:47:35AM +0100, Michal Hocko wrote:
> > On Sat 03-12-16 01:09:13, Anatoly Stepanov wrote:
> > > On Mon, Dec 05, 2016 at 06:23:26AM +0100, Michal Hocko wrote:
> > > > On Fri 02-12-16 09:54:17, Anatoly Stepanov wrote:
> > > > > Alex, Vlastimil, Michal, thanks for your responses!
> > > > >
> > > > > On Fri, Dec 02, 2016 at 10:19:33AM +0100, Michal Hocko wrote:
> > > > > > Thanks for CCing me Vlastimil
> > > > > >
> > > > > > On Fri 02-12-16 09:44:23, Vlastimil Babka wrote:
> > > > > > > On 12/01/2016 02:16 AM, Anatoly Stepanov wrote:
> > > > > > > > As memcg array size can be up to:
> > > > > > > > sizeof(struct memcg_cache_array) + kmemcg_id * sizeof(void *);
> > > > > > > >
> > > > > > > > where kmemcg_id can be up to MEMCG_CACHES_MAX_SIZE.
> > > > > > > >
> > > > > > > > When a memcg instance count is large enough it can lead
> > > > > > > > to high order allocations up to order 7.
> > > > > >
> > > > > > This is definitely not nice and worth fixing! I am just wondering
> > > > > > whether this is something you have encountered in the real life. Having
> > > > > > thousands of memcgs sounds quite crazy^Wscary to me. I am not at all
> > > > > > sure we are prepared for that and some controllers would have real
> > > > > > issues with it AFAIR.
> > > > >
> > > > > In our company we use custom-made lightweight container technology, the thing is
> > > > > we can have up to several thousands of them on a server.
> > > > > So those high-order allocations were observed on a real production workload.
> > > >
> > > > OK, this is interesting. Definitely worth mentioning in the changelog!
> > > >
> > > > [...]
> > > > > > 	/*
> > > > > > 	 * Do not invoke OOM killer for larger requests as we can fall
> > > > > > 	 * back to the vmalloc
> > > > > > 	 */
> > > > > > 	if (size > PAGE_SIZE)
> > > > > > 		gfp_mask |= __GFP_NORETRY | __GFP_NOWARN;
> > > > >
> > > > > I think we should check against PAGE_ALLOC_COSTLY_ORDER anyway, as
> > > > > there's no big need to allocate large contiguous chunks here, at the
> > > > > same time someone in the kernel might really need them.
> > > >
> > > > PAGE_ALLOC_COSTLY_ORDER is and should remain the page allocator internal
> > > > implementation detail and shouldn't spread out much outside. GFP_NORETRY
> > > > will already make sure we do not push hard here.
> > >
> > > Maybe I didn't put my thoughts well, so let's discuss in more detail:
> > >
> > > 1. Yes, we don't try that hard to allocate high-order blocks with
> > > __GFP_NORETRY, but we still can do compaction and direct reclaim,
> > > which can be heavy for large chunk. In the worst case we can even
> > > fail to find the chunk, after all reclaim/compaction steps were made.
> >
> > Yes this is correct. But I am not sure what you are trying to tell
> > by that. Highorder requests are a bit of a problem. That's why
> > __GFP_NORETRY is implicit here. It also guarantees that we won't hit
> > the OOM killer because we do have a reasonable fallback. I do not see a
> > point to play with COSTLY_ORDER though. The page allocator knows how to
> > handle those and we are trying hard that those requests are not too
> > disruptive. Or am I still missing your point?
>
> My point is, while we're trying to get a pretty big contig. chunk
> (let's say of COSTLY_SIZE), the reclaim can induce a lot of disk I/O

Not really, as I've tried to explain above. The page allocator really
doesn't try hard for costly orders and bails out early after the first
round of reclaim/compaction.

> which can be crucial for overall system performance, at the same time
> we don't need that contig. chunk.
>
> So, for COSTLY_SIZE chunks, vmalloc should perform better, as it's
> obviously more likely to find order-0 blocks w/o reclaim.

Again, vmalloc is not free either and is a problem especially on 32b
arches. Anyway, I think we are going in circles here and repeating the
same arguments. Let me post what I think is the right implementation of
kvmalloc and you can build on top of that.
-- 
Michal Hocko
SUSE Labs
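
For readers following the thread, the pattern under discussion (try a
contiguous allocation without pushing hard, then fall back) can be
sketched in userspace C. This is only an illustration and not the
kvmalloc implementation referred to above: every sketch_* name, the
flag bits, and SKETCH_CONTIG_LIMIT are hypothetical, malloc() stands in
for both kmalloc and vmalloc, and the rule that small requests do not
fall back is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not kernel API. */
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_CONTIG_LIMIT (4u * SKETCH_PAGE_SIZE)

#define SKETCH_NORETRY 0x1u	/* models __GFP_NORETRY */
#define SKETCH_NOWARN  0x2u	/* models __GFP_NOWARN */

enum sketch_source {
	SKETCH_FROM_NONE,
	SKETCH_FROM_CONTIG,
	SKETCH_FROM_FALLBACK
};

/*
 * Models the physically contiguous allocator. Assumption for this
 * sketch: a no-retry request above SKETCH_CONTIG_LIMIT fails at once
 * instead of pushing harder.
 */
static void *sketch_contig_alloc(size_t size, unsigned int flags)
{
	if ((flags & SKETCH_NORETRY) && size > SKETCH_CONTIG_LIMIT)
		return NULL;
	return malloc(size);
}

/* Models the vmalloc fallback; in this sketch it is plain malloc(). */
static void *sketch_fallback_alloc(size_t size)
{
	return malloc(size);
}

/* Try the contiguous path first, fall back only for larger requests. */
static void *sketch_kvmalloc(size_t size, unsigned int flags,
			     enum sketch_source *src)
{
	void *p;

	assert(src != NULL);

	/* Larger requests must not push hard: a fallback exists. */
	if (size > SKETCH_PAGE_SIZE)
		flags |= SKETCH_NORETRY | SKETCH_NOWARN;

	p = sketch_contig_alloc(size, flags);
	if (p) {
		*src = SKETCH_FROM_CONTIG;
		return p;
	}

	/* Assumption: a request of one page or less gains nothing from the fallback. */
	if (size <= SKETCH_PAGE_SIZE) {
		*src = SKETCH_FROM_NONE;
		return NULL;
	}

	p = sketch_fallback_alloc(size);
	*src = p ? SKETCH_FROM_FALLBACK : SKETCH_FROM_NONE;
	return p;
}
```

In this model a 64-byte or two-page request is served by the contiguous
path, while a 16-page request takes the fallback. The caller never
needs to know which path was taken except when freeing, which the
sketch sidesteps by using malloc() for both.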