Alexander Graf <agraf@xxxxxxx> writes:

>> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> Alexander Graf <agraf@xxxxxxx> writes:
>>
>>>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote:
>>>> We reserve 5% of total ram for CMA allocation and not using that can
>>>> result in us running out of numa node memory with specific
>>>> configuration. One caveat is we may not have node local hpt with pinned
>>>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset
>>>> after creating hash page table.
>>>
>>> I don't understand the problem. Can you please elaborate?
>>
>> Lets take a system with 100GB RAM. We reserve around 5GB for htab
>> allocation. Now if we use rest of available memory for hugetlbfs
>> (because we want all the guest to be backed by huge pages), we would
>> end up in a situation where we have a few GB of free RAM and 5GB of CMA
>> reserve area. Now if we allow hash page table allocation to consume the
>> free space, we would end up hitting page allocation failure for other
>> non movable kernel allocation even though we still have 5GB CMA reserve
>> space free.
>
> Isn't this a greater problem? We should start swapping before we hit
> the point where non movable kernel allocation fails, no?

But there is not much to swap, because most of the memory is reserved
for guest RAM via hugetlbfs.

>
> The fact that KVM uses a good number of normal kernel pages is maybe
> suboptimal, but shouldn't be a critical problem.

Yes. But in this case we could do better, couldn't we? We already have a
large part of guest RAM kept aside for htab allocation, which cannot be
used for non-movable allocations. With the current code we ignore that
reserved space and use other areas for hash page table allocation. We
actually hit this case on one of our test boxes:

KVM guest htab at c000001e50000000 (order 30), LPID 1
libvirtd invoked oom-killer: gfp_mask=0x2000d0, order=0, oom_score_adj=0
libvirtd cpuset=/ mems_allowed=0,16
CPU: 72 PID: 20044 Comm: libvirtd Not tainted 3.10.23-1401.pkvm2_1.4.ppc64 #1
Call Trace:
[c000001e3b63f150] [c000000000017330] .show_stack+0x130/0x200 (unreliable)
[c000001e3b63f220] [c00000000087a888] .dump_stack+0x28/0x3c
[c000001e3b63f290] [c000000000876a4c] .dump_header+0xbc/0x228
[c000001e3b63f360] [c0000000001dd838] .oom_kill_process+0x318/0x4c0
[c000001e3b63f440] [c0000000001de258] .out_of_memory+0x518/0x550
[c000001e3b63f520] [c0000000001e5aac] .__alloc_pages_nodemask+0xb3c/0xbf0
[c000001e3b63f700] [c000000000243580] .new_slab+0x440/0x490
[c000001e3b63f7a0] [c0000000008781fc] .__slab_alloc+0x17c/0x618
[c000001e3b63f8d0] [c0000000002467fc] .kmem_cache_alloc_node_trace+0xcc/0x300
[c000001e3b63f990] [c00000000010f62c] .alloc_fair_sched_group+0xfc/0x200
[c000001e3b63fa60] [c000000000104f00] .sched_create_group+0x50/0xe0
[c000001e3b63fae0] [c000000000104fc0] .cpu_cgroup_css_alloc+0x30/0x80
[c000001e3b63fb60] [c0000000001513ec] .cgroup_mkdir+0x2bc/0x6e0
[c000001e3b63fc50] [c000000000275aec] .vfs_mkdir+0x14c/0x220
[c000001e3b63fcf0] [c00000000027a734] .SyS_mkdirat+0x94/0x110
[c000001e3b63fdb0] [c00000000027a7e4] .SyS_mkdir+0x34/0x50
[c000001e3b63fe30] [c000000000009f54] syscall_exit+0x0/0x98
Node 0 DMA free:23424kB min:23424kB low:29248kB high:35136kB active_anon:0kB inactive_anon:128kB active_file:256kB inactive_file:384kB unevictable:9536kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:65931776kB mlocked:9536kB dirty:64kB writeback:0kB mapped:5376kB shmem:0kB slab_reclaimable:23616kB slab_unreclaimable:1237056kB kernel_stack:18256kB pagetables:1088kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:78 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 16 DMA free:5787008kB min:21376kB low:26688kB high:32064kB active_anon:1984kB inactive_anon:2112kB active_file:896kB inactive_file:64kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:60060032kB mlocked:0kB dirty:128kB writeback:3712kB mapped:0kB shmem:0kB slab_reclaimable:23424kB slab_unreclaimable:826048kB kernel_stack:576kB pagetables:1408kB unstable:0kB bounce:0kB free_cma:5767040kB writeback_tmp:0kB pages_scanned:756 all_unreclaimable? yes
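
The per-node figures in the OOM report above can be checked with a little arithmetic. The following is only a rough sketch: it copies the free, min and free_cma values from the report and subtracts free_cma from free, on the assumption that non-movable allocations cannot be served from the CMA area. The helper name unmovable_headroom and the trimmed-down report string are illustrative, and the kernel's real watermark check is more involved than this comparison.

```python
import re

# Per-node values copied from the OOM report (only the fields used here).
OOM_REPORT = """\
Node 0 DMA free:23424kB min:23424kB free_cma:0kB
Node 16 DMA free:5787008kB min:21376kB free_cma:5767040kB
"""

# Matches one "Node N <zone> free:... min:... free_cma:..." record per line.
NODE_RE = re.compile(
    r"Node (?P<node>\d+) \S+ free:(?P<free>\d+)kB min:(?P<min>\d+)kB"
    r".*?free_cma:(?P<free_cma>\d+)kB"
)


def unmovable_headroom(report):
    """Return {node: (free memory outside CMA in kB, min watermark in kB)}."""
    result = {}
    for m in NODE_RE.finditer(report):
        free = int(m["free"])
        free_cma = int(m["free_cma"])
        # Assumption: pages in the CMA area cannot back non-movable allocations.
        result[int(m["node"])] = (free - free_cma, int(m["min"]))
    return result


if __name__ == "__main__":
    for node, (usable, wmark) in sorted(unmovable_headroom(OOM_REPORT).items()):
        print(f"node {node}: {usable} kB free outside CMA, min watermark {wmark} kB")
```

Under that assumption, node 16 has 5787008 - 5767040 = 19968 kB free outside CMA, which is below its min watermark of 21376 kB, and node 0 sits exactly at its min watermark with no free CMA pages. Almost all of the free memory on node 16 is the unused CMA reserve.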