Re: Huge percpu memory usage on multi NUMA node system

[Cc Johannes, Shakeel and Nico]

On Thu 09-12-21 00:36:53, Alexey Makhalov wrote:
> Hello,
> 
> I use Vmware VM with the following configuration.
> - 1 vCPU per 1 NUMA node
> - 4 online vCPUs, 128 possible vCPUs. It translates to
> - 4 online nodes and 128 possible nodes.
> - 192 MB VM memory

We have discussed this particular setup in another email thread that is
unrelated to this issue, but let me repeat that I consider such a
configuration rather surprising and suboptimal. I am not sure what the
actual topology will be, but a single CPU per NUMA node will have some
interesting side effects (e.g. on CPU load balancing).

Also, a large number of memoryless and CPU-less nodes is not something
many kernel subsystems are optimized for. At best we try to avoid
scaling with MAX_NUMNODES and scale with the number of possible nodes
instead. Proper handling of possible nodes without memory requires
memory hotplug notifiers and synchronization.
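
To make the three scaling choices concrete, here is a minimal userspace
sketch (not kernel code). The only fact taken from the kernel is that
MAX_NUMNODES is 1 << CONFIG_NODES_SHIFT; the helper names max_numnodes()
and count_set_nodes() and the plain uint64_t bitmap are hypothetical
stand-ins for the kernel's node masks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compile-time limit: the kernel defines MAX_NUMNODES as
 * 1 << CONFIG_NODES_SHIFT. */
static inline unsigned long max_numnodes(unsigned int nodes_shift)
{
	assert(nodes_shift < 8 * sizeof(unsigned long));
	return 1UL << nodes_shift;
}

/* Hypothetical stand-in for a node mask: count the bits set in the
 * first nbits bits of a bitmap made of 64-bit words. */
static inline unsigned long count_set_nodes(const uint64_t *mask,
					    unsigned long nbits)
{
	unsigned long n = 0;

	for (unsigned long i = 0; i < nbits; i++)
		if (mask[i / 64] & (UINT64_C(1) << (i % 64)))
			n++;
	return n;
}
```

Code that sizes its data by max_numnodes(10) pays for 1024 slots, code
that walks a possible mask with 128 bits set pays for 128, and code that
only tracks online nodes pays for 4 in the reported setup.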

> Linux 5.15 with CONFIG_NODES_SHIFT=6 complains about node numbers
> greater than the maximum supported (1 << 6):
> Nov 27 01:59:37 photon-576f8974caf.org kernel: SRAT: PXM 62 -> APIC 0x7c -> Node 62
> Nov 27 01:59:37 photon-576f8974caf.org kernel: SRAT: PXM 63 -> APIC 0x7e -> Node 63
> Nov 27 01:59:37 photon-576f8974caf.org kernel: SRAT: Too many proximity domains 40
> Nov 27 01:59:37 photon-576f8974caf.org kernel: ACPI: SRAT: SRAT not used.
> Nov 27 01:59:37 photon-576f8974caf.org kernel: No NUMA configuration found
> 
> But it boots OK and Percpu memory amount is 53760 kB
> 
> If I compile with CONFIG_NODES_SHIFT=10 to support 128 nodes, boot warning disappears,
> cpu info reports proper numa nodes for existing cpus.
> But boot process fails with OOM in pid 1.
> 
> Increasing VM RAM from 192 MB to 1024MB fixed OOM.
> /proc/meminfo reported increase in Percpu to 718048 kB !!
> 
> It sounds surprising as the number of CPUs is the same in both cases.
> 
> Initial analysis showed that each memory cgroup allocates per node structures. Each of
> them have percpu allocations, doing 128 * 128 * struct size.
> See: mem_cgroup_alloc() -> alloc_mem_cgroup_per_node_info()
> 
> There is also old comment about it in alloc_mem_cgroup_per_node_info()
>       /*
>        * This routine is called against possible nodes.
>        * But it's BUG to call kmalloc() against offline node.
>        *
>        * TODO: this routine can waste much memory for nodes which will
>        *       never be onlined. It's better to use memory hotplug callback
>        *       function.
>        */
> There might be other places that do not use memory efficiently for non-existing nodes.

Yes, another example would be shrinkers: see http://lkml.kernel.org/r/aa8a8deb-0fdb-9408-48d4-adadb5602d72@xxxxxxxxxx
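
For a rough feel of the numbers in the report above, the per-cgroup cost
of the "possible nodes x possible CPUs x struct size" pattern can be
sketched as follows. This is only back-of-the-envelope arithmetic: the
function name pernode_percpu_bytes() is hypothetical, and the struct
size passed in is a placeholder, not the real size of any memcg
structure.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes consumed by one cgroup when every possible node gets a percpu
 * allocation that is instantiated for every possible CPU.
 * percpu_size is an assumed placeholder, not a measured kernel value. */
static inline unsigned long long
pernode_percpu_bytes(unsigned int nodes, unsigned int cpus,
		     size_t percpu_size)
{
	assert(nodes > 0 && cpus > 0);
	return (unsigned long long)nodes * cpus * percpu_size;
}
```

With a placeholder size of 64 bytes, 128 nodes x 128 CPUs gives 1 MiB
per memory cgroup, against 1 KiB for 4 x 4, i.e. a factor of 1024 that
comes purely from the possible counts rather than from the online ones.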
-- 
Michal Hocko
SUSE Labs



