Re: [PATCH 2/2 v6] mm/mempolicy: Don't create weight sysfs for memoryless nodes

On Thu, Mar 06, 2025 at 09:39:26PM +0900, Honggyu Kim wrote:
> 
> The memoryless nodes are printed as follows after those ACPI, SRAT,
> Node N PXM M messages.
> 
>   [    0.010927] Initmem setup node 0 [mem 0x0000000000001000-0x000000207effffff]
>   [    0.010930] Initmem setup node 1 [mem 0x0000060f80000000-0x0000064f7fffffff]
>   [    0.010992] Initmem setup node 2 as memoryless
>   [    0.011055] Initmem setup node 3 as memoryless
>   [    0.011115] Initmem setup node 4 as memoryless
>   [    0.011177] Initmem setup node 5 as memoryless
>   [    0.011238] Initmem setup node 6 as memoryless
>   [    0.011299] Initmem setup node 7 as memoryless
>   [    0.011361] Initmem setup node 8 as memoryless
>   [    0.011422] Initmem setup node 9 as memoryless
>   [    0.011484] Initmem setup node 10 as memoryless
>   [    0.011544] Initmem setup node 11 as memoryless
> 
> This is related to why sysfs knobs for 12 nodes are provided with the
> current N_POSSIBLE loop.
> 

This isn't actually the cause; it's another symptom.  These messages get
printed because someone is marking nodes 4-11 as possible, and
setup_nr_node_ids() then reports 12 total nodes:

void __init setup_nr_node_ids(void)
{
        unsigned int highest;

        highest = find_last_bit(node_possible_map.bits, MAX_NUMNODES);
        nr_node_ids = highest + 1;
}
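
To make the arithmetic concrete, here is a minimal user-space sketch of
the same calculation.  It is not kernel code: the sketch_* names and the
64-node limit are stand-ins I made up for illustration, and a single
uint64_t plays the role of node_possible_map.

```c
#include <stdint.h>

/* Hypothetical stand-in for MAX_NUMNODES (the real value is config-dependent). */
#define SKETCH_MAX_NODES 64

/*
 * Index of the highest set bit in map, or SKETCH_MAX_NODES if no bit is
 * set (the same "return size when empty" convention as find_last_bit()).
 */
static unsigned int sketch_find_last_bit(uint64_t map)
{
        for (unsigned int bit = SKETCH_MAX_NODES; bit-- > 0; )
                if (map & (UINT64_C(1) << bit))
                        return bit;
        return SKETCH_MAX_NODES;
}

/* Mirrors setup_nr_node_ids(): highest possible node ID plus one. */
static unsigned int sketch_nr_node_ids(uint64_t possible_map)
{
        return sketch_find_last_bit(possible_map) + 1;
}
```

With nodes 0-11 marked possible (mask 0xFFF) this yields 12, and it
still yields 12 if only node 11 is marked, since only the highest bit
matters.  The count follows from the possible map, not from how many
nodes actually have memory.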

Given your configuration data so far, we may have a bug somewhere (or
I'm missing a configuration piece).

> > Basically I need to know:
> > 1) Is each CXL device on a dedicated Host Bridge?
> > 2) Is inter-host-bridge interleaving configured?
> > 3) Is intra-host-bridge interleaving configured?
> > 4) Do SRAT entries exist for all nodes?
> 
> Are there some simple commands that I can get those info?
> 

The content of the CEDT would be sufficient - that will show us the
number of CXL host bridges.
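
If a raw dump of the table is easier to get than a decoded one, the
counting can be sketched as below.  Treat this as a rough sketch under my
assumptions about the layout: a 36-byte standard ACPI header followed by
records that each start with a 1-byte type, a reserved byte, and a 2-byte
little-endian length, with type 0 being a CXL Host Bridge Structure
(CHBS) and type 1 a CFMWS.  The function and struct names are made up.
On systems that expose it, the raw bytes can usually be read (as root)
from /sys/firmware/acpi/tables/CEDT.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HEADER_LEN 36  /* standard ACPI table header size */
#define CEDT_TYPE_CHBS  0   /* CXL Host Bridge Structure */
#define CEDT_TYPE_CFMWS 1   /* CXL Fixed Memory Window Structure */

struct cedt_counts {
        unsigned int chbs;
        unsigned int cfmws;
};

/*
 * Walk the records of a raw CEDT image and count CHBS and CFMWS entries.
 * Returns 0 on success, -1 if the signature is wrong or a record length
 * is implausible (shorter than its own header or past the buffer end).
 */
static int cedt_count(const uint8_t *buf, size_t len, struct cedt_counts *out)
{
        size_t off = ACPI_HEADER_LEN;

        out->chbs = 0;
        out->cfmws = 0;
        if (len < ACPI_HEADER_LEN || memcmp(buf, "CEDT", 4) != 0)
                return -1;

        while (off + 4 <= len) {
                uint8_t type = buf[off];
                size_t rec_len = (size_t)buf[off + 2] |
                                 ((size_t)buf[off + 3] << 8);

                if (rec_len < 4 || rec_len > len - off)
                        return -1;
                if (type == CEDT_TYPE_CHBS)
                        out->chbs++;
                else if (type == CEDT_TYPE_CFMWS)
                        out->cfmws++;
                off += rec_len;
        }
        return 0;
}
```

The chbs count answers the host bridge question, and the cfmws count is
the number of fixed memory windows I was asking about in (5).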

> > 5) Why are there 12 nodes but only 10 sources? Are there additional
> >     devices left out of your diagram? Are there 2 CFMWS and 8 Memory
> >     Affinity records - resulting in 10 nodes? This is strange.
> 
> My blind guess is that there could be a logic node that combines 4ch of
> CXL memory so there are 5 nodes per each socket.  Adding 2 nodes for
> local CPU/DRAM makes 12 nodes in total.
>

The issue is that nodes have associated memory regions.  If there are
multiple nodes with overlapping memory regions, that seems problematic.

If there are "possible nodes" without memory and no real use case
(because the memory is associated with the aggregate node) then those
nodes probably shouldn't be reported as possible.

The tl;dr here is we should figure out what is marking those nodes as
possible.
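
As a first step from user space, one can at least list which possible
nodes have no memory by comparing the node lists the kernel exposes in
/sys/devices/system/node/possible and /sys/devices/system/node/has_memory.
A minimal sketch of that comparison follows; the helper names are made
up, it assumes node IDs below 64, and it only narrows down which nodes
to chase, not who marked them.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a sysfs-style node list such as "0-1,4-11\n" into a bitmask.
 * Returns 0 on success, -1 on malformed input or node IDs >= 64.
 */
static int parse_nodelist(const char *s, uint64_t *mask)
{
        *mask = 0;
        while (*s && *s != '\n') {
                char *end;
                unsigned long lo = strtoul(s, &end, 10);
                unsigned long hi = lo;

                if (end == s)
                        return -1;
                if (*end == '-') {
                        s = end + 1;
                        hi = strtoul(s, &end, 10);
                        if (end == s)
                                return -1;
                }
                if (lo > hi || hi >= 64)
                        return -1;
                for (unsigned long n = lo; n <= hi; n++)
                        *mask |= UINT64_C(1) << n;
                s = end;
                if (*s == ',')
                        s++;
                else if (*s && *s != '\n')
                        return -1;
        }
        return 0;
}

/* Nodes that are marked possible but currently have no memory. */
static uint64_t memoryless_possible(uint64_t possible, uint64_t has_memory)
{
        return possible & ~has_memory;
}
```

For the system in this thread I would expect "0-11" and "0-1", which
leaves nodes 2-11 (mask 0xFFC) as possible but memoryless.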

> Not sure about this part but our approach with hotplug_memory_notifier()
> resolves this problem.  Rakie will submit an initial working patchset
> soonish.

This may just be a bandaid on the issue.  We should get our node
configuration correct from the get-go.

~Gregory



