Re: [PATCH v2] ACPI: NUMA: debug invalid unused PXM value for CFMWs

On Fri, Mar 14, 2025 at 03:55:44PM +0800, Yuquan Wang wrote:
> On Thu, Mar 13, 2025 at 09:28:46AM -0700, Alison Schofield wrote:
> > On Thu, Mar 13, 2025 at 02:09:07PM +0800, Yuquan Wang wrote:
> > > Without an SRAT, fake_pxm stays at -1 and is incremented to 0 before
> > > being passed to acpi_parse_cfmws(). If CXL memory ranges are defined
> > > in the CFMWS but not already in the SRAT, the new node (node0) for the
> > > CXL memory would be invalid, as node0 is already in use.
> > > 
> > > This utilizes disable_srat() & srat_disabled() to fail CXL init.
> > 
> > Seems like this fixup has drifted from adjusting the fake_pxm to 
> > shutting down CXL parsing. More below -
> > 
> > > 
> > > Signed-off-by: Yuquan Wang <wangyuquan1236@xxxxxxxxxxxxxx>
> > > ---
> > > 
> > > Changes in v2:
> > > - Add disable_srat() when fake_pxm is invalid
> > > - Add srat_disabled() check in cxl_acpi_probe() and acpi_parse_cfmws()
> > > 
> > > 
> > >  drivers/acpi/numa/srat.c | 10 ++++++++++
> > >  drivers/cxl/acpi.c       |  4 ++++
> > >  2 files changed, 14 insertions(+)
> > > 
> > > diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> > > index 00ac0d7bb8c9..2dac25c9258a 100644
> > > --- a/drivers/acpi/numa/srat.c
> > > +++ b/drivers/acpi/numa/srat.c
> > > @@ -441,6 +441,11 @@ static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> > >  	start = cfmws->base_hpa;
> > >  	end = cfmws->base_hpa + cfmws->window_size;
> > >  
> > > +	if (srat_disabled()) {
> > > +		pr_err("SRAT is missing or bad while processing CFMWS.\n");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > 
> > This goes too far by shutting down cfmws parsing for lack of SRAT.
> >
> 
> Actually, I think another patch is needed to fix the follow-on problem
> that the fake node causes when there is no SRAT.
> 
> Detailed description below.
> 
> > >  	/*
> > >  	 * The SRAT may have already described NUMA details for all,
> > >  	 * or a portion of, this CFMWS HPA range. Extend the memblks
> > > @@ -646,6 +651,11 @@ int __init acpi_numa_init(void)
> > >  		if (node_to_pxm_map[i] > fake_pxm)
> > >  			fake_pxm = node_to_pxm_map[i];
> > >  	}
> > > +
> > > +	/* Make sure CFMWs fake nodes start at node[1] */
> > > +	if (fake_pxm < 0)
> > > +		disable_srat();
> > > +
> > 
> > How does the code above make sure fake node starts at node[1]?
> > Would an explicit adjustment like this work?
> 
> Thanks for your correction :) Yes, the approach I used here is too implicit.
> 
> > 
> > -       last_real_pxm = fake_pxm;
> > -       fake_pxm++;
> > +       fake_pxm = max(fake_pxm, 1);
> > +       last_real_pxm = fake_pxm--;
> 
> I tried the adjustment below: 
> 
> 	fake_pxm = max(fake_pxm, 0); // 0 because it will increment to 1  
> 	last_real_pxm = fake_pxm++;
> 
> This works, but it may only control the parameter passed to acpi_parse_cfmws().
> According to acpi_map_pxm_to_node(), although the input fake_pxm is 1 when there
> is no SRAT, the returned node would still be 0, and the following nodes are also
> incorrect.
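
To make the mapping behavior described above concrete, here is a minimal
user-space sketch in C. It is a simplified model and not the kernel code:
every model_* name and constant is hypothetical, and it assumes the mapper
simply hands out the lowest node that is not yet marked as found.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PXM   8
#define MODEL_MAX_NODES 8
#define MODEL_NO_NODE   (-1)

/* Hypothetical stand-ins for the PXM-to-node table and the found-node map. */
static int  model_pxm_to_node[MODEL_MAX_PXM];
static bool model_nodes_found[MODEL_MAX_NODES];

/* Forget all mappings, as if no SRAT entry had been parsed. */
static void model_reset(void)
{
	for (int i = 0; i < MODEL_MAX_PXM; i++)
		model_pxm_to_node[i] = MODEL_NO_NODE;
	for (int i = 0; i < MODEL_MAX_NODES; i++)
		model_nodes_found[i] = false;
}

/* Map a PXM to a node, handing out the lowest node not yet marked found. */
static int model_map_pxm_to_node(int pxm)
{
	if (pxm < 0 || pxm >= MODEL_MAX_PXM)
		return MODEL_NO_NODE;

	/* Reuse an existing mapping if this PXM was seen before. */
	if (model_pxm_to_node[pxm] != MODEL_NO_NODE)
		return model_pxm_to_node[pxm];

	for (int node = 0; node < MODEL_MAX_NODES; node++) {
		if (!model_nodes_found[node]) {
			model_nodes_found[node] = true;
			model_pxm_to_node[pxm] = node;
			return node;
		}
	}
	return MODEL_NO_NODE;
}

/* First fake PXM, given the highest PXM seen so far (-1 when no SRAT). */
static int model_first_fake_pxm(int highest_pxm)
{
	assert(highest_pxm >= -1);
	/* Same effect as: fake_pxm = max(fake_pxm, 0); fake_pxm++; */
	int fake_pxm = highest_pxm > 0 ? highest_pxm : 0;
	return fake_pxm + 1;
}
```

In this model, after model_reset(), model_first_fake_pxm(-1) is 1 but
model_map_pxm_to_node(1) still yields node 0, because no node has been
marked as found. Marking node 0 as found first makes the same call yield
node 1.
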
> 
> Hence, I tried adding a new line:
> 
> 	fake_pxm = max(fake_pxm, 0);
> 	last_real_pxm = fake_pxm++;
> 	node_set(0, nodes_found_map);
> 
> Node[0] is found and set in every situation, so with this setting
> acpi_map_pxm_to_node() returns the expected node value even without
> an SRAT.
> 
> Unfortunately :( when we use "cxl create-region" to enable our CXL memory,
> it is still assigned to node[0], because numa_add_memblk() can only add
> a numa_memblk to the numa_meminfo list.
> 
> If the SRAT is OK, numa_memblks_init() then uses numa_move_tail_memblk()
> to move the numa_memblk from numa_meminfo to numa_reserved_meminfo in the
> CFMWS fake node case. If the SRAT is missing or bad, numa_memblks_init()
> fails because init_func() fails. As a result, there is no numa_memblk in
> the numa_reserved_meminfo list, and the dax and memory_hotplug drivers that
> run later cannot online the expected fake node.
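
The two-list flow described above can be sketched as a small user-space
model in C. This is only an illustration under stated assumptions, not the
mm/numa_memblks.c implementation: all model_* names are hypothetical, and
the rule "a block starting at or above ram_end has no present RAM" is a
simplification chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_BLKS 8

struct model_memblk {
	int nid;
	uint64_t start, end;
};

struct model_meminfo {
	int nr_blks;
	struct model_memblk blk[MODEL_MAX_BLKS];
};

/* Hypothetical stand-ins for numa_meminfo and numa_reserved_meminfo. */
static struct model_meminfo model_meminfo;
static struct model_meminfo model_reserved_meminfo;

/* Append [start, end) for node nid to the given list. */
static int model_add_memblk_to(int nid, uint64_t start, uint64_t end,
			       struct model_meminfo *mi)
{
	if (start >= end || mi->nr_blks >= MODEL_MAX_BLKS)
		return -1;
	mi->blk[mi->nr_blks++] = (struct model_memblk){ nid, start, end };
	return 0;
}

/* Move block idx from src to the tail of dst, then close the gap in src. */
static void model_move_tail_memblk(struct model_meminfo *dst, int idx,
				   struct model_meminfo *src)
{
	assert(idx >= 0 && idx < src->nr_blks);
	assert(dst->nr_blks < MODEL_MAX_BLKS);
	dst->blk[dst->nr_blks++] = src->blk[idx];
	for (int i = idx; i < src->nr_blks - 1; i++)
		src->blk[i] = src->blk[i + 1];
	src->nr_blks--;
}

/*
 * Model of the init step. If table parsing failed, bail out before any
 * block is moved. Otherwise move every block without present RAM
 * (assumption: start >= ram_end) to the reserved list.
 */
static int model_memblks_init(bool srat_ok, uint64_t ram_end)
{
	if (!srat_ok)
		return -1;
	for (int i = 0; i < model_meminfo.nr_blks; i++) {
		if (model_meminfo.blk[i].start >= ram_end)
			model_move_tail_memblk(&model_reserved_meminfo, i--,
					       &model_meminfo);
	}
	return 0;
}
```

With one RAM block and one CXL window block added to model_meminfo, a
failed init leaves model_reserved_meminfo empty, which mirrors the missing
SRAT case; a successful init moves only the CXL window block there.
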
> 
> Based on the above problem, I have a new patch idea: introduce a new
> function in mm/numa_memblks.c, numa_add_reserved_memblk(), which adds
> a numa_memblk to numa_reserved_meminfo directly. Maybe we could call
> it in acpi_parse_cfmws() if the SRAT is missing.
> 
> In mm/numa_memblks.c:
> 
> 	int __init numa_add_reserved_memblk(int nid, u64 start, u64 end)
> 	{		
> 		return numa_add_memblk_to(nid, start, end, &numa_reserved_meminfo);
> 	}
> 
> In drivers/acpi/numa/srat.c:
> 
> 	if (srat_disabled()) {
> 		if (numa_add_reserved_memblk(node, start, end) < 0) {
> 			pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> 				node, start, end);
> 		}
> 	}
> 
> :( But the dax_kmem driver then fails because something goes wrong in
> memory_group_register_static(). The good news is that our CXL memory is
> no longer assigned to node[0]!
> 
> BTW, since papering over these issues does not look easy, I chose to
> aggressively fail acpi_parse_cfmws() in srat.c, since it mainly serves to
> build CXL fake nodes, and also to fail the CXL init in cxl_acpi_probe(),
> per Jonathan.
> 
> Link: https://lists.nongnu.org/archive/html/qemu-devel/2025-03/msg03668.html
> 
> Hoping for more comments to guide me! I'm a real rookie in the kernel community :P
> 

No worries Yuquan. This is how we do it. Keep iterating!
I see you, Jonathan, and Gregory chatting about this a bit on the LSFMM thread,
and they are probably following along better than I. I'm getting a bit
lost in the diffs here in this thread.

At this point, I suggest sending the next revision of this patch with an
updated commit log describing the failing scenario, how the changes solve
'your' problem, and why it doesn't break anything.


> > >  	last_real_pxm = fake_pxm;
> > >  	fake_pxm++;
> > >  	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > > index cb14829bb9be..e75a8ead99f6 100644
> > > --- a/drivers/cxl/acpi.c
> > > +++ b/drivers/cxl/acpi.c
> > > @@ -829,6 +829,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> > >  	if (rc)
> > >  		return rc;
> > >  
> > > +	/* CXL must be in a NUMA system */
> > > +	if (srat_disabled())
> > > +		return -EINVAL;
> > > +
> > >  	cxl_res = devm_kzalloc(host, sizeof(*cxl_res), GFP_KERNEL);
> > >  	if (!cxl_res)
> > >  		return -ENOMEM;
> > > -- 
> > > 2.34.1
> > > 
> 



