On Thu, Oct 21, 2021 at 8:57 AM Vikram Sethi <vsethi@xxxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> > Sent: Wednesday, October 20, 2021 8:00 PM
> > To: Vikram Sethi <vsethi@xxxxxxxxxx>
> > Cc: Rafael J. Wysocki <rafael@xxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>;
> > Vishal Verma <vishal.l.verma@xxxxxxxxx>; Ira Weiny <ira.weiny@xxxxxxxxx>;
> > Ben Widawsky <ben.widawsky@xxxxxxxxx>; Dan Williams
> > <dan.j.williams@xxxxxxxxx>; linux-cxl@xxxxxxxxxxxxxxx;
> > linux-acpi@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v3] ACPI: NUMA: Add a node and memblk for each
> > CFMWS not in SRAT
> >
> snip
> > > >
> > > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > >
> > > Does this patch work for CXL type 2 memory which is not in SRAT? A
> > > type 2 driver can find its HDM BASE physical address from its CXL
> > > registers and figure out its NUMA node id by calling phys_to_target_node?
> >
> > Yes. This adds the nodes for the case where the BIOS doesn't fully describe
> > everything in CFMWS in the SRAT. And, yes, that is how the NUMA node can
> > be discovered.
> >
> > > Or is type 2 HDM currently being skipped altogether?
> >
> > Not sure what you mean by 'being skipped altogether'? The BIOS may
> > describe (all or none or some) of CXL Memory in the SRAT. In the case where
> > BIOS describes it all, NUMA nodes will already exist, and no new nodes will
> > be added here.
> >
> My question about skipping type2 wasn't directly related to your patch,
> but more of a question about current upstream support for probe/configuration
> of type 2 accelerator devices memory, irrespective of whether FW shows type 2
> memory in SRAT.

SRAT only has Type-2 ranges if the platform firmware maps the device's memory into the EFI memory map (which includes populating the ACPI SRAT / SLIT / HMAT).
I expect that situation to be negotiated on a case-by-case basis between Type-2 device vendors and platform firmware vendors. There is no requirement that any CXL memory, Type-2 or Type-3, be mapped by platform firmware. Per the CDAT specification, platform firmware is capable of mapping CXL memory into the EFI memory map at boot, but it is not required to do so. My expectation is that Linux will need to handle the full gamut of possibilities here, i.e. all / some / none of the CXL Type-3 devices present at boot mapped into the EFI memory map, and all / some / none of the CXL Type-2 devices mapped into the EFI memory map.

> The desired outcome is that the kernel CXL driver recognizes such type 2 HDM,
> and assigns it a NUMA node such that the type 2 driver

Note that there's no driver involved at this point. Alison's patch is just augmenting the ACPI-declared NUMA nodes at boot so that the core-mm is not surprised by undeclared NUMA nodes at add_memory_driver_managed() time.

> can later add/online this memory,
> via add_memory_driver_managed which requires a NUMA node ID (which driver can
> discover after your patch by calling phys_to_target_node).

Yes, with this patch there are at least enough nodes for add_memory_driver_managed() to have a reasonable answer for a NUMA node for Type-2 memory. However, as Jonathan and I were discussing, this minimum enabling may prove insufficient if, for example, you had one CFMWS entry for all Type-2 memory in the system, but multiple disparate accelerators that each want to do add_memory_driver_managed(). In that scenario all of those accelerators, which might each want a target node per target device, will share one target node. That said, unless and until it becomes clear that system architectures require Linux to define multiple nodes per CFMWS, I am happy to kick that can down the road.
Also, platform firmware can solve this problem by subdividing Type-2 memory with multiple QTG ids so that multiple target devices can each be assigned to a different CFMWS entry sandbox, i.e. the more degrees of freedom platform firmware declares in the CFMWS, the less pressure there is on the OS to provide a dynamic NUMA node definition capability.

> Would the current upstream code for HDM work as described above,

The current upstream code that enumerates Type-2 is the cxl_acpi driver, which enumerates platform CXL capabilities.

> and if so, does it
> rely on CDAT DSEMTS structure showing a specific value for EFI memory type? i.e. would it
> work if that field in DSEMTS was either EFI_CONVENTIONAL_MEMORY with EFI_MEMORY_SP,
> or EFI_RESERVED_MEMORY?

If platform firmware maps the HDM, the expectation is that it will use the CDAT to determine the EFI memory type. If platform firmware declines to map the device and lets Linux map it, then that is de facto "reserved" memory, and the driver (generic CXL Type-3 or vendor-specific CXL Type-2) gets to do insert_resource() with whatever Linux type it deems appropriate, i.e. EFI is out of the picture in this scenario.
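To make the node-sharing concern concrete, here is a toy userspace model in Python (illustrative only, not kernel code) of the boot-time behavior discussed in this thread. It assumes that a CFMWS window overlapping any existing range counts as already described, and that each undescribed window gets exactly one new node numbered after the highest existing one. Memblk, NumaModel and add_cfmws() are hypothetical names, the phys_to_target_node() method is only a stand-in for the kernel function of the same name, and all addresses are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUMA_NO_NODE = -1  # same "no node" sentinel value the kernel uses


@dataclass
class Memblk:
    start: int  # inclusive physical address
    end: int    # exclusive physical address
    node: int


@dataclass
class NumaModel:
    """Toy model of boot-time NUMA memblks; NOT the kernel implementation."""
    blks: List[Memblk] = field(default_factory=list)

    def _overlaps(self, start: int, end: int) -> bool:
        return any(start < b.end and b.start < end for b in self.blks)

    def add_cfmws(self, start: int, end: int) -> Optional[int]:
        """Assumed rule: a CFMWS window that overlaps no existing memblk
        gets one new node spanning the whole window. Returns the new node
        id, or None if the window is already (partly) covered."""
        if self._overlaps(start, end):
            return None
        node = max((b.node for b in self.blks), default=-1) + 1
        self.blks.append(Memblk(start, end, node))
        return node

    def phys_to_target_node(self, addr: int) -> int:
        """Stand-in for the kernel's phys_to_target_node()."""
        for b in self.blks:
            if b.start <= addr < b.end:
                return b.node
        return NUMA_NO_NODE


GiB = 1 << 30

# SRAT describes only DRAM: nodes 0 and 1 (made-up ranges).
srat = [Memblk(0, 2 * GiB, 0), Memblk(2 * GiB, 4 * GiB, 1)]

# Case 1: a single CFMWS window covers all Type-2 memory.
one_window = NumaModel(list(srat))
one_window.add_cfmws(64 * GiB, 128 * GiB)
accel_a = one_window.phys_to_target_node(64 * GiB)  # hypothetical HDM base, accelerator A
accel_b = one_window.phys_to_target_node(96 * GiB)  # hypothetical HDM base, accelerator B
print(accel_a, accel_b)  # prints: 2 2 (both share one target node)

# Case 2: firmware splits Type-2 into two CFMWS windows (e.g. per QTG id).
split = NumaModel(list(srat))
split.add_cfmws(64 * GiB, 96 * GiB)
split.add_cfmws(96 * GiB, 128 * GiB)
print(split.phys_to_target_node(64 * GiB), split.phys_to_target_node(96 * GiB))  # prints: 2 3
```

With one window, both hypothetical accelerators resolve to the same node, which is the case that might eventually call for multiple nodes per CFMWS; with the window split in two by firmware, they resolve to different nodes without the OS needing any dynamic node creation.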