Jonathan Cameron wrote:
> On Fri, 24 May 2024 12:05:28 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> > # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> >
> > # Status: v3
> >
> > # Document: ACPI Specification 6.6
> >
> > # License
> > SPDX-License-Identifier: CC-BY-4.0
> >
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> >     * Andy Rudoff, retired
> >     * Mahesh Natu, Intel
> >     * Ishwar Agarwal, Intel
> >
> > # Changelog
> > * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> >   clarify the SPA vs HPA behavior of this cache addressing mode.
> >   (Jonathan Cameron)
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> >   in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> >   that Address Mode values other than 1 are Reserved.
> >
> > # Summary of the Change
> > Recall that one of the modes available with persistent memory (PMEM) was a
> > direct-mapped memory-side cache where DDR memory transparently cached
> > PMEM. This article has more details:
> >
> > https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
> >
> > ...but the main takeaway of that article that is relevant for this ECN
> > is:
> >
> > "[PMEM] is paired with a DRAM that behaves as a cache, and,
> > like a cache, it is invisible to the user. [..] A typical system
> > might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> > total memory size will appear to the software as only 512GB."
> >
> > Instead, this new "extended-linear" direct-mapped memory-side cache
> > addressing mode would make the memory size that appears to software in
> > the above example 576GB. The inclusion of the DDR capacity to extend
> > the capacity visible to software may improve cache utilization.
>
> I'd skip the cache utilization point as even with 'may' it might just
> end up rat holing!
> Capacity seems enough of a justification to me and
> requires a lot less justification.

Sure.

> Perhaps something like
> "The inclusion of the DDR increases the available capacity whilst still
> providing benefits of a lower latency cache."

Per above, I will just keep it dryly associated with capacity and not make
any performance claims.

> Up to you though as I'll not have to explain that utilization point
> to anyone whereas you might.
>
> >
> > A primary motivation for updating HMAT to explicitly enumerate this
> > addressing mode is the OSPM's increased role in RAS and
> > address translation with CXL topologies. With CXL and OS-native RAS
> > flows, OSPM is responsible for understanding and navigating the
> > relationship between System-Physical-Address (SPA) ranges published in
> > ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
> > in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> > endpoints.
> >
> > Enable an OSPM to enumerate that the capacity for a memory-side cache
> > extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> > in the HMAT is "excluded" from the SRAT range length because it is a
> > transparent cache of the SRAT capacity. The enumeration of this
> > addressing mode enables OSPM memory RAS (Reliability, Availability, and
> > Serviceability) flows.
> >
> > # Benefits of the Change
> > Without this change, an OSPM that encounters a memory-side cache
> > configuration of DDR fronting CXL may not understand that an SRAT range
> > extended by cache capacity should be maintained as one contiguous SPA
> > range, even though the CXL HPA decode configuration only maps a subset of
> > the SRAT SPA range. In other words, the memory-side cache dynamically
> > maps access to that SPA range to either a CXL or DDR HPA.
> >
> > When the OSPM knows about this relationship, it can take actions like
> > quarantine / offline all the impacted aliased pages to prevent further
> > consumption of poison, or run repair operations on all the affected
> > targets. Without this change, an OSPM may not accurately identify the HPA
> > associated with a given CXL endpoint DPA event, or it may misunderstand
> > the SPAs that map to CXL HPAs.
>
> I'd like something here on impacts on firmware-first error reporting.
> Given we'd like that to work on a non-CXL-aware system not aware of this
> feature at all, I'd propose multiple CPER records, one for each alias.
> That assumes the firmware has no path to establish the alias.
>
> Can certainly conceive of ways to implement a probe-type setup to allow
> the discovery of which alias has been poisoned etc.
>
> Perhaps needs a note somewhere in 18.3. Something along the lines of
> "For any error with SPA originating in a range, where a memory-side cache
> with address mode extended-linear is present, multiple error records
> should be presented to cover any potentially affected aliases."
>
> Maybe an OS could opt out of that multiple reporting via _OSC or similar
> but I'm not sure why it would bother though. Easier to just allow for
> multiple events.

Makes sense to add a note about the "multiple CPER record" expectation.
Effectively this ECN is about allowing native error handling to do the
same.

> > # Impact of the Change
> > The proposed "Address Mode" field consumes the 2 Reserved bytes
> > following the "Cache Attributes" field in the "Memory Side Cache
> > Information Structure". The default reserved value of 0 indicates the
> > status quo of an undeclared addressing mode, where the expectation is
> > that it is safe to assume the cache capacity is transparent to the SRAT
> > range capacity. An OSPM that knows about the new values can consider SPA
> > to HPA relationships according to the address-layout definition proposed
> > below. A legacy OSPM will ignore it as a Reserved field.
> >
> > # References
> > * Compute Express Link Specification v3.1,
> >   <https://www.computeexpresslink.org/>
> >
> > # Detailed Description of the Change
>
> Probably need to up rev HMAT as well.

I'd let the ACPI working group make that determination. I am not clear on
whether repurposing a reserved field mandates a version bump.

> >
> > * Section Table 5.149: Memory Side Cache Information Structure: redefine
> >   the 2 Reserved bytes starting at offset 28 as "Address Mode":
> >
> >     * 0 - Reserved (OSPM may assume transparent cache addressing)
>
> Can we make that assumption? What is today's firmware doing for this?

The only shipping example I know of was for PMEM.

> I'd drop the 'may assume'. Also, after this change it's not reserved.
> 0 explicitly means transparent cache addressing.

I am just going to switch the parenthetical to "(Unknown Address Mode)"
because "transparent" does not give any actionable information about
alias layout in the SRAT address space. So system software can make no
assumptions about layout without consulting implementation-specific
documentation.

> >     * 1 - Extended-linear (N direct-map aliases linearly mapped)
> >     * 2..65535 - Reserved (Unknown Address Mode)
> >
> > * Extend the implementation note after Table 5.149 to explain how to
> >   interpret the "Extended-linear" mode.
> >
> >     * When Address Mode is 1 'Extended-Linear' it indicates that the
> >       associated address range (SRAT.MemoryAffinityStructure.Length) is
> >       comprised of the backing store capacity extended by the cache
> >       capacity. It is arranged such that there are N directly addressable
> >       aliases of a given cacheline where N is the ratio of target memory
> >       proximity domain size and the memory side cache size. The N
> >       aliased addresses for a given cacheline all share the same result
> >       for the operation 'address modulo cache size'.
>
> Probably need more here.
> What if someone has two such ranges of size
>
> Address 0, (512G + 64G), (1024G + 128G)
> And decides to pack them for some reason.
> The second one will be aligned to 64G, not 128G, so the modulo needs to
> take into account the base address.

Decides to pack them how? My expectation in this situation is 2 proximity
domains / memory-side cache descriptions.

> Do we need an explicit statement that N is an integer? Probably works
> anyway but having 2.5 aliases is an unusual concept.

Easy enough to add "(integer)" after the first reference of "N".

> > This setting is only
> > allowed when 'Cache Associativity' is 'Direct Map'."
>
> Other than these corner cases looks good to me and the new terminology and
> clarifications help a lot.

Thanks for the feedback.
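For illustration only, here is a minimal C sketch of the alias enumeration that the Extended-linear note implies, for example to offline every alias of a poisoned address. extlin_aliases() and its parameters are hypothetical; this is not an existing kernel or ACPICA interface. It assumes one SRAT range per memory-side cache description. It also takes the modulo relative to the range base, which matches the literal 'address modulo cache size' wording only when the base is a multiple of the cache size. Finally, it rejects a non-integer N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an existing kernel or ACPICA API).
 *
 * range_base/range_len: SRAT Memory Affinity range, which for Address
 *                       Mode 1 (Extended-linear) includes the cache capacity
 * cache_size:           HMAT Memory Side Cache Size
 * spa:                  address to resolve, e.g. from a poison event
 *
 * Writes the N aliases of spa to aliases[] and returns N, or returns 0
 * if spa is outside the range, N is not an integer, or aliases[] is too
 * small to hold all N entries.
 */
static size_t extlin_aliases(uint64_t range_base, uint64_t range_len,
			     uint64_t cache_size, uint64_t spa,
			     uint64_t *aliases, size_t max_aliases)
{
	uint64_t n, offset, i;

	assert(cache_size != 0);

	/* The address must fall inside the SRAT range. */
	if (spa < range_base || spa - range_base >= range_len)
		return 0;

	/* N = range length / cache size; reject fractional alias counts. */
	if (range_len % cache_size != 0)
		return 0;
	n = range_len / cache_size;
	if (n > max_aliases)
		return 0;

	/*
	 * Assumption: the modulo is taken relative to range_base, so a
	 * range whose base is not aligned to cache_size is still handled.
	 */
	offset = (spa - range_base) % cache_size;
	for (i = 0; i < n; i++)
		aliases[i] = range_base + i * cache_size + offset;

	return (size_t)n;
}
```

Consider the 512G + 64G example from the summary: a 576G SRAT range at base 0 with a 64G cache. Here N is 9, and the aliases of SPA 0x1000 are 0x1000, 64G + 0x1000, ..., 512G + 0x1000. Each returned SPA could then be fed to the OSPM's page-offline path.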