On Wed, 5 Jun 2024 10:10:12 +0100
Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

> On Fri, 24 May 2024 12:05:28 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> > # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> >
> > # Status: v3
> >
> > # Document: ACPI Specification 6.6
> >
> > # License
> > SPDX-License Identifier: CC-BY-4.0
> >
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> >   * Andy Rudoff, retired
> >   * Mahesh Natu, Intel
> >   * Ishwar Agarwal, Intel
> >
> > # Changelog
> > * v3: Replace "Inclusive Linear" with the "Extended-linear" term, and
> >   clarify the SPA vs HPA behavior of this cache addressing mode.
> >   (Jonathan Cameron)
> > * v2: Clarify the "Inclusive" term as "including the capacity of the
> >   cache in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> >   that Address Mode values other than 1 are Reserved.
> >
> > # Summary of the Change
> > Recall that one of the modes available with persistent memory (PMEM)
> > was a direct-mapped memory-side cache where DDR memory transparently
> > cached PMEM. This article has more details:
> >
> > https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
> >
> > ...but the main takeaway of that article that is relevant for this
> > ECN is:
> >
> > "[PMEM] is paired with a DRAM that behaves as a cache, and,
> > like a cache, it is invisible to the user. [..] A typical system
> > might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> > total memory size will appear to the software as only 512GB."
> >
> > Instead, this new "extended-linear" direct-mapped memory-side cache
> > addressing mode would make the memory size that appears to software
> > in the above example 576GB. The inclusion of the DDR capacity to
> > extend the capacity visible to software may improve cache
> > utilization.
>
> I'd skip the cache utilization point as even with 'may' it might just
> end up rat-holing! Capacity seems enough of a justification to me and
> requires a lot less explanation.
>
> Perhaps something like:
> "The inclusion of the DDR increases the available capacity whilst
> still providing the benefits of a lower latency cache."
>
> Up to you though, as I'll not have to explain that utilization point
> to anyone, whereas you might.
>
> >
> > A primary motivation for updating the HMAT to explicitly enumerate
> > this addressing mode is the OSPM's increased role in RAS and address
> > translation with CXL topologies. With CXL and OS-native RAS flows
> > the OSPM is responsible for understanding and navigating the
> > relationship between System-Physical-Address (SPA) ranges published
> > in ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges
> > published in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory
> > expander endpoints.
> >
> > Enable an OSPM to enumerate that the capacity for a memory-side
> > cache extends an SRAT range. Typically the "Memory Side Cache Size"
> > enumerated in the HMAT is "excluded" from the SRAT range length
> > because it is a transparent cache of the SRAT capacity. The
> > enumeration of this addressing mode enables OSPM memory RAS
> > (Reliability, Availability, and Serviceability) flows.
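As a quick illustration of the capacity math in that summary, here is a
minimal sketch in plain C; the 64GB/512GB sizes are just the example
values from the quoted article, and none of the names come from the ECN:

#include <stdint.h>
#include <stdio.h>

#define GiB (1ULL << 30)

int main(void)
{
	uint64_t cache_size   = 64 * GiB;  /* DDR memory-side cache (HMAT cache size) */
	uint64_t backing_size = 512 * GiB; /* PMEM/CXL backing capacity */

	/* Transparent mode: the cache is invisible, software sees only the backing */
	uint64_t visible_transparent = backing_size;

	/* Extended-linear mode: the SRAT range length is extended by the cache */
	uint64_t visible_extended_linear = backing_size + cache_size;

	printf("transparent:     %llu GB visible\n",
	       (unsigned long long)(visible_transparent / GiB));     /* 512 */
	printf("extended-linear: %llu GB visible\n",
	       (unsigned long long)(visible_extended_linear / GiB)); /* 576 */
	return 0;
}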
> >
> > # Benefits of the Change
> > Without this change an OSPM that encounters a memory-side cache
> > configuration of DDR fronting CXL may not understand that an SRAT
> > range extended by cache capacity should be maintained as one
> > contiguous SPA range even though the CXL HPA decode configuration
> > only maps a subset of the SRAT SPA range. In other words, the
> > memory-side cache dynamically maps access to that SPA range to
> > either a CXL or DDR HPA.
> >
> > When the OSPM knows about this relationship it can take actions like
> > quarantine / offline all the impacted aliased pages to prevent
> > further consumption of poison, or run repair operations on all the
> > affected targets. Without this change an OSPM may not accurately
> > identify the HPA associated with a given CXL endpoint DPA event, or
> > it may misunderstand the SPAs that map to CXL HPAs.
>
> I'd like something here on the impact on firmware-first error
> reporting. Given we'd like that to work on a non-CXL-aware system that
> is not aware of this feature at all, I'd propose multiple CPER
> records, one for each alias. That assumes the firmware has no path to
> establish the alias.
>
> Can certainly conceive of ways to implement a probe-type setup to
> allow the discovery of which alias has been poisoned etc.
>
> Perhaps needs a note somewhere in 18.3. Something along the lines of:
> "For any error with an SPA originating in a range where a memory-side
> cache with address mode extended-linear is present, multiple error
> records should be presented to cover any potentially affected
> aliases."
>
> Maybe an OS could opt out of that multiple reporting via _OSC or
> similar, but I'm not sure why it would bother. Easier to just allow
> for multiple events.
>
> >
> > # Impact of the Change
> > The proposed "Address Mode" field consumes the 2 Reserved bytes
> > following the "Cache Attributes" field in the "Memory Side Cache
> > Information Structure". The default reserved value of 0 indicates
> > the status quo of an undeclared addressing mode where the
> > expectation is that it is safe to assume the cache capacity is
> > transparent to the SRAT range capacity. An OSPM that knows about the
> > new values can consider SPA to HPA relationships according to the
> > address-layout definition proposed below. A legacy OSPM will ignore
> > it as a Reserved field.
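To put the field layout from "Impact of the Change" in one place, the
structure might be represented roughly as below; the struct and enum
names are mine, not from the ACPI spec, and the offsets follow Table
5.149 as summarized in this proposal:

#include <stdint.h>

/* Proposed Address Mode values (2 bytes at offset 28, formerly Reserved) */
enum hmat_cache_address_mode {
	HMAT_CACHE_MODE_TRANSPARENT     = 0, /* cache capacity not in the SRAT length */
	HMAT_CACHE_MODE_EXTENDED_LINEAR = 1, /* SRAT length = backing + cache */
	/* 2..65535 reserved */
};

/* HMAT Memory Side Cache Information Structure, per Table 5.149 */
struct hmat_memory_side_cache_info {
	uint16_t type;                    /* offset  0: structure type 2 */
	uint16_t reserved0;               /* offset  2 */
	uint32_t length;                  /* offset  4: length of this structure */
	uint32_t memory_proximity_domain; /* offset  8 */
	uint32_t reserved1;               /* offset 12 */
	uint64_t memory_side_cache_size;  /* offset 16: cache size in bytes */
	uint32_t cache_attributes;        /* offset 24 */
	uint16_t address_mode;            /* offset 28: proposed, was Reserved */
	uint16_t num_smbios_handles;      /* offset 30 */
	/* SMBIOS handles follow */
} __attribute__((packed));

A legacy OSPM that still treats offset 28 as Reserved simply skips the
field, which is why the value 0 has to keep meaning transparent cache
addressing.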
> >
> > # References
> > * Compute Express Link Specification v3.1,
> >   <https://www.computeexpresslink.org/>
> >
> > # Detailed Description of the Change
>
> Probably need to up-rev the HMAT as well.
>
> >
> > * Table 5.149 (Memory Side Cache Information Structure): redefine
> > the 2 Reserved bytes starting at offset 28 as "Address Mode":
> >
> > * 0 - Reserved (OSPM may assume transparent cache addressing)
>
> Can we make that assumption? What is today's firmware doing for this?
> I'd drop the 'may assume'. Also, after this change it's not reserved;
> 0 explicitly means transparent cache addressing.
>
> > * 1 - Extended-linear (N direct-map aliases linearly mapped)
> > * 2..65535 - Reserved (Unknown Address Mode)
> >
> > * Extend the implementation note after Table 5.149 to explain how to
> > interpret the "Extended-linear" mode.
> >
> > * When Address Mode is 1 'Extended-linear' it indicates that the
> > associated address range (SRAT.MemoryAffinityStructure.Length) is
> > composed of the backing store capacity extended by the cache
> > capacity. It is arranged such that there are N directly addressable
> > aliases of a given cacheline, where N is the ratio of the target
> > memory proximity domain size and the memory-side cache size, and
> > where the N aliased addresses for a given cacheline all share the
> > same result for the operation 'address modulo cache size'.
>
> Probably need more here. What if someone has two such ranges of size
>
>   Address 0, (512G + 64G), (1024G + 128G)
>
> and decides to pack them for some reason? The second one will be
> aligned to 64G, not 128G, so the modulo needs to take into account the
> base address.

Ignore this one. The maths works fine as is. More coffee needed.

> Do we need an explicit statement that N is an integer? It probably
> works anyway, but having 2.5 aliases is an unusual concept.
>
> > This setting is only allowed when 'Cache Associativity' is
> > 'Direct Map'.
>
> Other than these corner cases this looks good to me, and the new
> terminology and clarifications help a lot.
>
> Thanks,
>
> Jonathan
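To make the alias arithmetic in that implementation note concrete (and
the quarantine / multiple-CPER-record flow discussed under "Benefits of
the Change"), a rough sketch in plain C; the helper name and the
assumption that the SRAT range base is aligned to the cache size are
mine, not part of the proposal:

#include <stdint.h>
#include <stdio.h>

#define GiB (1ULL << 30)

/*
 * Print every alias of @addr within one extended-linear SRAT range.
 * Assumes the range base is aligned to the cache size, so all aliases
 * share the same 'address modulo cache size' value.
 */
static void print_aliases(uint64_t addr, uint64_t spa_base,
			  uint64_t spa_length, uint64_t cache_size)
{
	uint64_t n = spa_length / cache_size;	/* N, assumed to be an integer */
	uint64_t offset = addr % cache_size;	/* shared by all N aliases */

	for (uint64_t i = 0; i < n; i++)
		printf("alias %llu: %#llx\n", (unsigned long long)i,
		       (unsigned long long)(spa_base + i * cache_size + offset));
}

int main(void)
{
	/* 64GB DDR cache fronting 512GB of backing memory: the SRAT range
	 * is 576GB long, so every cacheline has N = 576 / 64 = 9 aliases. */
	print_aliases(100 * GiB + 0x40, 0 * GiB, 576 * GiB, 64 * GiB);
	return 0;
}

With the 64GB cache / 512GB backing example from the summary, a single
poison event therefore has nine candidate SPAs for the OSPM (or for
multiple CPER records) to cover.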