Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches

Jonathan Cameron wrote:
> On Fri, 24 May 2024 12:05:28 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> 
> > # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> > 
> > # Status: v3
> > 
> > # Document: ACPI Specification 6.6
> > 
> > # License
> > SPDX-License Identifier: CC-BY-4.0
> > 
> > # Submitter:
> > * Sponsor: Dan Williams, Intel
> > * Creators/Contributors:
> >     * Andy Rudoff, retired
> >     * Mahesh Natu, Intel
> >     * Ishwar Agarwal, Intel
> > 
> > # Changelog
> > * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
> >   clarify the SPA vs HPA behavior of this cache addressing mode.
> >   (Jonathan Cameron)
> > * v2: Clarify the "Inclusive" term as "including the capacity of the cache
> >   in the SRAT range length"
> > * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
> >   that Address Mode values other than 1 are Reserved.
> > 
> > # Summary of the Change
> > Recall that one of the modes available with persistent memory (PMEM) was a
> > direct-mapped memory-side cache where DDR-memory transparently cached
> > PMEM. This article has more details:
> > 
> > https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
> > 
> > ...but the main takeaway of that article that is relevant for this ECN
> > is:
> > 
> >     "[PMEM] is paired with a DRAM that behaves as a cache, and,
> >      like a cache, it is invisible to the user. [..] A typical system
> >      might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
> >      total memory size will appear to the software as only 512GB."
> > 
> > Instead, this new "extended-linear" direct-mapped memory-side cache
> > addressing mode would make the memory size that appears to software in
> > the above example 576GB. The inclusion of the DDR capacity to extend
> > the capacity visible to software may improve cache utilization.
> 
> I'd skip the cache utilization point as even with 'may' it might just
> end up rat-holing! Capacity seems enough of a justification to me and
> requires a lot less explanation.

Sure.

> Perhaps something like
> "The inclusion of the DDR increases the available capacity whilst still
>  providing benefits of a lower latency cache."

Per the above, I will just keep it dryly associated with capacity and
not make any performance claims.

> Up to you though as I'll not have to explain that utilization point
> to anyone whereas you might.
> 
> > 
> > A primary motivation for updating HMAT to explicitly enumerate this
> > addressing mode is due to the OSPM's increased role for RAS and
> > address-translation with CXL topologies. With CXL and OS native RAS
> > flows OSPM is responsible for understanding and navigating the
> > relationship between System-Physical-Address (SPA) ranges published in
> > ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
> > in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> > endpoints.
> > 
> > Enable an OSPM to enumerate that the capacity for a memory-side cache
> > extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> > in the HMAT is "excluded" from the SRAT range length because it is a
> > transparent cache of the SRAT capacity. The enumeration of this
> > addressing mode enables OSPM-memory-RAS (Reliability, Availability, and
> > Serviceability) flows.
> > 
> > # Benefits of the Change
> > Without this change an OSPM that encounters a memory-side cache
> > configuration of DDR fronting CXL may not understand that an SRAT range
> > extended by cache capacity should be maintained as one contiguous SPA
> > range even though the CXL HPA decode configuration only maps a subset of
> > the SRAT SPA range. In other words, the memory-side cache dynamically
> > maps accesses to that SPA range to either a CXL or DDR HPA.
> > 
> > When the OSPM knows about this relationship it can take actions like
> > quarantine / offline all the impacted aliased pages to prevent further
> > consumption of poison, or run repair operations on all the affected
> > targets. Without this change an OSPM may not accurately identify the HPA
> > associated with a given CXL endpoint DPA event, or it may misunderstand
> > the SPAs that map to CXL HPAs.
> 
> I'd like something here on impacts on firmware first error reporting.
> Given we'd like that to work on a non CXL aware system not aware of this
> feature at all, I'd propose multiple CPER records, one for each alias.
> That assumes the firmware has no path to establish the alias.
> 
> Can certainly conceive of ways to implement a probe-type setup to allow
> the discovery of which alias has been poisoned etc.
> 
> Perhaps needs a note somewhere in 18.3.  Something along lines of
> "For any error with SPA originating in a range, where a memory-side cache
>  with address mode extended-linear is present, multiple error records
>  should be presented to cover any potentially affected aliases."
> 
> Maybe an OS could opt out of that multiple reporting via _OSC or
> similar, but I'm not sure why it would bother. Easier to just allow
> for multiple events.

Makes sense to add a note about the "multiple CPER record" expectation.
Effectively this ECN is about allowing native-error-handling to do the
same.
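To make that expansion concrete, here is a minimal sketch (the helper
name and sizes are illustrative, not from the ECN text) of how an OSPM,
or firmware emitting multiple CPER records, could enumerate every
aliased SPA implied by a single poison event in an extended-linear
range:

```python
GIB = 1 << 30

def poison_aliases(spa, range_base, range_len, cache_size):
    """Return all SPAs in [range_base, range_base + range_len) that
    alias the same direct-mapped cacheline as 'spa', i.e. that share
    the same 'address modulo cache size' within the range."""
    offset = (spa - range_base) % cache_size
    return [range_base + offset + i * cache_size
            for i in range(range_len // cache_size)]

# 512GB CXL backing store fronted by a 64GB DDR cache: a 576GB SPA
# range with 9 aliases per cacheline, so one poison event implies 9
# affected SPAs (e.g. 9 CPER records in the firmware-first case).
aliases = poison_aliases(spa=0x1000, range_base=0,
                         range_len=576 * GIB, cache_size=64 * GIB)
```

The same enumeration serves both quarantine/offline of aliased pages
and repair-target selection described above.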


> > # Impact of the Change
> > The proposed "Address Mode" field consumes the 2 Reserved bytes
> > following the "Cache Attributes" field in the "Memory Side Cache
> > Information Structure". The default reserved value of 0 indicates the
> > status quo of an undeclared addressing mode, where the expectation is
> > that the cache capacity is transparent to (excluded from) the SRAT
> > range capacity. An OSPM that knows about new values can consider SPA to
> > HPA relationships according to the address-layout definition proposed
> > below. A legacy OSPM will ignore it as a Reserved field.
> > 
> > # References
> > * Compute Express Link Specification v3.1,
> > <https://www.computeexpresslink.org/>
> > 
> > # Detailed Description of the Change
> 
> Probably need to up rev HMAT as well.

I'd let the ACPI working group make that determination. I am not clear
on whether repurposing a reserved field mandates a version bump.

> > 
> > * Section Table 5.149: Memory Side Cache Information Structure redefine
> >   the 2 Reserved bytes starting at offset 28 as "Address Mode":
> > 
> >     * 0 - Reserved (OSPM may assume transparent cache addressing)
> 
> Can we make that assumption?  What are today's firmware's doing for this?

The only shipping example I know of was for PMEM.

> I'd drop the 'may assume'  Also after this change it's not reserved.
> 0 explicitly means transparent cache addressing.

I am just going to switch the parenthetical to "(Unknown Address Mode)"
because "transparent" does not give any actionable information about
alias layout in the SRAT address space. So system-software can make no
assumptions about layout without consulting implementation specific
documentation.
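As a sanity check on that value space, a decoder sketch (the field
offset follows the ECN text; the function and constant names are
hypothetical) would treat every value other than 1 as carrying no
layout information:

```python
import struct

ADDRESS_MODE_EXTENDED_LINEAR = 1  # proposed value; 0 and 2..65535 reserved

def address_mode(mscis):
    """Decode the proposed 2-byte Address Mode field at offset 28 of a
    Memory Side Cache Information Structure (little-endian, per ACPI)."""
    (mode,) = struct.unpack_from("<H", mscis, 28)
    if mode == ADDRESS_MODE_EXTENDED_LINEAR:
        return "extended-linear"
    return "unknown"  # no actionable alias-layout information
```

A legacy OSPM simply never calls such a decoder and keeps treating the
bytes as Reserved.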

> >     * 1 - Extended-linear (N direct-map aliases linearly mapped)
> >     * 2..65535 - Reserved (Unknown Address Mode)
> > 
> > * Extend the implementation note after Table 5.149 to explain how to
> >   interpret the "Extended-linear" mode.
> > 
> >   * When Address Mode is 1 'Extended-Linear' it indicates that the
> >     associated address range (SRAT.MemoryAffinityStructure.Length)
> >     comprises the backing store capacity extended by the cache
> >     capacity. It is arranged such that there are N directly
> >     addressable aliases of a given cacheline, where N is the ratio of
> >     the target memory proximity domain size to the memory side cache
> >     size, and the N aliased addresses for a given cacheline all share
> >     the same result for the operation 'address modulo cache size'.
> 
> Probably need more here.  What if someone has two such ranges of size
> 
> Address 0, (512G + 64G), (1024G + 128G)
> And decides to pack them for some reason.
> The second one will be aligned to 64G, not 128G, so the modulo needs
> to take the base address into account.

Decides to pack them how? My expectation in this situation is 2
proximity domains / memory-side cache descriptions.

> Do we need explicit statement that N is an integer? Probably works anyway
> but having 2.5 aliases is an unusual concept.

Easy enough to add "(integer)" after the first reference of "N".
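For what it's worth, the integer constraint is easy to make explicit as
a validity check (hypothetical helper, illustrative sizes):

```python
def alias_count(range_len, cache_size):
    """N for an extended-linear range: the SRAT range length must be an
    exact (integer) multiple of the memory-side cache size; 2.5 aliases
    is not a meaningful configuration."""
    n, rem = divmod(range_len, cache_size)
    if rem:
        raise ValueError("extended-linear range is not a whole number "
                         "of cache-size strides")
    return n

GIB = 1 << 30
n = alias_count(576 * GIB, 64 * GIB)  # 576GB SRAT range, 64GB cache
```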

> > This setting is only
> >     allowed when 'Cache Associativity' is 'Direct Map'.
> 
> Other than these corner cases looks good to me and the new terminology and
> clarifications help a lot.

Thanks for the feedback.



