Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches

On Fri, 24 May 2024 12:05:28 -0700
Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

> # Title: "Extended-linear" addressing for direct-mapped memory-side caches
> 
> # Status: v3
> 
> # Document: ACPI Specification 6.6
> 
> # License
> SPDX-License Identifier: CC-BY-4.0
> 
> # Submitter:
> * Sponsor: Dan Williams, Intel
> * Creators/Contributors:
>     * Andy Rudoff, retired
>     * Mahesh Natu, Intel
>     * Ishwar Agarwal, Intel
> 
> # Changelog
> * v3: Replace "Inclusive Linear" with "Extended-linear" term, and
>   clarify the SPA vs HPA behavior of this cache addressing mode.
>   (Jonathan Cameron)
> * v2: Clarify the "Inclusive" term as "including the capacity of the cache
>   in the SRAT range length"
> * v2: Clarify that 0 is an undeclared / transparent Address Mode, and
>   that Address Mode values other than 1 are Reserved.
> 
> # Summary of the Change
> Recall that one of the modes available with persistent memory (PMEM) was a
> direct-mapped memory-side cache where DDR-memory transparently cached
> PMEM. This article has more details:
> 
> https://thessdguy.com/intels-optane-two-confusing-modes-part-2-memory-mode/
> 
> ...but the main takeaway of that article that is relevant for this ECN
> is:
> 
>     "[PMEM] is paired with a DRAM that behaves as a cache, and,
>      like a cache, it is invisible to the user. [..] A typical system
>      might combine a 64GB DRAM DIMM with a 512GB Optane DIMM, but the
>      total memory size will appear to the software as only 512GB."
> 
> Instead, this new "extended-linear" direct-mapped memory-side cache
> addressing mode would make the memory size that appears to software in
> the above example 576GB. The inclusion of the DDR capacity to extend
> the capacity visible to software may improve cache utilization.

I'd skip the cache utilization point as, even with 'may', it might just
end up rat-holing! Capacity seems sufficient justification to me and
requires a lot less explaining.

Perhaps something like
"The inclusion of the DDR increases the available capacity whilst still
 providing the benefits of a lower latency cache."

Up to you though, as I won't have to explain that utilization point
to anyone, whereas you might.

> 
> A primary motivation for updating HMAT to explicitly enumerate this
> addressing mode is the OSPM's increased role in RAS and
> address-translation with CXL topologies. With CXL and OS-native RAS
> flows, OSPM is responsible for understanding and navigating the
> relationship between System-Physical-Address (SPA) ranges published in
> ACPI.SRAT.MemoryAffinity, Host-Physical-Address (HPA) ranges published
> in the ACPI.CEDT.CFMWS, and HPAs programmed in CXL memory expander
> endpoints.
> 
> Enable an OSPM to enumerate that the capacity for a memory-side cache
> extends an SRAT range. Typically the "Memory Side Cache Size" enumerated
> in the HMAT is "excluded" from the SRAT range length because it is a
> transparent cache of the SRAT capacity. The enumeration of this
> addressing mode enables OSPM-memory-RAS (Reliability, Availability, and
> Serviceability) flows.
> 
> # Benefits of the Change
> Without this change an OSPM that encounters a memory-side cache
> configuration of DDR fronting CXL may not understand that an SRAT range
> extended by cache capacity should be maintained as one contiguous SPA
> range even though the CXL HPA decode configuration only maps a subset of
> the SRAT SPA range. In other words, the memory-side cache dynamically
> maps accesses to that SPA range to either a CXL or DDR HPA.
> 
> When the OSPM knows about this relationship it can take actions like
> quarantine / offline all the impacted aliased pages to prevent further
> consumption of poison, or run repair operations on all the affected
> targets. Without this change an OSPM may not accurately identify the HPA
> associated with a given CXL endpoint DPA event, or it may misunderstand
> the SPAs that map to CXL HPAs.

I'd like something here on the impact on firmware-first error reporting.
Given we'd like that to work on a non-CXL-aware system that knows nothing
of this feature at all, I'd propose multiple CPER records, one for each
alias. That assumes the firmware has no path to establish which alias was
actually accessed.

Can certainly conceive of ways to implement a probe-type setup to allow
the discovery of which alias has been poisoned etc.

Perhaps needs a note somewhere in 18.3.  Something along lines of
"For any error with SPA originating in a range, where a memory-side cache
 with address mode extended-linear is present, multiple error records
 should be presented to cover any potentially affected aliases."

Maybe an OS could opt out of that multiple reporting via _OSC or similar,
but I'm not sure why it would bother. Easier to just allow for
multiple events.
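
To make that concrete, a rough sketch of the expansion I have in mind.
All names here are hypothetical, and it assumes the range base, range
length, and cache size have already been recovered from the matching
SRAT and HMAT entries; printf() stands in for the real CPER/GHES record
generation:

#include <inttypes.h>
#include <stdio.h>

/*
 * Sketch only: enumerate every alias of a poisoned SPA within an
 * extended-linear range and emit one record per alias. Assumes
 * base <= spa < base + len and len is a multiple of cache_size.
 */
static void report_poison_aliases(uint64_t spa, uint64_t base,
                                  uint64_t len, uint64_t cache_size)
{
        /* position of the cacheline within one cache-sized stripe */
        uint64_t offset = (spa - base) % cache_size;

        /* every address in [base, base + len) congruent to offset */
        for (uint64_t alias = base + offset; alias < base + len;
             alias += cache_size)
                printf("CPER record for SPA alias 0x%" PRIx64 "\n", alias);
}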

> 
> # Impact of the Change
> The proposed "Address Mode" field consumes the 2 Reserved bytes
> following the "Cache Attributes" field in the "Memory Side Cache
> Information Structure". The default reserved value of 0 indicates the
> status quo of an undeclared addressing mode where the expectation is
> that it is safe to assume the cache-capacity is transparent to the SRAT
> range capacity. An OSPM that knows about new values can consider SPA to
> HPA relationships according to the address-layout definition proposed
> below. A legacy OSPM will ignore it as a Reserved field.
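
For reference, my reading of the resulting structure layout expressed as
a C struct. Field names are illustrative (not ACPICA's), offsets per
Table 5.149; the natural alignment of these types happens to reproduce
the on-table layout with no padding:

#include <stdint.h>

/* Memory Side Cache Information Structure (HMAT type 2) with the
 * proposed Address Mode field replacing the Reserved bytes at
 * offset 28. Little-endian, per ACPI. */
struct hmat_memory_side_cache_info {
        uint16_t type;                     /* offset 0: 2 */
        uint16_t reserved;                 /* offset 2 */
        uint32_t length;                   /* offset 4 */
        uint32_t memory_proximity_domain;  /* offset 8 */
        uint32_t reserved1;                /* offset 12 */
        uint64_t cache_size;               /* offset 16 */
        uint32_t cache_attributes;         /* offset 24 */
        uint16_t address_mode;             /* offset 28: 0 = transparent,
                                            * 1 = extended-linear */
        uint16_t num_smbios_handles;       /* offset 30 */
        /* followed by num_smbios_handles uint16_t SMBIOS handles */
};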
> 
> # References
> * Compute Express Link Specification v3.1,
> <https://www.computeexpresslink.org/>
> 
> # Detailed Description of the Change

Probably need to up-rev the HMAT table revision as well.

> 
> * Section Table 5.149: Memory Side Cache Information Structure redefine
>   the 2 Reserved bytes starting at offset 28 as "Address Mode":
> 
>     * 0 - Reserved (OSPM may assume transparent cache addressing)

Can we make that assumption? What are today's firmwares doing for this?
I'd drop the 'may assume'. Also, after this change it's not reserved:
0 explicitly means transparent cache addressing.

>     * 1 - Extended-linear (N direct-map aliases linearly mapped)
>     * 2..65535 - Reserved (Unknown Address Mode)
> 
> * Extend the implementation note after Table 5.149 to explain how to
>   interpret the "Extended-linear" mode.
> 
>   * When Address Mode is 1 'Extended-Linear' it indicates that the
>     associated address range (SRAT.MemoryAffinityStructure.Length) is
>     composed of the backing store capacity extended by the cache
>     capacity. It is arranged such that there are N directly addressable
>     aliases of a given cacheline, where N is the ratio of the target
>     memory proximity domain size and the memory side cache size, and
>     the N aliased addresses for a given cacheline all share the same
>     result for the operation 'address modulo cache size'.

Probably need more here. What if someone has two such ranges, say one
at address 0 of size (512G + 64G) and a second of size (1024G + 128G),
and decides to pack them back to back for some reason?
The second one will be aligned to 64G, not 128G, so the modulo needs to
take the base address into account.

Do we need an explicit statement that N is an integer? It probably works
anyway, but having 2.5 aliases is an unusual concept.
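
Something along these lines is what I think the wording needs to imply
(again just a sketch, names made up): the modulo is computed relative to
the range base, and N = range length / cache size must be an integer.

#include <stdbool.h>
#include <stdint.h>

/* An extended-linear range only makes sense if it holds a whole
 * number of cache-sized aliases. */
static bool extended_linear_valid(uint64_t len, uint64_t cache_size)
{
        return cache_size && !(len % cache_size);
}

/* The n-th alias of spa within the range starting at base; note the
 * (spa - base) term, without which the packed second range in the
 * example above computes the wrong aliases. */
static uint64_t alias_address(uint64_t spa, uint64_t base,
                              uint64_t cache_size, uint64_t n)
{
        return base + n * cache_size + ((spa - base) % cache_size);
}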

> This setting is only
>     allowed when 'Cache Associativity' is 'Direct Map'."

Other than these corner cases this looks good to me, and the new
terminology and clarifications help a lot.

Thanks,

Jonathan





