Re: [ACPI Code First ECN] "Extended-linear" addressing for direct-mapped memory-side caches

On Thu, 20 Jun 2024 11:51:09 -0700
Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

> Jonathan Cameron wrote:
> [..]
> > > > I'd drop the 'may assume'. Also, after this change it's not reserved:
> > > > 0 explicitly means transparent cache addressing.    
> > > 
> > > I am just going to switch the parenthetical to "(Unknown Address Mode)"
> > > because "transparent" does not give any actionable information about
> > > alias layout in the SRAT address space. So system-software can make no
> > > assumptions about layout without consulting implementation specific
> > > documentation.  
> > 
> > I'd like an option to indicate that we know reported errors will not
> > involve problems with aliases. Something like...
> > 
> > 0 - Unknown (all bets are off, read the manual).
> > 1 - No aliases.
> > 2 - your one.
> > 
> > A simple write-through or write-back cache would not result in aliases
> > for errors reported by the backing memory.  
> 
> This seems a separate proposal, and needs more discussion because there
> *are* aliases. While there is no HPA aliasing, there is a FRU
> (field-replaceable-unit) aliasing. So if system-software wants to
> determine what indicators to fire (i.e. replace cache-mem, replace
> backing-mem, or both) to the tech servicing the node it needs some ACPI
> help.

There is a case for FW-first CPER etc. (or a side-cache-specific driver,
ideally binding to a suitable ACPIXXXX, but that's a different ECN :))
having to identify errors coming from a memory-side cache, but I don't
see that as an issue that belongs in this part of the spec (or even in this spec).

For the CXL case, the event record tells you enough about where the poison
originated to rule the CXL device in or out as the problem. There is,
I think, a gap in error reporting for memory-side caches, and agreed, that's a
different ECN.

> 
> I would be ok to do:
> 
>  0 - Unknown (all bets are off, read the manual).
>  1 - Reserved
>  2 - Extended linear
> 
> ...just to try to keep the list ordered by complexity for now.
> 
> However, I am also worried about the case where folks want to do "noisy
> neighbor mitigation", which is something that has been attempted with
> PMEM caches. This involves knowing the layout of cache conflicts which
> need not be linear and involves reading the manual. So, I am not sure
> defining a "no aliases" indicator now improves the Extended Linear
> proposal, or is an improvement upon "read the manual".


It tells you that if you are trying to do poison repair, you only need to write
one 'cacheline etc.' from the host, not several. I wouldn't attempt
to take it any further than that due to the sort of trickery you mention.
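
To make the repair argument concrete, here is a minimal Python sketch. It
uses the encodings from the list earlier in the thread (0 Unknown, 1 No
aliases, 2 Extended linear); Dan's counter-proposal would leave 1 reserved.
The `repair_targets()` helper and its parameters are hypothetical, and the
extended-linear branch assumes, purely for illustration, that the range
splits into cache-sized slices whose matching offsets alias in the
direct-mapped cache. The real layout is whatever the ECN text defines.

```python
from enum import IntEnum
from typing import List, Optional


class AddressMode(IntEnum):
    """Encodings as proposed in this thread (illustrative values only)."""
    UNKNOWN = 0          # all bets are off, read the manual
    NO_ALIASES = 1       # proposed indicator; Dan's version keeps this reserved
    EXTENDED_LINEAR = 2  # the mode this ECN defines


def repair_targets(mode: AddressMode, hpa: int, range_base: int,
                   range_size: int, cache_size: int) -> Optional[List[int]]:
    """Return the host addresses to write when repairing poison at `hpa`.

    Returns None when the layout is unknown (consult platform docs).
    """
    if mode == AddressMode.NO_ALIASES:
        # No aliases: a single write from the host is enough.
        return [hpa]
    if mode == AddressMode.EXTENDED_LINEAR:
        # Hypothetical layout: range_size is a whole number of
        # cache_size slices, and the same offset in every slice maps to
        # the same direct-mapped cache line.
        if range_size % cache_size != 0:
            raise ValueError("range_size must be a multiple of cache_size")
        offset = (hpa - range_base) % cache_size
        return [range_base + s + offset
                for s in range(0, range_size, cache_size)]
    # UNKNOWN: system software can make no layout assumptions.
    return None
```

With "no aliases" the answer is always a single address; with "unknown" the
helper refuses to guess, which is the distinction being argued for here.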


> 
> > Assuming we don't get an address corruption (in which case everything
> > is dead anyway, as it is an uncontainable error), then poison can come from:
> > 1) poison happens in the memory itself (fine, the DPA in CXL is enough)
> > 2) poison happens in cache and is written back to memory (fine,
> >    the DPA in CXL is enough).
> > 3) poison happens in cache and is read by host. Synchronous handling and
> >    the HPA is available and enough.
> > 
> > Not much we can do with 0, but 1 at least lets us know we have the
> > single right answer.  
> 
> That is, assuming that this is caching CXL. With CXL, the DPA
> information is available to disambiguate the source of the poison, but
> for memory-side-caches that are not backed by CXL, what does
> system-software do with that "1" case?

If it got an HPA, it does an arch-specific poison clear on that HPA
or isolates the page containing that single address. If it didn't, you have
no useful info; wait for synchronous poison.
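
A minimal Python sketch of that decision for the "no aliases" case follows.
It only returns a plan rather than touching hardware: `handle_no_alias_poison()`,
the `Action` type, and the `can_clear` flag (standing in for whether an
arch-specific poison clear is available) are hypothetical, and the 4 KiB page
size is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass(frozen=True)
class Action:
    kind: str                      # "clear", "isolate", or "wait"
    address: Optional[int] = None  # HPA or page base the action targets


def handle_no_alias_poison(hpa: Optional[int], can_clear: bool) -> Action:
    """Pick a response to a poison report when the cache has no aliases.

    With no aliases, the reported HPA is the single right answer, so only
    that address (or the page containing it) needs attention.
    """
    if hpa is None:
        # No address in the report: nothing useful to act on yet, so
        # rely on synchronous poison consumption to surface the HPA.
        return Action("wait")
    if can_clear:
        # Stand-in for an arch-specific poison clear of that one HPA.
        return Action("clear", hpa)
    # Otherwise isolate the page containing that single address.
    return Action("isolate", hpa & ~(PAGE_SIZE - 1))
```

For example, a report at HPA 0x12345 on a platform that cannot clear poison
yields `Action("isolate", 0x12000)`.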

Jonathan
