RE: [RFC] Kernel Support of Memory Error Detection.

[AMD Official Use Only - General]

Please include Yazen from AMD on this discussion.

Making the patrol scrubber accessible to the OS would very likely not work without other changes. It is possible (even likely) that other entities in the system are manipulating the patrol scrubber, and there's no way to resolve any conflicts or race conditions.

So, if this were exposed through ACPI, it would need to be advertised as a capability, and that capability would be supported only if processors added OS-dedicated patrol scrubber hardware, or if a specific product could guarantee that no other entity is using the patrol scrubber.

     -Vilas

-----Original Message-----
From: Jiaqi Yan <jiaqiyan@xxxxxxxxxx> 
Sent: Thursday, November 17, 2022 8:20 PM
To: Sridharan, Vilas <Vilas.Sridharan@xxxxxxx>; Malvestuto, Mike <mike.malvestuto@xxxxxxxxx>
Cc: HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx>; Nadav Amit <nadav.amit@xxxxxxxxx>; David Hildenbrand <david@xxxxxxxxxx>; Aktas, Erdem <erdemaktas@xxxxxxxxxx>; pgonda@xxxxxxxxxx; rientjes@xxxxxxxxxx; Hsiao, Duen-wen <duenwen@xxxxxxxxxx>; gthelen@xxxxxxxxxx; linux-mm@xxxxxxxxx; jthoughton@xxxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; Luck, Tony <tony.luck@xxxxxxxxx>
Subject: Re: [RFC] Kernel Support of Memory Error Detection.

On Tue, Nov 8, 2022 at 9:04 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@xxxxxxx> wrote:
>
> On Tue, Nov 08, 2022 at 04:17:06PM +0000, Luck, Tony wrote:
> > > If it is feasible in future that hardware vendors can make patrol 
> > > scrubber programmable, we can even direct the scanning to patrol 
> > > scrubber.
> >
> > There was an attempt to create an ACPI interface for this. I don't 
> > know if it made it into the standard.
>
> I briefly checked the latest ACPI spec, and it seems that some 
> interfaces to control (h/w based) patrol scrubbing are defined.
>
> https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#acpi-ras-feature-table-rasf

A follow-up question for the Intel and AMD RAS folks (Mike and Vilas): what is your position on using the ACPI interface to control the hardware patrol scrubber, and on making the scrubber programmable by the kernel? Is this something you are willing to consider?

>
> > I didn't do anything with it for Linux because the interface was 
> > quite complex.
> >
> > From a h/w perspective it might always be complex. Consecutive
> > system physical addresses are generally interleaved across multiple
> > memory controllers, channels, DIMMs and ranks, while patrol
> > scrubbing may be done by each memory controller at the channel level.
> >
> > So a simple request to scan a few megabytes of system physical 
> > address would require address translation to figure out the channel 
> > addresses on each of the memory controllers and programming each to 
> > scan the pieces they contribute to the target range.
>
> I would expect the physical address visible to the kernel to be
> transparently translated to the real location (which DIMM, in which channel).
>
> - Naoya Horiguchi
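
To make Tony's point about interleaving concrete, here is a minimal sketch of how one request to scan a system physical address range fans out into one request per memory controller. It assumes a toy model that is not taken from any real platform: a single level of round-robin interleaving with a fixed granule. The names Interleave, to_local and scrub_requests are hypothetical. Real hardware typically adds hashing and further levels (sockets, channels, DIMMs, ranks), none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interleave:
    """Toy interleave description (hypothetical, not a real platform layout)."""
    controllers: int  # number of memory controllers in the interleave set
    granule: int      # bytes mapped to one controller before moving to the next


def to_local(spa, il):
    """Map a system physical address to (controller, controller-local address)."""
    block = spa // il.granule
    ctrl = block % il.controllers
    local = (block // il.controllers) * il.granule + spa % il.granule
    return ctrl, local


def scrub_requests(start, length, il):
    """Split [start, start + length) into one local range per controller.

    Returns {controller: (local_start, local_end)} with local_end exclusive.
    """
    ranges = {}
    addr = start
    end = start + length
    while addr < end:
        # Stop at the end of the current granule or of the request.
        chunk_end = min(end, (addr // il.granule + 1) * il.granule)
        ctrl, local = to_local(addr, il)
        lo, hi = ranges.get(ctrl, (local, local))
        ranges[ctrl] = (min(lo, local), max(hi, local + (chunk_end - addr)))
        addr = chunk_end
    return ranges


if __name__ == "__main__":
    il = Interleave(controllers=4, granule=256)
    for ctrl, (lo, hi) in sorted(scrub_requests(0, 2048, il).items()):
        print(f"controller {ctrl}: local [{lo:#x}, {hi:#x})")
```

Even in this simplified model, a 2 KiB request touches all four controllers, and each one has to be programmed with its own local range. That is the address translation and per-controller programming step described above.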




