Jane Chu wrote:
> On 3/7/2024 5:42 PM, Jane Chu wrote:
> >
> > On 3/7/2024 4:49 PM, Dan Williams wrote:
> >
> >> Jane Chu wrote:
> >>> Add Joao.
> >>>
> >>> On 3/7/2024 1:05 PM, Dan Williams wrote:
> >>>
> >>>> Jane Chu wrote:
> >>>>> Hi, Dan and Vishal,
> >>>>>
> >>>>> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
> >>>>> backed by Barlow Pass nvdimms configured in MemoryMode by ipmctl?
> >>>> As always, the NUMA description is a property of the platform, not the
> >>>> media type / DIMM. The ACPI HMAT describes the details of
> >>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> >>>> 6.4.
> >>> Thanks! So, compared to dax_kmem, which assigns a NUMA node to a newly
> >>> converted pmem/SysRAM region,
> >> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> >> potentially onlining a proximity domain that was fully described by ACPI
> >> SRAT but offline.
> >>
> >>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> >>> could expose) to userland about the extra latency, such that userland
> >>> may treat these memory regions differently?
> >> Userland should be able to interrogate the memory_side_cache/ property
> >> in NUMA sysfs:
> >>
> >> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
> >>
> >> Otherwise I believe SRAT and SLIT for that node only reflect the
> >> performance of the DDR fronting the PMEM. So if you have a DDR node and
> >> a DDR+PMEM cache node, they may look the same from the ACPI SLIT
> >> perspective, but the ACPI HMAT contains the details of the backing
> >> memory. The Linux NUMA performance sysfs interface gets populated by
> >> ACPI HMAT.
> >
> > Thanks Dan.
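[Editor's sketch: the memory_side_cache/ interrogation described above could be scripted roughly as follows. The helper name list_memory_side_caches is illustrative, not an existing API; the sysfs layout assumed here follows the numaperf document linked above.]

```python
from pathlib import Path

def list_memory_side_caches(node_root="/sys/devices/system/node"):
    """Scan NUMA nodes for memory_side_cache/indexN entries.

    Returns {node: {indexN: {attr: value}}} for each node that exposes
    a memory-side cache; nodes without one (e.g. DDR-only nodes, where
    memory_side_cache/ does not exist) are simply absent.
    """
    result = {}
    root = Path(node_root)
    if not root.is_dir():
        return result
    for node in sorted(root.glob("node[0-9]*")):
        cache_dir = node / "memory_side_cache"
        if not cache_dir.is_dir():
            continue
        levels = {}
        for index in sorted(cache_dir.glob("index[0-9]*")):
            attrs = {}
            for name in ("size", "line_size", "indexing", "write_policy"):
                attr = index / name
                if attr.is_file():
                    attrs[name] = attr.read_text().strip()
            levels[index.name] = attrs
        result[node.name] = levels
    return result

if __name__ == "__main__":
    caches = list_memory_side_caches()
    if not caches:
        print("no memory_side_cache entries found (DDR-only system?)")
    for node, levels in caches.items():
        for index, attrs in levels.items():
            print(f"{node}/{index}: {attrs}")
```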
> >
> > Please correct me if I'm mistaken: if I configure some Barlow Pass
> > nvdimms to MemoryMode and reboot, since those regions of memory are
> > automatically two-level with DDR as the front cache, hmat_init() is
> > expected to create the memory_side_cache/indexN interface, and if I
> > see multiple indexN layers, that would be a sign that pmem in
> > MemoryMode is present, right?
> >
> > I've yet to grab hold of a system to confirm this, but apparently with
> > only DDR memory, memory_side_cache/ doesn't exist.
>
> On each CPU socket node, we have
>
> | |-memory_side_cache
> | | |-uevent
> | | |-power
> | | |-index1
> | | | |-uevent
> | | | |-power
> | | | |-line_size
> | | | |-write_policy
> | | | |-size
> | | | |-indexing
>
> where 'indexing' = 0 means direct-mapped cache? So is that a clue that
> slower/far memory is behind the cache?

Correct. Note that the ACPI HMAT may also populate data about the performance of the memory range on a cache miss (see ACPI 6.4, Table 5.129: System Locality Latency and Bandwidth Information Structure), but the Linux enabling does not export that information.
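[Editor's sketch: the raw indexing/write_policy values can be decoded roughly as below. The mappings are my reading of the kernel's node-cache enums (0 = direct-mapped, 0 = write-back, per the numaperf documentation); describe_cache is an illustrative helper, not an existing API, and the value tables should be verified against the kernel source.]

```python
# Assumed encodings, mirroring the kernel's cache_indexing and
# cache_write_policy enums; values outside the tables map to "unknown".
INDEXING = {0: "direct-mapped", 1: "indexed", 2: "other"}
WRITE_POLICY = {0: "write-back", 1: "write-through", 2: "other"}

def describe_cache(indexing: int, write_policy: int) -> str:
    """Render raw memory_side_cache sysfs values as human-readable text."""
    return (f"{INDEXING.get(indexing, 'unknown')}, "
            f"{WRITE_POLICY.get(write_policy, 'unknown')}")
```

So an index1 directory reporting indexing = 0 would read back as a direct-mapped, write-back memory-side cache, consistent with far memory sitting behind it.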