Re: Barlow Pass NVDIMM in MemoryMode question

Jane Chu wrote:
> On 3/7/2024 5:42 PM, Jane Chu wrote:
> 
> > On 3/7/2024 4:49 PM, Dan Williams wrote:
> >
> >> Jane Chu wrote:
> >>> Add Joao.
> >>>
> >>> On 3/7/2024 1:05 PM, Dan Williams wrote:
> >>>
> >>>> Jane Chu wrote:
> >>>>> Hi, Dan and Vishal,
> >>>>>
> >>>>> What kind of NUMAness is visible to the kernel w.r.t. a SysRAM region
> >>>>> backed by Barlow Pass NVDIMMs configured in MemoryMode by ipmctl?
> >>>> As always, the NUMA description is a property of the platform, not the
> >>>> media type / DIMM. The ACPI HMAT describes the details of
> >>>> memory-side caches. See "5.2.27.2 Memory Side Cache Overview" in ACPI
> >>>> 6.4.
> >>> Thanks!  So, compared to dax_kmem, which assigns a NUMA node to a newly
> >>> converted pmem/SysRAM region,
> >> ...to be clear, dax_kmem is not creating a new NUMA node, it is just
> >> potentially onlining a proximity domain that was fully described by ACPI
> >> SRAT but offline.
> >>
> >>> w.r.t. pmem in MemoryMode, is there any clue that the kernel exposes (or
> >>> could expose) to userland about the extra latency, such that userland
> >>> may treat these memory regions differently?
> >> Userland should be able to interrogate the memory_side_cache/ property
> >> in NUMA sysfs:
> >>
> >> https://docs.kernel.org/admin-guide/mm/numaperf.html?#numa-cache
> >>
> >> Otherwise I believe SRAT and SLIT for that node only reflect the
> >> performance of the DDR fronting the PMEM. So if you have a DDR node and
> >> a DDR+PMEM cache node, they may look the same from the ACPI SLIT
> >> perspective, but the ACPI HMAT contains the details of the backing
> >> memory. The Linux NUMA performance sysfs interface gets populated by
> >> ACPI HMAT.
> >
> > Thanks Dan.
> >
> > Please correct me if I'm mistaken:  if I configure some Barlow Pass
> > NVDIMMs to MemoryMode and reboot, those regions of memory are
> > automatically two-level with DDR as the front cache, so hmat_init() is
> > expected to create the memory_side_cache/indexN interface, and if I
> > see one or more indexN layers, that would be a sign that pmem in
> > MemoryMode is present, right?
> >
> > I've yet to grab hold of a system to confirm this, but apparently with 
> > only DDR memory, memory_side_cache/ doesn't exist.
> 
> On each CPU socket node, we have
> 
> | |-memory_side_cache
> | | |-uevent
> | | |-power
> | | |-index1
> | | | |-uevent
> | | | |-power
> | | | |-line_size
> | | | |-write_policy
> | | | |-size
> | | | |-indexing
> 
> where 'indexing' = 0 means a direct-mapped cache?  So is that a clue
> that slower/far-memory is behind the cache?

Correct.
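
For reference, a minimal userspace sketch (untested; the node number is
just an example, iterate all online nodes in practice) that probes the
sysfs attribute discussed above, with paths per
Documentation/admin-guide/mm/numaperf.rst:

#include <stdio.h>
#include <string.h>

static int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

int main(void)
{
	char path[256], val[64];
	int node = 0;	/* example node number */

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/memory_side_cache/index1/indexing",
		 node);

	/* absence of memory_side_cache/ suggests no far memory behind a cache */
	if (read_attr(path, val, sizeof(val))) {
		printf("node%d: no memory_side_cache\n", node);
		return 0;
	}

	/* indexing == 0 means direct-mapped, i.e. a cache fronting far memory */
	printf("node%d: memory-side cache present, indexing=%s\n", node, val);
	return 0;
}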

Note that the ACPI HMAT may also populate data about the performance of
the memory range on a cache miss (see ACPI 6.4 Table 5.129: System
Locality Latency and Bandwidth Information Structure), but the Linux
enabling does not export that information.
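
The hit-path numbers that do get exported from the HMAT live under the
accessN/initiators/ attributes; a quick sketch to read them back
(untested; node0 and access class 0 are illustrative, attribute names
per Documentation/admin-guide/mm/numaperf.rst):

#include <stdio.h>

int main(void)
{
	static const char *attrs[] = {
		"read_latency", "write_latency",
		"read_bandwidth", "write_bandwidth",
	};
	char path[256], val[64];
	int i;

	for (i = 0; i < 4; i++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node0/access0/initiators/%s",
			 attrs[i]);
		f = fopen(path, "r");
		if (!f)
			continue;
		/* latency in ns, bandwidth in MB/s, per numaperf.rst */
		if (fgets(val, sizeof(val), f))
			printf("%s: %s", attrs[i], val);
		fclose(f);
	}
	return 0;
}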



