Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous memory system

Dan Williams <dan.j.williams@xxxxxxxxx> · Thu, 25 Apr 2019 08:43:08 -0700

On Thu, Apr 25, 2019 at 1:05 AM Du, Fan <fan.du@xxxxxxxxx> wrote:
>
>
>
> >-----Original Message-----
> >From: owner-linux-mm@xxxxxxxxx [mailto:owner-linux-mm@xxxxxxxxx] On
> >Behalf Of Michal Hocko
> >Sent: Thursday, April 25, 2019 3:54 PM
> >To: Du, Fan <fan.du@xxxxxxxxx>
> >Cc: akpm@xxxxxxxxxxxxxxxxxxxx; Wu, Fengguang <fengguang.wu@xxxxxxxxx>;
> >Williams, Dan J <dan.j.williams@xxxxxxxxx>; Hansen, Dave
> ><dave.hansen@xxxxxxxxx>; xishi.qiuxishi@xxxxxxxxxxxxxxx; Huang, Ying
> ><ying.huang@xxxxxxxxx>; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
> >memory system
> >
> >On Thu 25-04-19 07:41:40, Du, Fan wrote:
> >>
> >>
> >> >-----Original Message-----
> >> >From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
> >> >Sent: Thursday, April 25, 2019 2:37 PM
> >> >To: Du, Fan <fan.du@xxxxxxxxx>
> >> >Cc: akpm@xxxxxxxxxxxxxxxxxxxx; Wu, Fengguang
> ><fengguang.wu@xxxxxxxxx>;
> >> >Williams, Dan J <dan.j.williams@xxxxxxxxx>; Hansen, Dave
> >> ><dave.hansen@xxxxxxxxx>; xishi.qiuxishi@xxxxxxxxxxxxxxx; Huang, Ying
> >> ><ying.huang@xxxxxxxxx>; linux-mm@xxxxxxxxx;
> >linux-kernel@xxxxxxxxxxxxxxx
> >> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
> >> >memory system
> >> >
> >> >On Thu 25-04-19 09:21:30, Fan Du wrote:
> >> >[...]
> >> >> However PMEM has different characteristics from DRAM,
> >> >> the more reasonable or desirable fallback style would be:
> >> >> DRAM node 0 -> DRAM node 1 -> PMEM node 2 -> PMEM node 3.
> >> >> When DRAM is exhausted, try PMEM then.
> >> >
> >> >Why and who does care? NUMA is fundamentally about memory nodes
> >with
> >> >different access characteristics so why is PMEM any special?
> >>
> >> Michal, thanks for your comments!
> >>
> >> The "different" lies in the local or remote access, usually the underlying
> >> memory is the same type, i.e. DRAM.
> >>
> >> By "special", PMEM is usually in gigantic capacity than DRAM per dimm,
> >> while with different read/write access latency than DRAM.
> >
> >You are describing a NUMA in general here. Yes access to different NUMA
> >nodes has a different read/write latency. But that doesn't make PMEM
> >really special from a regular DRAM.
>
> Not the numa distance b/w cpu and PMEM node make PMEM different than
> DRAM. The difference lies in the physical layer. The access latency characteristics
> comes from media level.

No, there is no such thing as a "PMEM node". I've pushed back on this
broken concept in the past [1] [2]. Consider that PMEM could be as
fast as DRAM for technologies like NVDIMM-N or in emulation
environments. These attempts to look at persistence as an attribute of
performance are entirely missing the point that the system can have
multiple varied memory types and the platform firmware needs to
enumerate these performance properties in the HMAT on ACPI platforms.
Any scheme that only considers a binary DRAM and not-DRAM property is
immediately invalidated the moment the OS needs to consider a 3rd or
4th memory type, or a more varied connection topology.

[1]: https://lore.kernel.org/lkml/CAPcyv4heiUbZvP7Ewoy-Hy=-mPrdjCjEuSw+0rwdOUHdjwetxg@xxxxxxxxxxxxxx/

[2]: https://lore.kernel.org/lkml/CAPcyv4it1w7SdDVBV24cRCVHtLb3s1pVB5+SDM02Uw4RbahKiA@xxxxxxxxxxxxxx/