Forgot the link.

[1] https://github.com/westerndigitalcorporation/hmmap

Take care,
Adam

On Thu, 2019-02-21 at 15:11 -0800, Adam Manzanares wrote:
> Hello,
>
> I would like to attend the LSF/MM Summit 2019. I'm interested in
> several of the MM topics mentioned below, as well as Zoned Block
> Devices and any IO determinism topics that come up in the storage
> track.
>
> I have been working on a caching layer, hmmap (heterogeneous memory
> map) [1], for emerging NVM; it is close in spirit to the page cache.
> The key difference is that the backend device and the caching layer
> of hmmap are pluggable. In addition, hmmap supports DAX and write
> protection, which I believe are key features for emerging NVMs that
> may have write/read asymmetry as well as write endurance constraints.
> Lastly, we can leverage hardware, such as a DMA engine, when moving
> pages between the cache and the backend device, while also allowing
> direct access if the device is capable.
>
> I am proposing that, as an alternative to using NVM as a NUMA node,
> we expose the NVM through the page cache (or a viable alternative)
> and have userspace applications mmap the NVM and hand out memory
> with their favorite userspace memory allocator.
>
> This would restrict the NVM to applications that are well aware of
> the performance implications of accessing it. I believe that all of
> this work could also be solved with the NUMA node approach, but the
> two approaches seem to be blurring together.
>
> The main points I would like to discuss are:
>
> * Is the page cache model a viable alternative to NVM as a NUMA node?
> * Can we add more flexibility to the page cache?
> * Should we force separation of NVM through an explicit mmap?
>
> I believe this discussion could be merged with the "NUMA, memory
> hierarchy and device memory", "Use NVDIMM as NUMA node and NUMA API",
> or "memory reclaim with NUMA balancing" topics.
>
> Here are some performance numbers for hmmap (still in development):
>
> All numbers were collected on a 4GiB hmmap device with a 128MiB
> cache. For the mmap tests I used cgroups to limit the page cache to
> 128MiB. All results are an average of 10 runs. W and R access the
> entire device with all threads segregated in the address space. RR
> reads randomly across the entire device, 8 bytes at a time, and is
> limited to 8MiB of total data accessed.
>
> hmmap brd vs. mmap of brd
>
>              hmmap                   mmap
> Threads    W     R     RR          W     R     RR
>    1     7.21  5.39  5.04        6.80  5.63  5.23
>    2     5.19  3.87  3.74        4.66  3.33  3.20
>    4     3.65  2.95  3.07        3.53  2.26  2.18
>    8     4.52  3.43  3.59        4.30  1.98  1.88
>   16     5.00  3.85  3.98        4.92  2.00  1.99
>
> Memory Backend Test (DAX capable)
>
>              hmmap                   hmmap-dax                hmmap-wrprotect
> Threads    W     R     RR          W     R     RR           W     R     RR
>    1     6.29  4.94  4.37        2.54  1.36  0.16         7.12  2.13  0.73
>    2     4.62  3.63  3.57        1.41  0.69  0.08         5.06  1.14  0.41
>    4     3.45  2.97  3.11        0.77  0.36  0.04         3.66  0.63  0.25
>    8     4.10  3.53  3.71        0.44  0.19  0.02         4.03  0.35  0.17
>   16     4.60  3.98  4.04        0.34  0.16  0.02         4.52  0.27  0.14
>
> Thanks,
> Adam
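
To make the usage model in the quoted proposal a bit more concrete, here is a
rough sketch of what an application could look like, assuming the NVM is
exposed as a character device. The /dev/hmmap0 path and the 1 GiB arena size
are made up for illustration; hmmap's actual interface may differ. The
application maps a region of the device and then lets an ordinary userspace
allocator carve up that mapping:

/*
 * Minimal sketch: mmap an NVM-backed device and hand the mapping to a
 * userspace allocator.  Device path and sizes are hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ARENA_SIZE (1UL << 30)	/* 1 GiB of NVM-backed address space */

int main(void)
{
	/* Hypothetical device node; the real name depends on the driver. */
	int fd = open("/dev/hmmap0", O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	void *arena = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);
	if (arena == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	/*
	 * From here a userspace allocator (e.g. a simple bump allocator, or
	 * a general-purpose allocator pointed at a custom arena) hands out
	 * memory from the mapping.  A trivial placement stands in for that
	 * step below.
	 */
	char *msg = arena;
	strcpy(msg, "hello from NVM-backed memory");
	printf("%s\n", msg);

	munmap(arena, ARENA_SIZE);
	close(fd);
	return 0;
}

The point of requiring an explicit mmap is that only applications that opt in
ever see NVM-backed memory; everything else keeps getting DRAM through the
normal allocation paths.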