With the introduction of byte-addressable storage devices that have low latencies, it becomes difficult to decide how to expose these devices to user-space applications. Do we treat them as traditional block devices or expose them as DAX-capable devices? A traditional block device allows us to use the page cache to take advantage of locality in access patterns, but this comes at the expense of extra memory copies that are extremely costly for random workloads. A DAX-capable device seems great for such random-access workloads, but suffers once there is some locality in the access pattern, since repeated accesses still go to the device rather than to DRAM.

When DAX-capable devices are used as slower, cheaper volatile memory, treating them as a slower NUMA node with an associated NUMA migration policy would allow taking advantage of access-pattern locality. However, this approach has a few drawbacks. First, when those devices are also persistent, the tiering approach used in NUMA migration may not guarantee persistence. Second, for devices with significantly higher latencies than DRAM, the cost of moving clean pages may be significant. Finally, pages handled via NUMA migration are a common resource subject to thrashing under memory pressure.

I would like to discuss an alternative approach in which memory-intensive applications mmap these storage devices into their address space. The application can specify how much DRAM may be used as a cache and have some influence on prefetching and eviction policies. The goal of such an approach would be to minimize the impact that slightly slower memory could have on a system when it is treated as a kernel-managed global resource, as well as to enable use of those devices as persistent memory. BTW, we criminally ;) used the vm_insert_page function in a prototype and found it to be faster than the page cache and swapping mechanisms when those are limited to a small amount of DRAM.
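
To make the idea of an application-specified DRAM budget concrete, here is a minimal user-space sketch in C. It is not the prototype mentioned above (which worked in the kernel via vm_insert_page); it only illustrates the policy side. All names (dram_cache, cache_init, cache_read, cache_write, cache_flush) are hypothetical, the LRU eviction policy is an assumption chosen for brevity, and the backing pointer stands in for whatever mmap() on the device would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096u

/* One DRAM-resident copy of a page from the slower mapping. */
struct slot {
    size_t   page;      /* page index within the backing mapping */
    int      valid;
    int      dirty;
    uint64_t last_use;  /* 0 means never used; larger means more recent */
    uint8_t  data[PAGE_SZ];
};

struct dram_cache {
    uint8_t     *backing;  /* assumed: pointer obtained from mmap() on the device */
    size_t       npages;   /* number of pages in the backing mapping */
    struct slot *slots;
    size_t       nslots;   /* DRAM budget chosen by the application, in pages */
    uint64_t     tick;
    size_t       hits, misses;
};

/* Returns 0 on success, -1 if the budget or the mapping is smaller than a page. */
int cache_init(struct dram_cache *c, void *backing, size_t bytes, size_t dram_budget)
{
    memset(c, 0, sizeof(*c));
    c->backing = backing;
    c->npages  = bytes / PAGE_SZ;
    c->nslots  = dram_budget / PAGE_SZ;
    if (c->npages == 0 || c->nslots == 0)
        return -1;
    c->slots = calloc(c->nslots, sizeof(*c->slots));
    return c->slots ? 0 : -1;
}

/* Copy a dirty page back to the device mapping. A persistent device would
 * additionally need a flush (for example msync) here; omitted in this sketch. */
static void write_back(struct dram_cache *c, struct slot *s)
{
    if (s->valid && s->dirty) {
        memcpy(c->backing + s->page * PAGE_SZ, s->data, PAGE_SZ);
        s->dirty = 0;
    }
}

/* Return the DRAM copy of a page, evicting the least recently used slot on a miss. */
static struct slot *cache_get(struct dram_cache *c, size_t page)
{
    struct slot *victim = &c->slots[0];
    for (size_t i = 0; i < c->nslots; i++) {
        struct slot *s = &c->slots[i];
        if (s->valid && s->page == page) {
            c->hits++;
            s->last_use = ++c->tick;
            return s;
        }
        if (s->last_use < victim->last_use)  /* unused slots have last_use == 0 */
            victim = s;
    }
    c->misses++;
    write_back(c, victim);
    memcpy(victim->data, c->backing + page * PAGE_SZ, PAGE_SZ);
    victim->page = page;
    victim->valid = 1;
    victim->dirty = 0;
    victim->last_use = ++c->tick;
    return victim;
}

uint8_t cache_read(struct dram_cache *c, size_t off)
{
    assert(off / PAGE_SZ < c->npages);
    return cache_get(c, off / PAGE_SZ)->data[off % PAGE_SZ];
}

void cache_write(struct dram_cache *c, size_t off, uint8_t v)
{
    assert(off / PAGE_SZ < c->npages);
    struct slot *s = cache_get(c, off / PAGE_SZ);
    s->data[off % PAGE_SZ] = v;
    s->dirty = 1;
}

/* Write every dirty page back to the device mapping. */
void cache_flush(struct dram_cache *c)
{
    for (size_t i = 0; i < c->nslots; i++)
        write_back(c, &c->slots[i]);
}
```

An application would call cache_init() with the mapped device and its chosen DRAM budget, route accesses through cache_read() and cache_write(), and call cache_flush() before relying on the device contents. The point of the sketch is only that the budget and the eviction policy are per-application choices rather than properties of a global, kernel-managed pool.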