On Wed, Sep 27, 2017 at 10:15:10AM -0600, Ross Zwisler wrote:
> Well, I don't know if platforms that support HMAT + PMEM are widely available,
> but we have all the details in the ACPI spec, so we could begin to code it up
> and things will "just work" when platforms arrive.

Then again, currently all actually shipping NVDIMMs are battery-backed DRAM,
and DAX mode should work just fine for them.  Things will get interesting
once companies start shipping actually persistent technologies that will be
significantly slower than DRAM.  And we should make sure we have the
infrastructure for that in place.

> Hum, I wonder if maybe we need/want three different mount modes?  What about:
>
> autodax (the default): the filesystem is free to use DAX or not, as it sees
> fit and thinks is optimal.  For the time being we can make this mean "don't
> use DAX", and phase in DAX usage as we add support for the HMAT, etc.

What does "use DAX" really mean anyway?  I think we are conflating a few
things:

 a) use a block device or use a dax_device for accessing the device
 b) use the pagecache for caching data in DRAM or not.

Now we actually have a really nice way to control a) already, it's called
O_DIRECT.  Currently O_DIRECT only works with read/write I/O, but with
a byte addressable scheme we now can implement it for mmap as well, which
is what the DAX mmap path does.

b) right now is implied by a), but it's really an implementation detail.

So the modes would be more like two options to:

 a) disallow any byte-level access.  The right way to do that would be
    to mount the /dev/dax* device instead of the block device to allow
    byte access, and disallow any DAXish operation if you mount the
    block device in the long run.
 b) have a mode to always force an O_DIRECT-like mode for devices that
    are fast enough.  We should always do that with the right HMAT
    entries if mounting the /dev/dax devices, and maybe have a mount
    option to force it.
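
To make the two options above a bit more concrete, here is a rough sketch of
the decision they imply.  Everything in it is hypothetical: the struct, the
field names and both helpers are made up for illustration and are not
existing kernel interfaces.  It assumes "fast enough" is a single boolean
derived from HMAT data, and that the forcing mount option is a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: which device node the filesystem was mounted on. */
enum access_path {
	ACCESS_BLOCK_DEV,	/* block device: no byte-level (DAXish) access */
	ACCESS_DAX_DEV,		/* /dev/dax* device: byte access allowed */
};

/* Hypothetical per-mount state; not a real kernel structure. */
struct mount_cfg {
	enum access_path path;
	bool hmat_fast;		/* assumption: HMAT says the device is fast enough */
	bool force_direct;	/* assumption: mount option forcing the direct mode */
};

/* Option a): byte-level access only when mounted on the dax device. */
static inline bool byte_access_allowed(const struct mount_cfg *m)
{
	assert(m != NULL);
	return m->path == ACCESS_DAX_DEV;
}

/*
 * Option b): use an O_DIRECT-like mode (skip the pagecache) when byte
 * access is allowed and either HMAT marks the device as fast enough or
 * the mount option forces it.
 */
static inline bool bypass_pagecache(const struct mount_cfg *m)
{
	if (!byte_access_allowed(m))
		return false;
	return m->hmat_fast || m->force_direct;
}
```

With this split, a slow but persistent device mounted via /dev/dax* would
still allow byte access, yet keep using the pagecache unless the option
forces the direct mode.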