On Thu, Dec 10, 2015 at 11:20 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>
>> On Thu, Dec 10, 2015 at 10:08 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>>> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>>>
>>>> Summary:
>>>>
>>>> To date, we have implemented two I/O usage models for persistent memory,
>>>> PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
>>>> userspace). This series adds a third, DAX-GUP, that allows DAX mappings
>>>> to be the target of direct-i/o. It allows userspace to coordinate
>>>> DMA/RDMA from/to persistent memory.
>>>>
>>>> The implementation leverages the ZONE_DEVICE mm-zone that went into
>>>> 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
>>>> and dynamically mapped by a device driver. The pmem driver, after
>>>> mapping a persistent memory range into the system memmap via
>>>> devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
>>>> page-backed pmem-pfns via flags in the new pfn_t type.
>>>
>>> So, this basically means that an admin has to decide whether or not DMA
>>> will be used on a given device before making a file system on it. That
>>> seems like an odd requirement. There's also a configuration option of
>>> whether to put those backing struct pages into DRAM or PMEM (which, of
>>> course, will be dictated by the size of pmem). I really think we should
>>> reconsider this approach.
>>>
>>> First, the admin shouldn't have to choose whether or not DMA will be
>>> done on the file system.
>>
>> To be clear it's not "whether or not DMA will be done on the file
>> system", it's whether or not both DMA and DAX will be done
>> simultaneously on the filesystem.
>
> Fair point, but I'd view one of those configurations as not recommended.
> To be clear, if you're just going to use the device for block based
> access, using btt is the safer option.

Speaking of btt, the mechanism for setting up a btt is identical to
specifying a reserved area for the memmap. I.e. write an info block to
the namespace to specify a new mode of operation.

>> DAX is already a capability that an admin can inadvertently disable by
>> mis-configuring the alignment of a partition [1].
>
> Heh, using my own commit against me? ;-) Anyway, the commit message
> suggests that dax *could* be supported on misaligned partitions.

All's fair in love, war, and code defense. :-)

>> Why not also disable it when DMA support is not configured and force
>> the fs back to page-cache? Namespace creation tooling in userspace
>> can default to enabling DAX + DMA.
>
> Well, the only reason I can come up with is manufactured: we've forced
> the admin to decide between having that extra space for storage and
> doing DMA, and he or she opted for more space.

Is this any worse than the "forcing" we're imposing in the btt / no-btt
decision that impacts DAX? This additional configuration flexibility
for whether / where to store a memmap array is merely incremental, not
fatal. It's also a configuration decision we can stop asking an admin
to make when / if we ever re-write the kernel to reduce its dependency
on struct page. In the meantime, I expect some would say DAX is a toy
as long as it continues to fail at DMA.