On Mon, Sep 28, 2015 at 2:35 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Sep 28, 2015 at 05:13:50AM -0700, Dan Williams wrote:
>> On Sun, Sep 27, 2015 at 5:59 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Fri, Sep 25, 2015 at 09:17:45PM -0600, Ross Zwisler wrote:
>> >> On Fri, Sep 25, 2015 at 12:53:57PM +1000, Dave Chinner wrote:
>> [..]
>> >> Does this sound like a reasonable path forward for v4.3? Dave, and Jan, can
>> >> you guys can provide guidance and code reviews for the XFS and ext4 bits?
>> >
>> > IMO, it's way too much to get into 4.3. I'd much prefer we revert
>> > the bad changes in 4.3, and then work towards fixing this for the
>> > 4.4 merge window. If someone needs this for 4.3, then they can
>> > backport the 4.4 code to 4.3-stable.
>> >
>>
>> If the proposal is to step back and get a running start at these fixes
>> for 4.4, then it is worth considering what the state of allocating
>> pages for DAX mappings will be in 4.4.
>
> Oh, do tell. I haven't seen any published design, code, etc,

This is via the devm_memremap_pages() api that went into 4.2 [1] and
my v1 (RFC quality) series using it for dax get_user_pages() [2].

[1]: https://lkml.org/lkml/2015/8/25/841
[2]: https://lkml.org/lkml/2015/9/23/11

> and I certainly haven't planned any time in the 4.4 window to do a
> complete audit, rework and test of the XFS DAX code. So if you want
> a working DAX implementation in the short term, we need to fix what
> we have and not do wholesale changes to infrastructure that put us
> back to square 1.

Yes, as Ross educated me, the current split of what is handled in the
filesystem vs what is handled in __dax_fault() potentially makes the
availability of struct page moot because the locking does not work if
initiated from within fs/dax.c...

> And, quite frankly, I'm not enabling any new DAX behaviour/subsystem
> in XFS until I've had time to review, test and fix it so it works
> without deadlocking or corrupting data.

I'm in violent agreement, to the point where I'm pondering whether
CONFIG_FS_DAX should just depend on CONFIG_BROKEN in 4.3 until we've
convinced ourselves of all the fixes in 4.4. It's not clear to me
that we have a stable baseline to which we can revert this "still in
development" implementation. Did you have one in mind?

>> It's already that case that
>> allocating struct page for DAX mappings is the only solution on the
>> horizon for enabling a get_user_pages() solution for persistent
>> memory. We of course need to get the page-less DAX path fixed up, but
>> the near-term path to full functionality and safety is when struct
>> page is available to enable the typical synchronization mechanics.
>
> And we do so at the expense of medium to long term complexity and
> maintenance. I'm no fan of using struct pages to track terabytes to
> petabytes of persistent memory, and I'm even less of a fan of having
> to simultaneously support both struct page and pfn based DAX
> subsystems...

I'm no fan of tracking petabytes of persistent memory with struct
page, but we're in the near term space (hardware technology-wise) of
how to enable DMA/RDMA to 100s of gigabytes to a few terabytes of
persistent memory. A page-less solution to that problem is not on the
horizon as far as I can tell. In short, I am concerned we are
spending time working around the lack of struct page to get to a
stable page-less solution that is still missing support for the use
cases that are expected to "just work".

I do not think introducing page-backed persistent memory sets us back
to square 1. Instead, given the functionality that is enabled when
pages are present, I think it is safe to assume most platforms will
arrange for page-backed persistent memory. If the page-less case is
rare to non-existent then we should design for the page-backed case
at least until the "petabytes of persistent memory" era arrives.
I think we have plenty of time to get page-less right before it is
needed, but we have to get over the roadblocks that Christoph and I
hit even trying to convert the DMA-API over to be pfn based [3].

[3]: https://lkml.org/lkml/2015/8/12/682