On Tue, Feb 23, 2016 at 04:10:50PM +0200, Boaz Harrosh wrote:
> On 02/23/2016 11:52 AM, Christoph Hellwig wrote:
> <>
> >
> > And this is BS.  Using msync or fsync might not perform as well as not
> > actually using them, but without them you do not get persistence.  If
> > you use your pmem as a throw away cache that's fine, but for most people
> > that is not the case.
> >
>
> Hi Christoph
>
> So is exactly my suggestion. My approach is *not* the we do not call
> m/fsync to let the FS clean up.
>
> In my model we still do that, only we eliminate the m/fsync slowness
> and the all page faults overhead by being instructed by the application
> that we do not need to track the data modified cachelines. Since the
> application is telling us that it will do so.
>
> In my model the job is split:
>  App will take care of data persistence by instructing a MAP_PMEM_AWARE,
>  and doing its own cl_flushing / movnt.
>  Which is the heavy cost
>
>  The FS will keep track of the Meta-Data persistence as it already does, via the
>  call to m/fsync. Which is marginal performance compared to the above heavy
>  IO.
>
> Note that the FS is still free to move blocks around, as Dave said:
> lockout pagefaultes, unmap from user space, let app fault again on a new
> block. this will still work as before, already in COW we flush the old
> block so there will be no persistence lost.
>
> So this all thread started with my patches, and my patches do not say
> "no m/fsync" they say, make this 3-8 times faster than today if the app
> is participating in the heavy lifting.
>
> Please tell me what you find wrong with my approach?

It seems like we are trying to solve a couple of different problems:

1) Make page faults faster by skipping any radix tree insertions, tag
   updates, etc.

2) Make fsync/msync faster by not flushing data that the application
   says it is already making durable from userspace.
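
For reference, the userspace half of problem #2 might look roughly like
the sketch below. It is only an illustration under stated assumptions:
x86-64 with CLFLUSH, 64-byte cache lines, and hypothetical helper names
(flush_range, pmem_style_write). It uses a plain MAP_SHARED mapping,
since MAP_PMEM_AWARE is only a proposal, and the fsync() at the end
still flushes whatever data the kernel has tracked as dirty.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_clflush; pulls in _mm_sfence as well */
#endif

#define CACHELINE 64    /* assumption: 64-byte cache lines */

/* Hypothetical helper: flush every cache line covering
 * [addr, addr + len), then fence so the flushes are ordered
 * before any later stores. */
static void flush_range(const void *addr, size_t len)
{
#if defined(__x86_64__)
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;

    for (; p < end; p += CACHELINE)
        _mm_clflush((const void *)p);
    _mm_sfence();
#else
    /* Placeholder only: a barrier, NOT a real cache flush. */
    (void)addr;
    (void)len;
    __sync_synchronize();
#endif
}

/* Hypothetical helper: store n bytes at offset off through a shared
 * mapping of fd, flush them from userspace, then let the filesystem
 * sync. Returns 0 on success, -1 on error. */
static int pmem_style_write(int fd, size_t map_len, size_t off,
                            const void *src, size_t n)
{
    char *map;
    int rc;

    if (off > map_len || n > map_len - off)
        return -1;      /* write would fall outside the mapping */

    map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map + off, src, n);
    flush_range(map + off, n);  /* application-side data durability */
    rc = fsync(fd);             /* filesystem side; today also flushes data */

    munmap(map, map_len);
    return rc;
}
```

The point of the split is visible in the last few lines: the application
has already flushed its own stores by the time fsync() runs, so any data
flushing fsync() does for those same cache lines is redundant work.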

I agree that your approach seems to help with both of these problems,
but I would argue that it is an incomplete solution for problem #2,
because an fsync/msync from the PMEM-aware application would still flush
any radix tree entries from *other* threads that were writing to the
same file.

It seems like a more direct solution for #2 above would be to have a
metadata-only equivalent of fsync/fdatasync, say "fmetasync", which says
"I'll make the writes I do to my mmaps durable from userspace, but I
need you to sync all filesystem metadata for me, please".

This would allow a complete separation of data synchronization in
userspace from metadata synchronization in kernel space by the
filesystem code.

By itself an fmetasync()-type solution of course would do nothing for
issue #1. If that was a compelling issue, you'd need something like the
mmap tag you're proposing to skip work on page faults.

All that being said, though, I agree with others in the thread that we
should still be focused on correctness, as we have a lot of correctness
issues remaining. When we eventually get to the place where we are
trying to do performance optimizations, those optimizations should be
measurement driven.

- Ross