On 09/02/2015 06:19 AM, Ross Zwisler wrote:
> On Wed, Sep 02, 2015 at 08:21:20AM +1000, Dave Chinner wrote:
>> Which means applications that should "just work" without
>> modification on DAX are now subtly broken and don't actually
>> guarantee data is safe after a crash. That's a pretty nasty
>> landmine, and goes against *everything* we've claimed about using
>> DAX with existing applications.
>>
>> That's wrong, and needs fixing.
>
> I agree that we need to fix fsync as well, and that the fsync solution could
> be used to implement msync if we choose to go that route. I think we might
> want to consider keeping the msync and fsync implementations separate, though,
> for two reasons.
>
> 1) The current msync implementation is much more efficient than what will be
> needed for fsync. Fsync will need to call into the filesystem, traverse all
> the blocks, get kernel virtual addresses from those and then call
> wb_cache_pmem() on those kernel addresses.

I was thinking about this some more, and no, this is not what we need to do,
because of the virtual-based-cache ARCHs. And what we do for these systems
will also work for physical-based-cache ARCHs.

What we need to do is dig into the mapping structure and pick up the current
VMA on the call to fsync, then just flush that one on that virtual address
(since it is current in the context of the fsync syscall).

And of course, as I wrote, we must call fsync on vm_operations->close before
the VMA mappings go away. Then an fsync after unmap is a no-op.

> I think this is a necessary evil
> for fsync since you don't have a VMA, but for msync we do and we can just
> flush using the user addresses without any fs lookups.
>

Right, see above.

> 2) I believe that the near-term fsync code will rely on struct pages for
> PMEM, which I believe are possible but optional as of Dan's last patch set:
>
> https://lkml.org/lkml/2015/8/25/841
>
> I believe that this means that if we don't have struct pages for PMEM (because
> ZONE_DEVICE et al. are turned off) fsync won't work. It'd be nice not to lose
> msync as well.

Please see above; it can be made to work. Actually, what we do is the
traversal-kernel-ptr thing plus the fsync-on-unmap. And it works: we have
heavy persistence testing and it is all very good. So no, without pages it
can all work very well.

The only remaining problem is sync, which I intend to fix soon; it is only a
matter of keeping a dax-dirty inode list per sb.

So no, this is not an excuse.

Cheers
Boaz
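
For readers following the thread, here is a minimal user-space sketch of the
application pattern under discussion: store through a shared mapping, then ask
for durability. It is only an illustration under assumptions, not code from
any patch set. The helper names persist_via_mmap() and check_persisted() are
hypothetical; the system calls are plain POSIX. The sketch calls msync()
before munmap() and fsync() afterwards, so it does not depend on which of the
two the kernel ends up using to flush DAX mappings.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write msg, including its
 * terminating NUL, at offset 0 of the file at path through a MAP_SHARED
 * mapping, then request durability. Returns 0 on success, -1 on failure.
 */
int persist_via_mmap(const char *path, const char *msg)
{
	size_t len = strlen(msg) + 1;
	int rc;
	char *map;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;

	/* The file must be at least len bytes long before we store into it. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* A plain CPU store; no write() system call is involved. */
	memcpy(map, msg, len);

	/* msync() still has the mapping, so it works on the user address. */
	rc = msync(map, len, MS_SYNC);

	if (munmap(map, len) != 0)
		rc = -1;

	/* fsync() after the unmap: the case debated in the thread. */
	if (fsync(fd) != 0)
		rc = -1;
	if (close(fd) != 0)
		rc = -1;
	return rc;
}

/*
 * Hypothetical helper: read the file back with read() and compare it
 * against msg. Returns 0 if the contents match, -1 otherwise.
 */
int check_persisted(const char *path, const char *msg)
{
	char buf[256];
	size_t len = strlen(msg) + 1;
	ssize_t n;
	int fd;

	if (len > sizeof(buf))
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len);
	close(fd);
	return (n == (ssize_t)len && memcmp(buf, msg, len) == 0) ? 0 : -1;
}
```

Calling persist_via_mmap("/tmp/example.bin", "hello") and then
check_persisted() on the same path should succeed on any filesystem. The
sketch cannot demonstrate crash safety; it only shows the call ordering an
existing application would typically use.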