On Thu, Feb 04, 2016 at 09:29:57PM +0100, Jan Kara wrote:
> On Thu 04-02-16 12:56:19, Ross Zwisler wrote:
> > On Wed, Feb 03, 2016 at 11:46:11AM +0100, Jan Kara wrote:
<>
> > > Let's clear this up a bit: The problem with using ->fsync() method is that
> > > it doesn't get called for sync(2). We could use ->sync_fs() to flush caches
> > > in case of sync(2) (that's what's happening for normal storage) but the
> > > problem with PMEM is that "flush all cached data" operation effectively
> > > means iterate through all modified pages and we didn't want to implement
> > > this for DAX fsync code.
> > >
> > > So we have decided to do cache flushing for DAX at a different point - mark
> > > inodes which may have writes cached as dirty and use writeback code for the
> > > cache flushing. But looking at it now, we have actually chosen a wrong
> > > place to do the flushing in the writeback path - note that sync(2) writes
> > > data via __writeback_single_inode() -> do_writepages() and thus doesn't
> > > even get to filemap_write_and_wait().
> > >
> > > So revisiting the decision I see two options:
> > >
> > > 1) Move the DAX flushing code from filemap_write_and_wait() into
> > > ->writepages() fs callback. There the filesystem can provide all the
> > > information it needs including bdev, get_block callback, or whatever.
> > >
> > > 2) Back out even further and implement own tracking and iteration of inodes
> > > to write.
> > >
> > > So far I still think 2) is not worth the complexity (although it would
> > > bring DAX code closer to how things behave with standard storage) so I
> > > would go for 1).
> >
> > Jan, just to clarify, are you proposing this change for v4.5 in the remaining
> > RCs as an alternative to the get_bdev() patch?
> >
> > https://lkml.org/lkml/2016/2/2/941
>
> Yes, because I don't think anything like ->get_bdev() is needed at all.
> Look: dax_do_io(), __dax_fault(), __dax_pmd_fault(), dax_zero_page_range()
> don't really need bdev - we have agreed that get_block() fills that in just
> fine.
>
> dax_clear_blocks() has IMO just the wrong signature - it should take bdev
> and not inode as an argument. Because combination inode + bdev sector
> doesn't really make much sense.
>
> dax_writeback_mapping_range() is the only remaining offender and it can
> easily take bdev as an argument when called from ->writepages().
>
> > Or can we move forward with get_bdev(), and try and figure out this new way of
> > calling fsync/msync for v4.6? My main concern here is that changing how the
> > DAX sync code gets called will affect all three filesystems as well as MM, and
> > that it might be too much for RC inclusion...
>
> I think changes aren't very intrusive so we can feed them in during RC
> phase and frankly, you have to move to using ->writepages() anyway to make
> sync(2) work reliably.

Okay, sounds good. I'll send it out once I've got it working & tested.
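
For illustration only, the direction agreed on above (do the DAX flushing from the filesystem's ->writepages() callback and hand the bdev to dax_writeback_mapping_range() explicitly) might look roughly like the user-space mock below. This is a sketch, not kernel code: the struct layouts, the three-argument dax_writeback_mapping_range() signature, example_writepages(), and simulate_sync() are all assumptions made up for the example, and the "flush" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for kernel types; all fields are hypothetical and minimal. */
struct block_device { const char *name; int flushed_entries; };
struct super_block { struct block_device *s_bdev; };
struct inode { struct super_block *i_sb; bool is_dax; };
struct address_space { struct inode *host; int nr_dirty; };
struct writeback_control { long nr_to_write; };

/*
 * Assumed signature: the caller supplies the bdev, so the DAX code does not
 * have to derive it from the inode (no ->get_bdev() style hook needed).
 * "Flushing" here only moves the dirty count onto the mock device.
 */
static int dax_writeback_mapping_range(struct address_space *mapping,
                                       struct block_device *bdev,
                                       struct writeback_control *wbc)
{
    (void)wbc;
    assert(mapping != NULL && bdev != NULL);
    bdev->flushed_entries += mapping->nr_dirty;
    mapping->nr_dirty = 0;
    return 0;
}

/* Hypothetical filesystem ->writepages(): the fs knows which bdev to pass. */
static int example_writepages(struct address_space *mapping,
                              struct writeback_control *wbc)
{
    struct inode *inode = mapping->host;

    if (inode->is_dax)
        return dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc);
    return 0; /* regular page-cache writeback omitted from this sketch */
}

/* Mock of the common entry point that writeback funnels through. */
static int do_writepages(struct address_space *mapping,
                         struct writeback_control *wbc)
{
    return example_writepages(mapping, wbc);
}

/* Simulates a sync(2)-style writeback of one inode; returns entries flushed. */
int simulate_sync(int nr_dirty, bool is_dax)
{
    struct block_device bdev = { .name = "pmem0", .flushed_entries = 0 };
    struct super_block sb = { .s_bdev = &bdev };
    struct inode inode = { .i_sb = &sb, .is_dax = is_dax };
    struct address_space mapping = { .host = &inode, .nr_dirty = nr_dirty };
    struct writeback_control wbc = { .nr_to_write = 0 };

    if (do_writepages(&mapping, &wbc) != 0)
        return -1;
    return bdev.flushed_entries;
}
```

The point of the mock is only that the flush is reached from do_writepages(), the path named above for sync(2), and that the bdev arrives as an argument from the filesystem.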