Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag

On Tue, Feb 23, 2016 at 11:06:44PM +1100, Dave Chinner wrote:
> On Tue, Feb 23, 2016 at 10:07:07AM +0000, Rudoff, Andy wrote:
> > 
> > > [Hi Andy - care to properly line break after ~75 characters, that makes
> > > reading the message a lot easier, thanks!]
> > 
> > My bad. 
> > 
> > >> The instructions give you very fine-grain flushing control, but the
> > >> downside is that the app must track what it changes at that fine
> > >> granularity.  Both models work, but there's a trade-off.
> > > 
> > > No, the cache flush model simply does not work without a lot of hard
> > > work to enable it first.
> > 
> > It's working well enough to pass tests that simulate crashes and
> > various workload tests for the apps involved. And I agree there
> > has been a lot of hard work behind it. I guess I'm not sure why you're
> > saying it is impossible or not working.
> > 
> > Let's take an example: an app uses fallocate() to create a DAX file,
> > mmap() to map it, msync() to flush changes. The app follows POSIX
> > meaning it doesn't expect file metadata to be flushed magically, etc.
> > The app is tested carefully and it works correctly.  Now the msync()
> > call used to flush stores is replaced by flushing instructions.
> > What's broken?
> 
> You haven't told the filesystem to flush any dirty metadata required
> to access the user data to persistent storage.  If the zeroing and
> unwritten extent conversion that is run by the filesystem during
> write faults into preallocated blocks isn't persistent, then after a
> crash the file will read back as unwritten extents, returning zeros
> rather than the data that was written.
> 
> msync() calls fsync() on file backed pages, which makes file metadata
> changes persistent.  Indeed, if you read the fdatasync man page, you
> might have noticed that it makes explicit reference that it requires
> the filesystem to flush the metadata needed to access the data that
> is being synced. IOWs, the filesystem knows about this dirty
> metadata that needs to be flushed to ensure data integrity,
> userspace doesn't.
> 
> Not to mention that the filesystem will convert and zero much more
> than just a single cacheline (whole pages at minimum, could be 2MB
> extents for large pages, etc) so the filesystem may require CPU
> cache flushes over a much wider range of cachelines than the
> application realises are dirty and require flushing for data
> integrity purposes. The filesystem knows about these dirty cache
> lines, userspace doesn't.

With the current code at least dax_zero_page_range() doesn't rely on
fsync/msync from userspace to make the zeroes that it writes persistent.  It
does all the necessary flushing and wmb_pmem() calls itself.  I agree that
this does not address your concern about metadata being in sync, though.
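
For concreteness, the msync()-based sequence Andy describes above could be
sketched roughly as follows.  This is only an illustration, not code from
any real library: the function name write_and_sync() and the FILE_SIZE
constant are made up for the example, and posix_fallocate() stands in for
fallocate() purely for portability.  Nothing here is DAX specific; the
same calls work on an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE 4096	/* hypothetical size, one page on most systems */

/*
 * Hypothetical helper: preallocate a file, map it, store into the
 * mapping and make the store durable with msync().
 * Returns 0 on success, -1 on failure.
 */
int write_and_sync(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -1;

	/* Preallocate blocks; posix_fallocate() returns an errno value. */
	if (posix_fallocate(fd, 0, FILE_SIZE) != 0) {
		close(fd);
		return -1;
	}

	char *p = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The first store may fault and make the filesystem dirty metadata. */
	memcpy(p, "hello", 5);

	/*
	 * msync() goes through the filesystem, so it gets the chance to
	 * flush the metadata needed to reach the data as well as the
	 * data itself.
	 */
	int rc = msync(p, FILE_SIZE, MS_SYNC);

	munmap(p, FILE_SIZE);
	close(fd);
	return rc == 0 ? 0 : -1;
}
```

In the variant under discussion the msync() call would be replaced by
userspace cache flush instructions on the modified range.  At that point
the filesystem is never called, which is where Dave's metadata concern
comes in.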

> IOWs, your userspace library may have made sure the data it modifies
> is in the physical location via your userspace CPU cache flushes,
> but there can be a lot of stuff it doesn't know about internal to
> the filesystem that also needs to be flushed to ensure data integrity
> is maintained.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
