Re: [PATCH 2/2] dax: fix bdev NULL pointer dereferences

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 04, 2016 at 09:29:57PM +0100, Jan Kara wrote:
> I think changes aren't very intrusive so we can feed them in during RC
> phase and frankly, you have to move to using ->writepages() anyway to make
> sync(2) work reliably.

I've been looking into this a bit more, and I don't think we actually want to
have DAX flushing in response to sync(2) or syncfs(2).  Here's some text from
the BUGS section of the sync(2) man page:

BUGS
	According to the standard specification (e.g., POSIX.1-2001), sync()
	schedules the writes,  but  may  return before  the  actual  writing
	is done.  However, since version 1.3.20 Linux does actually wait.
	(This still does not guarantee data integrity: modern disks have large
	caches.) 

Based on this I don't believe that it is a requirement that sync and syncfs
actually flush the data durably to media before they return - they just need
to make sure it has been sent to the device, which is always true for all
writes PMEM via DAX, even without any calls to dax_writeback_mapping_range().

The fsync(2) man page, on the other hand, *does* require that a call to
fsync() will result in the data being durable on the media before the system
call returns.  From fsync(2):

	fsync()  transfers  ("flushes") all modified in-core data of (i.e.,
	modified buffer cache pages for) the file referred to by the file
	descriptor fd to the disk device (or other permanent  storage  device)
	so  that  all changed information  can  be retrieved even after the
	system crashed or was rebooted.  This includes writing through or
	flushing a disk cache if present. 

I think we are doing the right thing if we only call
dax_writeback_mapping_range() in response to fsync() and msync(), but skip it
in response to sync() and syncfs().  Practically this also simplifies things a
lot because we don't need to worry about cleaning the DAX inodes.  With my
naive implementation where I was calling dax_writeback_mapping_range() in
ext4_writepages(), we were flushing all DAX pages that had ever been dirtied
every 5 seconds in response to a periodic filesystem sync(), which I'm sure is
untenable.

Please let me know if anyone disagrees with any of the above.  Based on this,
I'm off to move the calls to dax_writeback_mapping_range() back into the
filesystem specific fsync()/msync() paths so that we can provide an
appropriate block device.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux