From: Boaz Harrosh <boaz@xxxxxxxxxxxxx>

When freezing an FS, we must write-protect every IS_DAX()
inode that has an mmap mapping. Otherwise an application
will be able to modify previously faulted-in file pages
while the FS is frozen.

I'm actually doing a full unmap_mapping_range because
there is no readily available "mapping_write_protect"-like
functionality. I do not think it is worth defining one
just for this case, only to save some extra read faults
after an fs_freeze.

How hot a path is fs_freeze anyway?

CC: Jan Kara <jack@xxxxxxx>
CC: Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx>
CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Boaz Harrosh <boaz@xxxxxxxxxxxxx>
---
 fs/dax.c           | 30 ++++++++++++++++++++++++++++++
 fs/super.c         |  3 +++
 include/linux/fs.h |  1 +
 3 files changed, 34 insertions(+)

diff --git a/fs/dax.c b/fs/dax.c
index d0bd1f4..f3fc28b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -549,3 +549,33 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
 	return dax_zero_page_range(inode, from, length, get_block);
 }
 EXPORT_SYMBOL_GPL(dax_truncate_page);
+
+/* This is meant to be called as part of freeze_super. Otherwise we might
+ * need some extra locking before calling here.
+ */
+void dax_prepare_freeze(struct super_block *sb)
+{
+	struct inode *inode;
+
+	/* TODO: each DAX fs has some private mount option to enable DAX. If
+	 * we made that option a generic MS_DAX_ENABLE super_block flag we could
+	 * avoid the 95% extra unneeded loop-on-all-inodes every freeze.
+	 * if (!(sb->s_flags & MS_DAX_ENABLE))
+	 *	return;
+	 */
+
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		/* TODO: For freezing we can actually do with write-protecting
+		 * the page. But I cannot find a ready made function that does
+		 * that for a given mapping (with all the proper locking).
+		 * How performance sensitive is the whole sb_freeze API?
+		 * For now we can just unmap the whole mapping, and pay extra
+		 * on read faults.
+		 */
+		/* NOTE: Do not unmap private COW mapped pages; they will not
+		 * modify the FS.
+		 */
+		if (IS_DAX(inode))
+			unmap_mapping_range(inode->i_mapping, 0, 0, 0);
+	}
+}
diff --git a/fs/super.c b/fs/super.c
index 2b7dc90..9ef490c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1329,6 +1329,9 @@ int freeze_super(struct super_block *sb)
 	/* All writers are done so after syncing there won't be dirty data */
 	sync_filesystem(sb);
 
+	/* Need to take care of DAX mmapped inodes */
+	dax_prepare_freeze(sb);
+
 	/* Now wait for internal filesystem counter */
 	sb->s_writers.frozen = SB_FREEZE_FS;
 	smp_wmb();
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 24af817..3b943d4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2599,6 +2599,7 @@ int dax_truncate_page(struct inode *, loff_t from, get_block_t);
 int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
 int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
 #define dax_mkwrite(vma, vmf, gb)	dax_fault(vma, vmf, gb)
+void dax_prepare_freeze(struct super_block *sb);
 
 #ifdef CONFIG_BLOCK
 typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
-- 
1.9.3
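
For anyone who wants to poke at this from user space, here is a rough sketch of
the two helpers I would start from. It is not part of the patch. The names
map_first_page() and store_byte() are made up for illustration, and the
expected behaviour described after the code is my reading of the change, not
something this snippet proves. The point is to keep one MAP_SHARED mapping
alive across the freeze, since the problem is with pages that were already
faulted in writable.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create `path` if needed, size it to exactly one
 * page (an existing larger file gets truncated) and map that page
 * MAP_SHARED and writable. Returns NULL on failure; on success *len
 * holds the mapping length.
 */
static unsigned char *map_first_page(const char *path, size_t *len)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int fd;

	if (page <= 0)
		return NULL;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;

	/* A store past EOF through a mapping raises SIGBUS, so size first. */
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (p == MAP_FAILED)
		return NULL;

	*len = (size_t)page;
	return p;
}

/* Hypothetical helper: store one byte through an existing mapping.
 * The first store to a page takes a write fault; later stores reuse the
 * writable PTE unless the kernel unmapped or write-protected it.
 */
static void store_byte(volatile unsigned char *p, size_t off, unsigned char v)
{
	p[off] = v;
}
```

On a DAX mount the manual check would be: map a file, store_byte(p, 0, 'A'),
run "fsfreeze --freeze" on the mount point from another shell, then
store_byte(p, 0, 'B') through the same pointer. With this patch I would expect
the second store to fault again and wait until "fsfreeze --unfreeze"; without
it the store should go straight to the file while the FS is frozen.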