From: Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx> In open(O_TRUNC), it doesn't cleanup the corresponding dax mapping of an inode, which could expose stale data to users, which can be easily reproduced by accessing overlayfs through virtiofs with dax. The reproducer is, 0. mkdir /home/test/lower /home/test/upper /home/test/work 1. echo "aaaaaa" > /home/test/lower/foobar 2. sudo mount -t overlay overlay -olowerdir=/home/test/lower,upperdir=/home/test/upper,workdir=/home/test/work /mnt/ovl 3. start a guest VM configured with virtiofs passthrough(_dax_ enabled and cache policy is always) and use /mnt/ovl as the source directory. 4. ssh into the guest VM 5. mount virtiofs onto /mnt/test 5. cat /mnt/test/foobar aaaaaa 6. echo "xxxx" > /mnt/test/foobar 7. cat /mnt/test/foobar w/o this patch, step 7 shows "aaaa" rather than "xxxx". Step 5 reads the content of file 'foobar' by setting up a dax mapping, and the mapping eventually maps /home/test/lower/foobar because there is no copy up at this moment. In step 6, 'foobar' is opened with O_TRUNC and the viriofs daemon on host side triggers a copy-up, "xxxx" is eventually written to /home/test/upper/foobar. Since 'foobar' is truncated to size 0 when writting, writes go with non-dax io path and update the file size in guest VM accordingly. However, dax mapping still refers to the /home/test/lower/foobar, so what step 7 reads is /home/test/lower/foobar but with the new size, which shows "aaaa" rather "xxxx". Reported-by: Cameron <cameron@xxxxxxxxxxxxxx> Tested-by: Cameron <cameron@xxxxxxxxxxxxxx> Signed-off-by: Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx> --- fs/fuse/dax.c | 9 +++++++++ fs/fuse/dir.c | 5 +++++ fs/fuse/fuse_i.h | 1 + 3 files changed, 15 insertions(+) diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index 8e74f278a3f6..d7f9ec7f4597 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -385,6 +385,15 @@ void fuse_dax_inode_cleanup(struct inode *inode) WARN_ON(fi->dax->nr); } +/* Callers need to make sure fi->i_mmap_sem is held. */ +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + + inode_reclaim_dmap_range(fc->dax, inode, start, -1); +} + static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length) { iomap->addr = IOMAP_NULL_ADDR; diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index f67bef9d83c4..be7892e052b5 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1788,6 +1788,9 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, */ i_size_write(inode, 0); truncate_pagecache(inode, 0); + if (fault_blocked && FUSE_IS_DAX(inode)) + fuse_dax_inode_cleanup(inode); + goto out; } file = NULL; @@ -1883,6 +1886,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { truncate_pagecache(inode, outarg.attr.size); invalidate_inode_pages2(mapping); + if (fault_blocked && FUSE_IS_DAX(inode)) + fuse_dax_inode_cleanup_range(inode, outarg.attr.size); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 9b7fc7d3c7f1..02fa7aa2bd56 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1305,6 +1305,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc); bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi); void fuse_dax_inode_init(struct inode *inode, unsigned int flags); void fuse_dax_inode_cleanup(struct inode *inode); +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start); void fuse_dax_dontcache(struct inode *inode, unsigned int flags); bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment); void fuse_dax_cancel_work(struct fuse_conn *fc); -- 2.32.1 (Apple Git-133)