[adding Vivek Goyal and virtio-fs list] On Sun, 5 Nov 2023 at 07:36, <obuil.liubo@xxxxxxxxx> wrote: > > From: Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx> > > In open(O_TRUNC), it doesn't cleanup the corresponding dax mapping of > an inode, which could expose stale data to users, which can be easily > reproduced by accessing overlayfs through virtiofs with dax. > > The reproducer is, > 0. mkdir /home/test/lower /home/test/upper /home/test/work > 1. echo "aaaaaa" > /home/test/lower/foobar > 2. sudo mount -t overlay overlay -olowerdir=/home/test/lower,upperdir=/home/test/upper,workdir=/home/test/work /mnt/ovl > 3. start a guest VM configured with virtiofs passthrough(_dax_ enabled and cache policy is always) > and use /mnt/ovl as the source directory. > 4. ssh into the guest VM > 5. mount virtiofs onto /mnt/test > 5. cat /mnt/test/foobar > aaaaaa > 6. echo "xxxx" > /mnt/test/foobar > 7. cat /mnt/test/foobar > > w/o this patch, step 7 shows "aaaa" rather than "xxxx". This is generally unsupported. See Documentation/filesystems/overlayfs.rst: "Changes to underlying filesystems --------------------------------- Changes to the underlying filesystems while part of a mounted overlay filesystem are not allowed. If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock. ..." Thanks, Miklos > > Step 5 reads the content of file 'foobar' by setting up a dax mapping, and the > mapping eventually maps /home/test/lower/foobar because there is no copy up > at this moment. > > In step 6, 'foobar' is opened with O_TRUNC and the viriofs daemon on host side > triggers a copy-up, "xxxx" is eventually written to /home/test/upper/foobar. > Since 'foobar' is truncated to size 0 when writting, writes go with non-dax io > path and update the file size in guest VM accordingly. > > However, dax mapping still refers to the /home/test/lower/foobar, so what step > 7 reads is /home/test/lower/foobar but with the new size, which shows "aaaa" > rather "xxxx". > > Reported-by: Cameron <cameron@xxxxxxxxxxxxxx> > Tested-by: Cameron <cameron@xxxxxxxxxxxxxx> > Signed-off-by: Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx> > --- > fs/fuse/dax.c | 9 +++++++++ > fs/fuse/dir.c | 5 +++++ > fs/fuse/fuse_i.h | 1 + > 3 files changed, 15 insertions(+) > > diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c > index 8e74f278a3f6..d7f9ec7f4597 100644 > --- a/fs/fuse/dax.c > +++ b/fs/fuse/dax.c > @@ -385,6 +385,15 @@ void fuse_dax_inode_cleanup(struct inode *inode) > WARN_ON(fi->dax->nr); > } > > +/* Callers need to make sure fi->i_mmap_sem is held. */ > +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start) > +{ > + struct fuse_conn *fc = get_fuse_conn(inode); > + struct fuse_inode *fi = get_fuse_inode(inode); > + > + inode_reclaim_dmap_range(fc->dax, inode, start, -1); > +} > + > static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length) > { > iomap->addr = IOMAP_NULL_ADDR; > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index f67bef9d83c4..be7892e052b5 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -1788,6 +1788,9 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, > */ > i_size_write(inode, 0); > truncate_pagecache(inode, 0); > + if (fault_blocked && FUSE_IS_DAX(inode)) > + fuse_dax_inode_cleanup(inode); > + > goto out; > } > file = NULL; > @@ -1883,6 +1886,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, > S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { > truncate_pagecache(inode, outarg.attr.size); > invalidate_inode_pages2(mapping); > + if (fault_blocked && FUSE_IS_DAX(inode)) > + fuse_dax_inode_cleanup_range(inode, outarg.attr.size); > } > > clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index 9b7fc7d3c7f1..02fa7aa2bd56 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -1305,6 +1305,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc); > bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi); > void fuse_dax_inode_init(struct inode *inode, unsigned int flags); > void fuse_dax_inode_cleanup(struct inode *inode); > +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start); > void fuse_dax_dontcache(struct inode *inode, unsigned int flags); > bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment); > void fuse_dax_cancel_work(struct fuse_conn *fc); > -- > 2.32.1 (Apple Git-133) >