Re: [PATCH] fuse: fix stale data issue of virtiofs with dax

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[adding Vivek Goyal and virtio-fs list]

On Sun, 5 Nov 2023 at 07:36, <obuil.liubo@xxxxxxxxx> wrote:
>
> From: Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx>
>
> In open(O_TRUNC), it doesn't cleanup the corresponding dax mapping of
> an inode, which could expose stale data to users, which can be easily
> reproduced by accessing overlayfs through virtiofs with dax.
>
> The reproducer is,
> 0. mkdir /home/test/lower /home/test/upper /home/test/work
> 1. echo "aaaaaa" > /home/test/lower/foobar
> 2. sudo mount -t overlay overlay -olowerdir=/home/test/lower,upperdir=/home/test/upper,workdir=/home/test/work /mnt/ovl
> 3. start a guest VM configured with virtiofs passthrough(_dax_ enabled and cache policy is always)
>    and use /mnt/ovl as the source directory.
> 4. ssh into the guest VM
> 5. mount virtiofs onto /mnt/test
> 5. cat /mnt/test/foobar
> aaaaaa
> 6. echo "xxxx" > /mnt/test/foobar
> 7. cat /mnt/test/foobar
>
> w/o this patch, step 7 shows "aaaa" rather than "xxxx".

This is generally unsupported.  See Documentation/filesystems/overlayfs.rst:

"Changes to underlying filesystems
---------------------------------

Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.

..."

Thanks,
Miklos


>
> Step 5 reads the content of file 'foobar' by setting up a dax mapping, and the
> mapping eventually maps /home/test/lower/foobar because there is no copy up
> at this moment.
>
> In step 6, 'foobar' is opened with O_TRUNC and the viriofs daemon on host side
> triggers a copy-up, "xxxx" is eventually written to /home/test/upper/foobar.
> Since 'foobar' is truncated to size 0 when writting, writes go with non-dax io
> path and update the file size in guest VM accordingly.
>
> However, dax mapping still refers to the /home/test/lower/foobar, so what step
> 7 reads is /home/test/lower/foobar but with the new size, which shows "aaaa"
> rather "xxxx".
>
> Reported-by: Cameron <cameron@xxxxxxxxxxxxxx>
> Tested-by: Cameron <cameron@xxxxxxxxxxxxxx>
> Signed-off-by: Liu Bo <bo.liu@xxxxxxxxxxxxxxxxx>
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/dir.c    | 5 +++++
>  fs/fuse/fuse_i.h | 1 +
>  3 files changed, 15 insertions(+)
>
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 8e74f278a3f6..d7f9ec7f4597 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -385,6 +385,15 @@ void fuse_dax_inode_cleanup(struct inode *inode)
>         WARN_ON(fi->dax->nr);
>  }
>
> +/* Callers need to make sure fi->i_mmap_sem is held. */
> +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start)
> +{
> +       struct fuse_conn *fc = get_fuse_conn(inode);
> +       struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +       inode_reclaim_dmap_range(fc->dax, inode, start, -1);
> +}
> +
>  static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length)
>  {
>         iomap->addr = IOMAP_NULL_ADDR;
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index f67bef9d83c4..be7892e052b5 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1788,6 +1788,9 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
>                          */
>                         i_size_write(inode, 0);
>                         truncate_pagecache(inode, 0);
> +                       if (fault_blocked && FUSE_IS_DAX(inode))
> +                               fuse_dax_inode_cleanup(inode);
> +
>                         goto out;
>                 }
>                 file = NULL;
> @@ -1883,6 +1886,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
>             S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) {
>                 truncate_pagecache(inode, outarg.attr.size);
>                 invalidate_inode_pages2(mapping);
> +               if (fault_blocked && FUSE_IS_DAX(inode))
> +                       fuse_dax_inode_cleanup_range(inode, outarg.attr.size);
>         }
>
>         clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 9b7fc7d3c7f1..02fa7aa2bd56 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1305,6 +1305,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start);
>  void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
> --
> 2.32.1 (Apple Git-133)
>




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux