On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote: > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > > > On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote: > > > These methods indirect the actual DAX read/write path. In the end pmem > > > uses magic flush and mc safe variants and fuse and dcssblk use plain ones > > > while device mapper picks redirects to the underlying device. > > > > > > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these > > > special variants, then use them everywhere as they fall back to the plain > > > ones on s390 anyway and remove an indirect call from the read/write path > > > as well as a lot of boilerplate code. > > > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > --- > > > drivers/dax/super.c | 36 ++++++++++++++-- > > > drivers/md/dm-linear.c | 20 --------- > > > drivers/md/dm-log-writes.c | 80 ----------------------------------- > > > drivers/md/dm-stripe.c | 20 --------- > > > drivers/md/dm.c | 50 ---------------------- > > > drivers/nvdimm/pmem.c | 20 --------- > > > drivers/s390/block/dcssblk.c | 14 ------ > > > fs/dax.c | 5 --- > > > fs/fuse/virtio_fs.c | 19 +-------- > > > include/linux/dax.h | 9 ++-- > > > include/linux/device-mapper.h | 4 -- > > > 11 files changed, 37 insertions(+), 240 deletions(-) > > > > > > > [..] > > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c > > > index 5c03a0364a9bb..754319ce2a29b 100644 > > > --- a/fs/fuse/virtio_fs.c > > > +++ b/fs/fuse/virtio_fs.c > > > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, > > > return nr_pages > max_nr_pages ? max_nr_pages : nr_pages; > > > } > > > > > > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev, > > > - pgoff_t pgoff, void *addr, > > > - size_t bytes, struct iov_iter *i) > > > -{ > > > - return copy_from_iter(addr, bytes, i); > > > -} > > > - > > > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev, > > > - pgoff_t pgoff, void *addr, > > > - size_t bytes, struct iov_iter *i) > > > -{ > > > - return copy_to_iter(addr, bytes, i); > > > -} > > > - > > > static int virtio_fs_zero_page_range(struct dax_device *dax_dev, > > > pgoff_t pgoff, size_t nr_pages) > > > { > > > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev, > > > > > > static const struct dax_operations virtio_fs_dax_ops = { > > > .direct_access = virtio_fs_direct_access, > > > - .copy_from_iter = virtio_fs_copy_from_iter, > > > - .copy_to_iter = virtio_fs_copy_to_iter, > > > .zero_page_range = virtio_fs_zero_page_range, > > > }; > > > > > > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) > > > fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops); > > > if (IS_ERR(fs->dax_dev)) > > > return PTR_ERR(fs->dax_dev); > > > - > > > + set_dax_cached(fs->dax_dev); > > > > Looks good to me from virtiofs point of view. > > > > Reviewed-by: Vivek Goyal <vgoyal@xxxxxxxxxx> > > > > Going forward, I am wondering should virtiofs use flushcache version as > > well. What if host filesystem is using DAX and mapping persistent memory > > pfn directly into qemu address space. I have never tested that. > > > > Right now we are relying on applications to do fsync/msync on virtiofs > > for data persistence. > > This sounds like it would need coordination with a paravirtualized > driver that can indicate whether the host side is pmem or not, like > the virtio_pmem driver. Agreed. Let me check the details of virtio_pmem driver. > However, if the guest sends any fsync/msync > you would still need to go explicitly cache flush any dirty page > because you can't necessarily trust that the guest did that already. So host dax functionality will already take care of that, IIUC, right? I see a dax_flush() call in dax_writeback_one(). I am assuming that's the will take care of flushing dirty pages when guest issues fsync()/msync(). So probably don't have to do anything extra here. I think qemu should map files using MAP_SYNC though in this case though. Any read/writes to virtiofs files will turn into host file load/store operations. So flushcache in guest makes more sense with MAP_SYNC which should make sure any filesystem metadata will already persist after fault completion. And later guest can do writes followed by flush and ensure data persists too. IOW, I probably only need to do following. - In virtiofs virtual device, add a notion of kind of dax window or memory it supports. So may be some kind of "writethrough" property of virtiofs dax cache. - Use this property in virtiofs driver to decide whether to use plain copy_from_iter() or _copy_from_iter_flushcache(). - qemu should use mmap(MAP_SYNC) if host filesystem is on persistent memory. Thanks Vivek