On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote: > Hi, Sorry for the long delay Darrick. > > > > + fextent.fe_logical = 0; > > > + fextent.fe_physical = 0; > > > + fieinfo.fi_extents_max = 1; > > > + fieinfo.fi_extents_mapped = 0; > > > + fieinfo.fi_extents_start = &fextent; > > > + fieinfo.fi_start = start; > > > + fieinfo.fi_len = 1 << inode->i_blkbits; > > > + fieinfo.fi_flags = 0; > > > + fieinfo.fi_cb = fiemap_fill_kernel_extent; > > > + > > > + error = inode->i_op->fiemap(inode, &fieinfo); > > > + > > > + if (error) > > > + return error; > > > + > > > + if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN | > > > + FIEMAP_EXTENT_ENCODED | > > > + FIEMAP_EXTENT_DATA_INLINE | > > > + FIEMAP_EXTENT_UNWRITTEN)) > > > + return -EINVAL; > > > > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2, > > and btrfs) do not return bmap results for files with shared blocks. > > This check here should include FIEMAP_EXTENT_SHARED since external > > overwrites of a COW file block are bad news on btrfs (and ocfs2 and > > xfs). > > ok, np > > > > > > + > > > + *block = (fextent.fe_physical + > > > + (start - fextent.fe_logical)) >> inode->i_blkbits; > > > > Hmmm, so there's nothing here checking that the physical device fiemap > > reports is the same device that was passed into the mount. This is > > trivially true for most of the filesystems that implement bmap and > > fiemap, but definitely not true for xfs or btrfs. I would bet most > > userspace callers of bmap (since it's an ext2-era ioctl) make that > > assumption and don't even know how to find the device. > > Makes sense. > > > > > On xfs, the bmap implementation won't return any results for realtime > > files, but it looks as though we suddenly will start doing that here, > > because in the new bmap implementation we will use fiemap, and fiemap > > reports extents without providing any context about which device they're > > on, and that context-less extent gets passed back to bmap_fiemap. > > > > In any case, I think a better solution to the multi-device problem is to > > start returning device information via struct fiemap_extent, at least > > inside the kernel. Use one of the reserved fields to declare a new > > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t > > device number, and then you can check that against inode->i_sb->s_bdev > > to avoid returning results for the non-primary device of a multi-device > > filesystem. > > I agree we should address it here, but I don't think fiemap_extent is the right > place for it, it is linked to the UAPI, and changing it is usually not a good > idea. Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields into some sort of dev_t/per-device cookie should be fine. Userspace shouldn't be expecting any meaning in reserved areas. > I think I got your idea anyway, but, what if, instead returning the bdev in > fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to > idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do > with such information? I don't like the idea of adding a FIEMAP_FLAG to distinguish callers. --D > > > > > + > > > + return error; > > > +} > > > + > > > /** > > > * bmap - find a block number in a file > > > * @inode: inode owning the block number being requested > > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput); > > > */ > > > int bmap(struct inode *inode, sector_t *block) > > > { > > > - if (!inode->i_mapping->a_ops->bmap) > > > + if (inode->i_op->fiemap) > > > + return bmap_fiemap(inode, block); > > > + else if (inode->i_mapping->a_ops->bmap) > > > + *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, > > > + *block); > > > + else > > > return -EINVAL; > > > > Waitaminute. btrfs currently supports fiemap but not bmap, and now > > suddenly it will support this legacy interface they've never supported > > before. Are they on board with this? > > > > --D > > > > > > > > - *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block); > > > return 0; > > > } > > > EXPORT_SYMBOL(bmap); > > > diff --git a/fs/ioctl.c b/fs/ioctl.c > > > index 6086978fe01e..bfa59df332bf 100644 > > > --- a/fs/ioctl.c > > > +++ b/fs/ioctl.c > > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > > return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > > } > > > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > > + u64 phys, u64 len, u32 flags) > > > +{ > > > + struct fiemap_extent *extent = fieinfo->fi_extents_start; > > > + > > > + /* only count the extents */ > > > + if (fieinfo->fi_extents_max == 0) { > > > + fieinfo->fi_extents_mapped++; > > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > > + } > > > + > > > + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) > > > + return 1; > > > + > > > + if (flags & SET_UNKNOWN_FLAGS) > > > + flags |= FIEMAP_EXTENT_UNKNOWN; > > > + if (flags & SET_NO_UNMOUNTED_IO_FLAGS) > > > + flags |= FIEMAP_EXTENT_ENCODED; > > > + if (flags & SET_NOT_ALIGNED_FLAGS) > > > + flags |= FIEMAP_EXTENT_NOT_ALIGNED; > > > + > > > + extent->fe_logical = logical; > > > + extent->fe_physical = phys; > > > + extent->fe_length = len; > > > + extent->fe_flags = flags; > > > + > > > + fieinfo->fi_extents_mapped++; > > > + > > > + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max) > > > + return 1; > > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > > +} > > > /** > > > * fiemap_fill_next_extent - Fiemap helper function > > > * @fieinfo: Fiemap context passed into ->fiemap > > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > > index 7a434979201c..28bb523d532a 100644 > > > --- a/include/linux/fs.h > > > +++ b/include/linux/fs.h > > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info { > > > fiemap_fill_cb fi_cb; > > > }; > > > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical, > > > + u64 phys, u64 len, u32 flags); > > > int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, > > > u64 phys, u64 len, u32 flags); > > > int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); > > > -- > > > 2.17.2 > > > > > -- > Carlos