Hi, Sorry for the long delay Darrick. > > + fextent.fe_logical = 0; > > + fextent.fe_physical = 0; > > + fieinfo.fi_extents_max = 1; > > + fieinfo.fi_extents_mapped = 0; > > + fieinfo.fi_extents_start = &fextent; > > + fieinfo.fi_start = start; > > + fieinfo.fi_len = 1 << inode->i_blkbits; > > + fieinfo.fi_flags = 0; > > + fieinfo.fi_cb = fiemap_fill_kernel_extent; > > + > > + error = inode->i_op->fiemap(inode, &fieinfo); > > + > > + if (error) > > + return error; > > + > > + if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN | > > + FIEMAP_EXTENT_ENCODED | > > + FIEMAP_EXTENT_DATA_INLINE | > > + FIEMAP_EXTENT_UNWRITTEN)) > > + return -EINVAL; > > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2, > and btrfs) do not return bmap results for files with shared blocks. > This check here should include FIEMAP_EXTENT_SHARED since external > overwrites of a COW file block are bad news on btrfs (and ocfs2 and > xfs). ok, np > > > + > > + *block = (fextent.fe_physical + > > + (start - fextent.fe_logical)) >> inode->i_blkbits; > > Hmmm, so there's nothing here checking that the physical device fiemap > reports is the same device that was passed into the mount. This is > trivially true for most of the filesystems that implement bmap and > fiemap, but definitely not true for xfs or btrfs. I would bet most > userspace callers of bmap (since it's an ext2-era ioctl) make that > assumption and don't even know how to find the device. Makes sense. > > On xfs, the bmap implementation won't return any results for realtime > files, but it looks as though we suddenly will start doing that here, > because in the new bmap implementation we will use fiemap, and fiemap > reports extents without providing any context about which device they're > on, and that context-less extent gets passed back to bmap_fiemap. > > In any case, I think a better solution to the multi-device problem is to > start returning device information via struct fiemap_extent, at least > inside the kernel. Use one of the reserved fields to declare a new > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t > device number, and then you can check that against inode->i_sb->s_bdev > to avoid returning results for the non-primary device of a multi-device > filesystem. I agree we should address it here, but I don't think fiemap_extent is the right place for it, it is linked to the UAPI, and changing it is usually not a good idea. I think I got your idea anyway, but, what if, instead returning the bdev in fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do with such information? > > > + > > + return error; > > +} > > + > > /** > > * bmap - find a block number in a file > > * @inode: inode owning the block number being requested > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput); > > */ > > int bmap(struct inode *inode, sector_t *block) > > { > > - if (!inode->i_mapping->a_ops->bmap) > > + if (inode->i_op->fiemap) > > + return bmap_fiemap(inode, block); > > + else if (inode->i_mapping->a_ops->bmap) > > + *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, > > + *block); > > + else > > return -EINVAL; > > Waitaminute. btrfs currently supports fiemap but not bmap, and now > suddenly it will support this legacy interface they've never supported > before. Are they on board with this? > > --D > > > > > - *block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block); > > return 0; > > } > > EXPORT_SYMBOL(bmap); > > diff --git a/fs/ioctl.c b/fs/ioctl.c > > index 6086978fe01e..bfa59df332bf 100644 > > --- a/fs/ioctl.c > > +++ b/fs/ioctl.c > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > } > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical, > > + u64 phys, u64 len, u32 flags) > > +{ > > + struct fiemap_extent *extent = fieinfo->fi_extents_start; > > + > > + /* only count the extents */ > > + if (fieinfo->fi_extents_max == 0) { > > + fieinfo->fi_extents_mapped++; > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > + } > > + > > + if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max) > > + return 1; > > + > > + if (flags & SET_UNKNOWN_FLAGS) > > + flags |= FIEMAP_EXTENT_UNKNOWN; > > + if (flags & SET_NO_UNMOUNTED_IO_FLAGS) > > + flags |= FIEMAP_EXTENT_ENCODED; > > + if (flags & SET_NOT_ALIGNED_FLAGS) > > + flags |= FIEMAP_EXTENT_NOT_ALIGNED; > > + > > + extent->fe_logical = logical; > > + extent->fe_physical = phys; > > + extent->fe_length = len; > > + extent->fe_flags = flags; > > + > > + fieinfo->fi_extents_mapped++; > > + > > + if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max) > > + return 1; > > + return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0; > > +} > > /** > > * fiemap_fill_next_extent - Fiemap helper function > > * @fieinfo: Fiemap context passed into ->fiemap > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 7a434979201c..28bb523d532a 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info { > > fiemap_fill_cb fi_cb; > > }; > > > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical, > > + u64 phys, u64 len, u32 flags); > > int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical, > > u64 phys, u64 len, u32 flags); > > int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags); > > -- > > 2.17.2 > > -- Carlos