Re: [PATCH 09/10 V2] Use FIEMAP for FIBMAP calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 04, 2019 at 04:11:47PM +0100, Carlos Maiolino wrote:
> Hi, Sorry for the long delay Darrick.
> 
> > > +	fextent.fe_logical = 0;
> > > +	fextent.fe_physical = 0;
> > > +	fieinfo.fi_extents_max = 1;
> > > +	fieinfo.fi_extents_mapped = 0;
> > > +	fieinfo.fi_extents_start = &fextent;
> > > +	fieinfo.fi_start = start;
> > > +	fieinfo.fi_len = 1 << inode->i_blkbits;
> > > +	fieinfo.fi_flags = 0;
> > > +	fieinfo.fi_cb = fiemap_fill_kernel_extent;
> > > +
> > > +	error = inode->i_op->fiemap(inode, &fieinfo);
> > > +
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	if (fieinfo.fi_flags & (FIEMAP_EXTENT_UNKNOWN |
> > > +				FIEMAP_EXTENT_ENCODED |
> > > +				FIEMAP_EXTENT_DATA_INLINE |
> > > +				FIEMAP_EXTENT_UNWRITTEN))
> > > +		return -EINVAL;
> > 
> > AFAICT, three of the filesystems that support COW writes (xfs, ocfs2,
> > and btrfs) do not return bmap results for files with shared blocks.
> > This check here should include FIEMAP_EXTENT_SHARED since external
> > overwrites of a COW file block are bad news on btrfs (and ocfs2 and
> > xfs).
> 
> ok, np
> 
> > 
> > > +
> > > +	*block = (fextent.fe_physical +
> > > +		  (start - fextent.fe_logical)) >> inode->i_blkbits;
> > 
> > Hmmm, so there's nothing here checking that the physical device fiemap
> > reports is the same device that was passed into the mount.  This is
> > trivially true for most of the filesystems that implement bmap and
> > fiemap, but definitely not true for xfs or btrfs.  I would bet most
> > userspace callers of bmap (since it's an ext2-era ioctl) make that
> > assumption and don't even know how to find the device.
> 
> Makes sense.
> 
> > 
> > On xfs, the bmap implementation won't return any results for realtime
> > files, but it looks as though we suddenly will start doing that here,
> > because in the new bmap implementation we will use fiemap, and fiemap
> > reports extents without providing any context about which device they're
> > on, and that context-less extent gets passed back to bmap_fiemap.
> > 
> > In any case, I think a better solution to the multi-device problem is to
> > start returning device information via struct fiemap_extent, at least
> > inside the kernel.  Use one of the reserved fields to declare a new
> > '__u32 fe_device' field in struct fiemap_extent which can be the dev_t
> > device number, and then you can check that against inode->i_sb->s_bdev
> > to avoid returning results for the non-primary device of a multi-device
> > filesystem.
> 
> I agree we should address it here, but I don't think fiemap_extent is the right
> place for it, it is linked to the UAPI, and changing it is usually not a good
> idea.

Adding a FIEMAP_EXTENT flag or two to turn one of the fe_reserved fields
into some sort of dev_t/per-device cookie should be fine.  Userspace
shouldn't be expecting any meaning in reserved areas.

> I think I got your idea anyway, but, what if, instead returning the bdev in
> fiemap_extent, we instead, send a flag (via fi_flags) to the filesystem, to
> idenfify a FIBMAP or a FIEMAP call, and let the filesystem decide what to do
> with such information?

I don't like the idea of adding a FIEMAP_FLAG to distinguish callers.

--D

> > 
> > > +
> > > +	return error;
> > > +}
> > > +
> > >  /**
> > >   *	bmap	- find a block number in a file
> > >   *	@inode:  inode owning the block number being requested
> > > @@ -1594,10 +1628,14 @@ EXPORT_SYMBOL(iput);
> > >   */
> > >  int bmap(struct inode *inode, sector_t *block)
> > >  {
> > > -	if (!inode->i_mapping->a_ops->bmap)
> > > +	if (inode->i_op->fiemap)
> > > +		return bmap_fiemap(inode, block);
> > > +	else if (inode->i_mapping->a_ops->bmap)
> > > +		*block = inode->i_mapping->a_ops->bmap(inode->i_mapping,
> > > +						       *block);
> > > +	else
> > >  		return -EINVAL;
> > 
> > Waitaminute.  btrfs currently supports fiemap but not bmap, and now
> > suddenly it will support this legacy interface they've never supported
> > before.  Are they on board with this?
> > 
> > --D
> > 
> > >  
> > > -	*block = inode->i_mapping->a_ops->bmap(inode->i_mapping, *block);
> > >  	return 0;
> > >  }
> > >  EXPORT_SYMBOL(bmap);
> > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > index 6086978fe01e..bfa59df332bf 100644
> > > --- a/fs/ioctl.c
> > > +++ b/fs/ioctl.c
> > > @@ -116,6 +116,38 @@ int fiemap_fill_user_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > >  	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > >  }
> > >  
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *fieinfo, u64 logical,
> > > +			    u64 phys, u64 len, u32 flags)
> > > +{
> > > +	struct fiemap_extent *extent = fieinfo->fi_extents_start;
> > > +
> > > +	/* only count the extents */
> > > +	if (fieinfo->fi_extents_max == 0) {
> > > +		fieinfo->fi_extents_mapped++;
> > > +		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +	}
> > > +
> > > +	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
> > > +		return 1;
> > > +
> > > +	if (flags & SET_UNKNOWN_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_UNKNOWN;
> > > +	if (flags & SET_NO_UNMOUNTED_IO_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_ENCODED;
> > > +	if (flags & SET_NOT_ALIGNED_FLAGS)
> > > +		flags |= FIEMAP_EXTENT_NOT_ALIGNED;
> > > +
> > > +	extent->fe_logical = logical;
> > > +	extent->fe_physical = phys;
> > > +	extent->fe_length = len;
> > > +	extent->fe_flags = flags;
> > > +
> > > +	fieinfo->fi_extents_mapped++;
> > > +
> > > +	if (fieinfo->fi_extents_mapped == fieinfo->fi_extents_max)
> > > +		return 1;
> > > +	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
> > > +}
> > >  /**
> > >   * fiemap_fill_next_extent - Fiemap helper function
> > >   * @fieinfo:	Fiemap context passed into ->fiemap
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 7a434979201c..28bb523d532a 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1711,6 +1711,8 @@ struct fiemap_extent_info {
> > >  	fiemap_fill_cb	fi_cb;
> > >  };
> > >  
> > > +int fiemap_fill_kernel_extent(struct fiemap_extent_info *info, u64 logical,
> > > +			      u64 phys, u64 len, u32 flags);
> > >  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> > >  			    u64 phys, u64 len, u32 flags);
> > >  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
> > > -- 
> > > 2.17.2
> > > 
> 
> -- 
> Carlos



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux