On May 24, 2008 17:01 -0700, Mark Fasheh wrote: > Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap > inode operation. Mark, I was looking at a way to remove the special-casing of NUM_EXTENTS from ioctl_fiemap() in an effort to remove Christoph's objection to keeping these in the same ioctl. I think it is possible and reasonable to move the special-case handling into fiemap_fill_next_extent(). > +static int ioctl_fiemap(struct file *filp, unsigned long arg) > +{ > + > + if (!(fiemap.fm_flags & FIEMAP_FLAG_NUM_EXTENTS) && > + (fiemap.fm_extent_count == 0 || > + fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS)) > + return -EINVAL; This can be changed to only check: if (fm_extent_count > FIEMAP_MAX_EXTENTS) return -EINVAL; > + fieinfo.fi_flags = fiemap.fm_flags; > + if (!(fiemap.fm_flags & FIEMAP_FLAG_NUM_EXTENTS)) { > + fieinfo.fi_extents_max = fiemap.fm_extent_count; > + fieinfo.fi_extents_start = (char *)arg + sizeof(fiemap); > + > + if (!access_ok(VERIFY_WRITE, fieinfo.fi_extents_start, > + fieinfo.fi_extents_max * sizeof(struct fiemap_extent))) > + return -EFAULT; > + } It is harmless to set fi_extents_max and fi_extents_start, as this is ignored by NUM_EXTENTS. The fiemap_fill_next_extent() will already check in copy_to_user() whether the fi_extents_start pointer is valid, and fiemap_fill_next_extent() doesn't even get far enough to look at fi_extents_max or fi_extents_start. We just do: fieinfo.fi_extents = fiemap.fm_extent_count; fieinfo.fi_extents_start = (struct fiemap_extent *)((char *)arg + sizeof(fiemap)); This leaves us with no checks for FIEMAP_FLAG_NUM_EXTENTS in ioctl_fiemap() at all, and no changes needed in fiemap_fill_next_extent(). > > What about the idea to have fiemap_fill_next_extent() do "extent" merging > > for filesystems that use the generic helper but do not return multiple > > blocks via get_blocks()? I don't think that is too hard to implement, > > and makes the output more useful, otherwise we get an extent per block. > > The above is what I _think_ will work, haven't actually tried it out. > > I don't think we want to automatically merge extents within this helper > function. Otherwise we would diverge from the actual disk layout for extent > based file systems where an extent might be broken up between two records > for some other reason, such as maximum extent length being exceeded. Do we really want to expose the filesystem-specific extent-length limits to userspace? In some sense, a block-based filesystem has a maximum extent length of the blocksize, but it seems totally reasonable to merge contiguous blocks into a single "extent" for return to userspace. I don't see this significantly different for ext4, even though it can have extents up to 128MB, or unwritten extents up to 64MB. > Btw, how many block-based file systems that don't return multiple blocks via > get_blocks() are there that we actually care about enough to write this > code? That I have no clue about. Joseph? Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html