Re: [RFC] add FIEMAP ioctl to efficiently map file allocation

On Apr 16, 2007  18:01 +1000, Timothy Shimmin wrote:
> --On 12 April 2007 5:05:50 AM -0600 Andreas Dilger <adilger@xxxxxxxxxxxxx> 
> wrote:
> >struct fiemap_extent {
> >	__u64 fe_start;			/* starting offset in bytes */
> >	__u64 fe_len;			/* length in bytes */
> >};
> >
> >struct fiemap {
> >	struct fiemap_extent fm_start;	/* offset, length of desired mapping */
> >	__u32 fm_extent_count;		/* number of extents in array */
> >	__u32 fm_flags;			/* flags (similar to XFS_IOC_GETBMAP) */
> >	__u64 unused;
> >	struct fiemap_extent fm_extents[0];
> >};
> >
> ># define FIEMAP_LEN_MASK		0xff000000000000
> ># define FIEMAP_LEN_HOLE     	0x01000000000000
> ># define FIEMAP_LEN_UNWRITTEN	0x02000000000000
> >
> >All offsets are in bytes to allow cases where filesystems are not doing
> >block-aligned/sized allocations (e.g. tail packing).  The fm_extents array
> >returned contains the packed list of allocation extents for the file,
> >including entries for holes (which have fe_start == 0, and a flag).
> >
> >The ->fm_extents[] array includes all of the holes in addition to
> >allocated extents because this avoids the need to return both the logical
> >and physical address for every extent and does not make processing any
> >harder.
> 
> Well, that's what stood out for me. I was wondering where the "fe_block" 
> field had gone - the "physical address".
> So is your "fe_start; /* starting offset */" actually the disk location
> (not a logical file offset)
> _except_ in the header (fiemap) where it is the desired logical offset.

Correct.  The fm_start in the request contains the logical start offset
and length in bytes of the requested fiemap region.  In the returned header
it represents the logical start offset of the extent that contained the
requested start offset, and the logical length of all the returned extents.
I haven't decided whether the returned length should stop at EOF, or
include the "virtual hole" at the end of the file.  I think EOF makes more
sense.

The fe_start and fe_len in each fm_extents[] entry represent the physical
location on the block device for that extent.  fm_extents[i].fe_start (per
Anton) is undefined if FIEMAP_LEN_HOLE is set, and .fe_len is the length of
the hole.

> Okay, looking at your example use below that's what it looks like.
> And when you refer to fm_start below, you mean fm_start.fe_start?
> Sorry, I realise this is just an approximation but this part confused me.

Right, I'll write up a new RFC based on the feedback here and correct the
various errors in the original proposal.

> So you get rid of all the logical file offsets in the extents because we
> report holes explicitly (and we know everything is contiguous if you
> include the holes).

Correct.  It saves space in the common case.
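
To illustrate why the logical offsets can be dropped: since holes are
reported explicitly, the logical start of any entry is just the returned
header start plus the sum of the preceding lengths.  A minimal sketch,
assuming the mask value from the quoted RFC; fe_logical_start() is a
made-up name, and uint64_t stands in for __u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from the quoted proposal; not a final ABI. */
#define FIEMAP_LEN_MASK	0xff000000000000ULL

struct fiemap_extent {
	uint64_t fe_start;	/* physical offset in bytes */
	uint64_t fe_len;	/* length in bytes, plus flag bits */
};

/*
 * Hypothetical helper: logical file offset at which ext[idx] begins.
 * first_logical is the logical start reported in the returned header.
 * Holes are counted like any other entry, which is what keeps the
 * returned list contiguous in logical space.
 */
static uint64_t fe_logical_start(uint64_t first_logical,
				 const struct fiemap_extent *ext, size_t idx)
{
	uint64_t logical = first_logical;
	size_t i;

	assert(ext != NULL || idx == 0);
	for (i = 0; i < idx; i++)
		logical += ext[i].fe_len & ~FIEMAP_LEN_MASK;
	return logical;
}
```

Passing idx equal to the number of returned entries gives the logical
offset just past the last one, which is where a follow-up request would
start.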

> >Caller works something like:
> >
> >	char buf[4096];
> >	struct fiemap *fm = (struct fiemap *)buf;
> >	int count = (sizeof(buf) - sizeof(*fm)) / sizeof(fm->fm_extents[0]);
> >	
> >	fm->fm_start.fe_start = 0; /* start of file */
> >	fm->fm_start.fe_len = -1;	/* end of file */
> >	fm->fm_extent_count = count; /* max extents in fm_extents[] array */
> >	fm->fm_flags = 0;		/* maybe "no DMAPI", etc like XFS */
> >
> >	fd = open(path, O_RDONLY);
> >	printf("logical\t\tphysical\t\tbytes\n");
> >
> >	/* The last call will return fewer extents than the maximum */
> >	while (fm->fm_extent_count == count) {
> >		rc = ioctl(fd, FIEMAP, fm);
> >		if (rc)
> >			break;
> >
> >		/* kernel filled in fm_extents[] array, set fm_extent_count
> >		 * to be actual number of extents returned, leaves
> >		 * fm_start.fe_start alone (unlike XFS_IOC_GETBMAP). */
> >
> >		for (i = 0; i < fm->fm_extent_count; i++) {
> >			__u64 len = fm->fm_extents[i].fe_len & ~FIEMAP_LEN_MASK;
> >			__u64 fm_next = fm->fm_start.fe_start + len;
> >			int hole = !!(fm->fm_extents[i].fe_len & FIEMAP_LEN_HOLE);
> >			int unwr = !!(fm->fm_extents[i].fe_len & FIEMAP_LEN_UNWRITTEN);
> >
> >			printf("%llu-%llu\t%llu-%llu\t%llu\t%s%s\n",
> >				fm->fm_start.fe_start, fm_next - 1,
> >				hole ? 0 : fm->fm_extents[i].fe_start,
> >				hole ? 0 : fm->fm_extents[i].fe_start + len - 1,
> >				len, hole ? "(hole) " : "",
> >				unwr ? "(unwritten) " : "");
> >
> >			/* get ready for printing next extent, or next ioctl */
> >			fm->fm_start.fe_start = fm_next;
> >		}
> >	}
> >

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
