On 05/18/2017 04:07 AM, Darrick J. Wong wrote:
> Document the new GETFSMAP ioctl that returns the physical layout of a
> (disk-based) filesystem.

Thanks, Darrick! Applied (with a few minor edits). (Currently sitting
in a local branch, just in case anyone sends review comments that need
integrating.)

Cheers,

Michael

> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> ---
> v2: emphasize that filesystems are not obligated to return inode numbers
> ---
> man2/ioctl_getfsmap.2 | 375 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 375 insertions(+)
> create mode 100644 man2/ioctl_getfsmap.2
>
> diff --git a/man2/ioctl_getfsmap.2 b/man2/ioctl_getfsmap.2
> new file mode 100644
> index 0000000..b451950
> --- /dev/null
> +++ b/man2/ioctl_getfsmap.2
> @@ -0,0 +1,375 @@
> +.\" Copyright (c) 2017, Oracle. All rights reserved.
> +.\"
> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> +.\" This is free documentation; you can redistribute it and/or
> +.\" modify it under the terms of the GNU General Public License as
> +.\" published by the Free Software Foundation; either version 2 of
> +.\" the License, or (at your option) any later version.
> +.\"
> +.\" The GNU General Public License's references to "object code"
> +.\" and "executables" are to be interpreted as the output of any
> +.\" document formatting or typesetting system, including
> +.\" intermediate and printed output.
> +.\"
> +.\" This manual is distributed in the hope that it will be useful,
> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +.\" GNU General Public License for more details.
> +.\"
> +.\" You should have received a copy of the GNU General Public
> +.\" License along with this manual; if not, see
> +.\" <http://www.gnu.org/licenses/>.
> +.\" %%%LICENSE_END
> +.TH IOCTL-GETFSMAP 2 2017-02-10 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +ioctl_getfsmap \- retrieve the physical layout of the filesystem
> +.SH SYNOPSIS
> +.br
> +.B #include <sys/ioctl.h>
> +.br
> +.B #include <linux/fs.h>
> +.br
> +.B #include <linux/fsmap.h>
> +.sp
> +.BI "int ioctl(int " fd ", FS_IOC_GETFSMAP, struct fsmap_head * " arg );
> +.SH DESCRIPTION
> +This
> +.BR ioctl (2)
> +retrieves physical extent mappings for a filesystem.
> +This information can be used to discover which files are mapped to a physical
> +block, examine free space, or find known bad blocks, among other things.
> +
> +The sole argument to this ioctl should be a pointer to a single
> +.BR "struct fsmap_head" ":"
> +.in +4n
> +.nf
> +
> +struct fsmap {
> +    __u32 fmr_device;      /* device id */
> +    __u32 fmr_flags;       /* mapping flags */
> +    __u64 fmr_physical;    /* device offset of segment */
> +    __u64 fmr_owner;       /* owner id */
> +    __u64 fmr_offset;      /* file offset of segment */
> +    __u64 fmr_length;      /* length of segment */
> +    __u64 fmr_reserved[3]; /* must be zero */
> +};
> +
> +struct fsmap_head {
> +    __u32 fmh_iflags;      /* control flags */
> +    __u32 fmh_oflags;      /* output flags */
> +    __u32 fmh_count;       /* # of entries in array incl. input */
> +    __u32 fmh_entries;     /* # of entries filled in (output). */
> +    __u64 fmh_reserved[6]; /* must be zero */
> +
> +    struct fsmap fmh_keys[2]; /* low and high keys for the mapping search */
> +    struct fsmap fmh_recs[];  /* returned records */
> +};
> +
> +.fi
> +.in
> +The two
> +.I fmh_keys
> +array elements specify the lowest and highest reverse-mapping
> +keys, respectively, for which userspace would like physical mapping
> +information.
> +A reverse mapping key consists of the tuple (device, block, owner, offset).
> +The owner and offset fields are part of the key because some filesystems
> +support sharing physical blocks between multiple files and
> +therefore may return multiple mappings for a given physical block.
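
A minimal sketch of how a caller might fill in these structures for a
query that spans the whole filesystem, in case it helps anyone reading
along. The helper names fsmap_query_alloc() and fsmap_dump_once() are
hypothetical (they are not part of this patch or of <linux/fsmap.h>),
and the sketch assumes kernel headers recent enough to ship
<linux/fsmap.h>. It is not a substitute for the sample program in
xfsprogs that the EXAMPLE section points to.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fsmap.h>

/*
 * Hypothetical helper (not in the patch): allocate a zeroed fsmap_head
 * with room for `count` records.  calloc() leaves fmh_iflags, the
 * reserved fields and the low key at zero; the high key is set to the
 * largest possible value so the query covers the whole keyspace.
 */
struct fsmap_head *fsmap_query_alloc(unsigned int count)
{
    struct fsmap_head *head;

    head = calloc(1, sizeof(*head) + (size_t)count * sizeof(struct fsmap));
    if (head == NULL)
        return NULL;

    head->fmh_count = count;
    head->fmh_keys[1].fmr_device = UINT32_MAX;
    head->fmh_keys[1].fmr_flags = UINT32_MAX;
    head->fmh_keys[1].fmr_physical = UINT64_MAX;
    head->fmh_keys[1].fmr_owner = UINT64_MAX;
    head->fmh_keys[1].fmr_offset = UINT64_MAX;
    return head;
}

/*
 * Hypothetical helper: issue one FS_IOC_GETFSMAP call on fd and print
 * the records the kernel filled in.  Returns the number of records, or
 * -1 with errno set by ioctl().
 */
int fsmap_dump_once(int fd, struct fsmap_head *head)
{
    unsigned int i;

    if (ioctl(fd, FS_IOC_GETFSMAP, head) < 0)
        return -1;

    for (i = 0; i < head->fmh_entries; i++) {
        const struct fsmap *rec = &head->fmh_recs[i];

        printf("dev 0x%x phys %llu len %llu owner %llu flags 0x%x\n",
               (unsigned int)rec->fmr_device,
               (unsigned long long)rec->fmr_physical,
               (unsigned long long)rec->fmr_length,
               (unsigned long long)rec->fmr_owner,
               (unsigned int)rec->fmr_flags);
    }
    return (int)head->fmh_entries;
}
```

A caller would open any file on the target filesystem for reading, pass
the descriptor and a header from fsmap_query_alloc() to
fsmap_dump_once(), and free() the header afterwards. Whether the call
succeeds depends on the filesystem supporting the ioctl.
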
> +.PP
> +Filesystem mappings are copied into the
> +.I fmh_recs
> +array, which immediately follows the header data.
> +.SS Fields of struct fsmap_head
> +.PP
> +The
> +.I fmh_iflags
> +field is a bitmask passed to the kernel to alter the output.
> +There are no flags defined, so callers must set this value to zero.
> +
> +.PP
> +The
> +.I fmh_oflags
> +field is a bitmask of flags set by the kernel concerning the returned mappings.
> +If
> +.B FMH_OF_DEV_T
> +is set, then the
> +.I fmr_device
> +field represents a
> +.B dev_t
> +structure containing the major and minor numbers of the block device.
> +
> +.PP
> +The
> +.I fmh_count
> +field contains the number of elements in the array being passed to the
> +kernel.
> +If this value is 0,
> +.I fmh_entries
> +will be set to the number of records that would have been returned had
> +the array been large enough;
> +no mapping information will be returned.
> +
> +.PP
> +The
> +.I fmh_entries
> +field contains the number of elements in the
> +.I fmh_recs
> +array that contain useful information.
> +
> +.PP
> +The
> +.I fmh_reserved
> +fields must be set to zero.
> +
> +.SS Keys
> +.PP
> +The two key records in
> +.B fsmap_head.fmh_keys
> +specify the lowest and highest extent records in the keyspace that the caller
> +wants returned.
> +A filesystem that can share blocks between files likely requires the tuple
> +.RI "(" "device" ", " "physical" ", " "owner" ", " "offset" ", " "flags" ")"
> +to uniquely index any filesystem mapping record.
> +Classic non-sharing filesystems might be able to identify any record with only
> +.RI "(" "device" ", " "physical" ", " "flags" ")."
> +For example, if the low key is set to (8:0, 36864, 0, 0, 0), the filesystem will
> +only return records for extents starting at or above 36KiB on disk.
> +If the high key is set to (8:0, 1048576, 0, 0, 0), only records below 1MiB will
> +be returned.
> +The format of
> +.B fmr_device
> +in the keys must match the format of the same field in the output records,
> +as defined below.
> +By convention, the field
> +.B fsmap_head.fmh_keys[0]
> +must contain the low key and
> +.B fsmap_head.fmh_keys[1]
> +must contain the high key for the request.
> +.PP
> +For convenience, if
> +.B fmr_length
> +is set in the low key, it will be added to
> +.IR fmr_physical " or " fmr_offset
> +as appropriate.
> +The caller can take advantage of this subtlety to set up subsequent calls
> +by copying
> +.B fsmap_head.fmh_recs[fsmap_head.fmh_entries - 1]
> +into the low key.
> +The function
> +.B fsmap_advance
> +provides this functionality.
> +
> +.SS Fields of struct fsmap
> +.PP
> +The
> +.I fmr_device
> +field uniquely identifies the underlying storage device.
> +If the
> +.B FMH_OF_DEV_T
> +flag is set in the header's
> +.I fmh_oflags
> +field, this field contains a
> +.B dev_t
> +from which major and minor numbers can be extracted.
> +If the flag is not set, this field contains a value that must be unique
> +for each unique storage device.
> +
> +.PP
> +The
> +.I fmr_physical
> +field contains the disk address of the extent in bytes.
> +
> +.PP
> +The
> +.I fmr_owner
> +field contains the owner of the extent.
> +This is an inode number unless
> +.B FMR_OF_SPECIAL_OWNER
> +is set in the
> +.I fmr_flags
> +field, in which case the value is determined by the filesystem.
> +See the section below about owner values for more details.
> +
> +.PP
> +The
> +.I fmr_offset
> +field contains the logical address in the mapping record in bytes.
> +This field has no meaning if the
> +.BR FMR_OF_SPECIAL_OWNER " or " FMR_OF_EXTENT_MAP
> +flags are set in
> +.IR fmr_flags "."
> +
> +.PP
> +The
> +.I fmr_length
> +field contains the length of the extent in bytes.
> +
> +.PP
> +The
> +.I fmr_flags
> +field is a bitmask of extent state flags.
> +The bits are:
> +.RS 0.4i
> +.TP
> +.B FMR_OF_PREALLOC
> +The extent is allocated but not yet written.
> +.TP
> +.B FMR_OF_ATTR_FORK
> +This extent contains extended attribute data.
> +.TP
> +.B FMR_OF_EXTENT_MAP
> +This extent contains extent map information for the owner.
> +.TP
> +.B FMR_OF_SHARED
> +Parts of this extent may be shared.
> +.TP
> +.B FMR_OF_SPECIAL_OWNER
> +The
> +.I fmr_owner
> +field contains a special value instead of an inode number.
> +.TP
> +.B FMR_OF_LAST
> +This is the last record in the filesystem.
> +.RE
> +
> +.PP
> +The
> +.I fmr_reserved
> +field will be set to zero.
> +
> +.SS Owner Values
> +Generally, the value of the
> +.I fmr_owner
> +field for non-metadata extents should be an inode number.
> +However, filesystems are under no obligation to report inode numbers;
> +they may instead report
> +.B FMR_OWN_UNKNOWN
> +if the inode number cannot easily be retrieved, if the caller lacks
> +sufficient privilege, if the filesystem does not support stable
> +inode numbers, or for any other reason.
> +If a filesystem wishes to condition the reporting of inode numbers based
> +on process capabilities, it is strongly urged that the
> +.B CAP_SYS_ADMIN
> +capability be used for this purpose.
> +.PP
> +The following special owner values are generic to all filesystems:
> +.RS 0.4i
> +.TP
> +.B FMR_OWN_FREE
> +Free space.
> +.TP
> +.B FMR_OWN_UNKNOWN
> +This extent is in use but its owner is not known or not easily retrieved.
> +.TP
> +.B FMR_OWN_METADATA
> +This extent is filesystem metadata.
> +.RE
> +
> +XFS can return the following special owner values:
> +.RS 0.4i
> +.TP
> +.B XFS_FMR_OWN_FREE
> +Free space.
> +.TP
> +.B XFS_FMR_OWN_UNKNOWN
> +This extent is in use but its owner is not known or not easily retrieved.
> +.TP
> +.B XFS_FMR_OWN_FS
> +Static filesystem metadata which exists at a fixed address.
> +These are the AG superblock, the AGF, the AGFL, and the AGI headers.
> +.TP
> +.B XFS_FMR_OWN_LOG
> +The filesystem journal.
> +.TP
> +.B XFS_FMR_OWN_AG
> +Allocation group metadata, such as the free space btrees and the
> +reverse mapping btrees.
> +.TP
> +.B XFS_FMR_OWN_INOBT
> +The inode and free inode btrees.
> +.TP
> +.B XFS_FMR_OWN_INODES
> +Inode records.
> +.TP
> +.B XFS_FMR_OWN_REFC
> +Reference count information.
> +.TP
> +.B XFS_FMR_OWN_COW
> +This extent is being used to stage a copy-on-write.
> +.TP
> +.B XFS_FMR_OWN_DEFECTIVE
> +This extent has been marked defective either by the filesystem or the
> +underlying device.
> +.RE
> +
> +ext4 can return the following special owner values:
> +.RS 0.4i
> +.TP
> +.B EXT4_FMR_OWN_FREE
> +Free space.
> +.TP
> +.B EXT4_FMR_OWN_UNKNOWN
> +This extent is in use but its owner is not known or not easily retrieved.
> +.TP
> +.B EXT4_FMR_OWN_FS
> +Static filesystem metadata which exists at a fixed address.
> +This is the superblock and the group descriptors.
> +.TP
> +.B EXT4_FMR_OWN_LOG
> +The filesystem journal.
> +.TP
> +.B EXT4_FMR_OWN_INODES
> +Inode records.
> +.TP
> +.B EXT4_FMR_OWN_BLKBM
> +Block bitmap.
> +.TP
> +.B EXT4_FMR_OWN_INOBM
> +Inode bitmap.
> +.RE
> +
> +.SH RETURN VALUE
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.PP
> +.SH ERRORS
> +Error codes can be one of, but are not limited to, the following:
> +.TP
> +.B EINVAL
> +The array is not long enough, the keys do not point to a valid part of
> +the filesystem, the low key points to a higher point in the filesystem's
> +physical storage address space than the high key, or a non-zero value
> +was passed in one of the fields that must be zero.
> +.TP
> +.B EFAULT
> +The pointer passed in was not mapped to a valid memory address.
> +.TP
> +.B EBADF
> +.IR fd
> +is not open for reading.
> +.TP
> +.B EOPNOTSUPP
> +The filesystem does not support this command.
> +.TP
> +.B EUCLEAN
> +The filesystem metadata is corrupt and needs repair.
> +.TP
> +.B EBADMSG
> +The filesystem has detected a checksum error in the metadata.
> +.TP
> +.B ENOMEM
> +Insufficient memory to process the request.
> +
> +.SH EXAMPLE
> +.PP
> +Please see io/fsmap.c in the xfsprogs distribution for a sample program.
> +
> +.SH CONFORMING TO
> +This API is Linux-specific.
> +Not all filesystems support it.
> +.fi
> +.in
> +.SH SEE ALSO
> +.BR ioctl (2)
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/