[PATCH 21/24] docs: add XFS directory structure to the DS&A book

From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
 .../filesystems/xfs/ondisk/directories.rst         | 1690 ++++++++++++++++++++
 Documentation/filesystems/xfs/ondisk/dynamic.rst   |    1 
 2 files changed, 1691 insertions(+)
 create mode 100644 Documentation/filesystems/xfs/ondisk/directories.rst


diff --git a/Documentation/filesystems/xfs/ondisk/directories.rst b/Documentation/filesystems/xfs/ondisk/directories.rst
new file mode 100644
index 000000000000..b2de94a6aae8
--- /dev/null
+++ b/Documentation/filesystems/xfs/ondisk/directories.rst
@@ -0,0 +1,1690 @@
+.. SPDX-License-Identifier: CC-BY-SA-3.0+
+
+Directories
+-----------
+
+    **Note**
+
+    Only v2 directories are covered here. v1 directories are obsolete.
+
+    **Note**
+
+    The term "block" in this section refers to directory blocks, not
+    filesystem blocks, unless otherwise specified.
+
+The size of a "directory block" is defined by the
+`superblock’s <#superblocks>`__ sb\_dirblklog value. The size in bytes =
+sb\_blocksize × 2\ :sup:`sb\_dirblklog`. For example, if sb\_blocksize = 4096
+and sb\_dirblklog = 2, the directory block size is 16384 bytes. Directory
+blocks are always allocated in units of 2\ :sup:`sb\_dirblklog` filesystem
+blocks. Directory blocks cannot be more than 65536 bytes in size.
+
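As a quick check of the size rule above, the following C sketch computes the directory block size from the two superblock fields. The helper name dir_block_size() is hypothetical and does not come from the XFS sources; it simply encodes sb_blocksize × 2^sb_dirblklog together with the 64KiB limit.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the directory block size, in bytes. */
#define DIRBLK_MAX_BYTES 65536u

/*
 * Hypothetical helper: directory block size in bytes, i.e.
 * sb_blocksize * 2^sb_dirblklog.  Returns 0 if the result would exceed
 * the 64KiB limit.
 */
static uint32_t dir_block_size(uint32_t sb_blocksize, uint8_t sb_dirblklog)
{
	uint64_t size;

	assert(sb_blocksize != 0);
	if (sb_dirblklog > 16)		/* would exceed 64KiB for any block size */
		return 0;
	size = (uint64_t)sb_blocksize << sb_dirblklog;
	return size > DIRBLK_MAX_BYTES ? 0 : (uint32_t)size;
}
```

With the values from the example above, dir_block_size(4096, 2) returns 16384.
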
+All directory entries contain the following "data":
+
+-  The entry’s name (a counted string: a single-byte namelen followed by an
+   array of 8-bit characters with no NULL terminator).
+
+-  The entry’s absolute `inode number <#inode-numbers>`__, which is always 64
+   bits (8 bytes) in size except in a special case for shortform directories.
+
+-  An offset or tag used for iterative readdir calls.
+
+-  If the XFS\_SB\_FEAT\_INCOMPAT\_FTYPE feature flag is set, each directory
+   entry contains an ftype field that caches the inode’s type to avoid having
+   to perform an inode lookup.
+
+.. list-table::
+   :widths: 28 52
+   :header-rows: 1
+
+   * - Flag
+     - Description
+
+   * - XFS_DIR3_FT_UNKNOWN
+     - Entry points to an unknown inode type.  This should never appear on
+       disk.
+
+   * - XFS_DIR3_FT_REG_FILE
+     - Entry points to a file.
+
+   * - XFS_DIR3_FT_DIR
+     - Entry points to another directory.
+
+   * - XFS_DIR3_FT_CHRDEV
+     - Entry points to a character device.
+
+   * - XFS_DIR3_FT_BLKDEV
+     - Entry points to a block device.
+
+   * - XFS_DIR3_FT_FIFO
+     - Entry points to a FIFO.
+
+   * - XFS_DIR3_FT_SOCK
+     - Entry points to a socket.
+
+   * - XFS_DIR3_FT_SYMLINK
+     - Entry points to a symbolic link.
+
+   * - XFS_DIR3_FT_WHT
+     - Entry points to an overlayfs whiteout file.  This (as far as the author
+       knows) has never appeared on disk.
+
+Table: ftype values
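
The table lists only the symbolic names. The C sketch below assumes the numeric values used by the Linux kernel's xfs_da_format.h (0 for XFS_DIR3_FT_UNKNOWN through 8 for XFS_DIR3_FT_WHT, in table order) and shows how a tool might derive an ftype from a POSIX st_mode. The helper mode_to_ftype() is hypothetical. The value XFS_DIR3_FT_DIR = 2 is consistent with the filetype = 2 shown for subdirectories in the xfs_db output later in this section.

```c
#define _XOPEN_SOURCE 700	/* expose S_IFSOCK and S_IFLNK */
#include <assert.h>
#include <sys/stat.h>

/* Values assumed from the kernel's xfs_da_format.h, in table order. */
enum xfs_dir3_ftype {
	XFS_DIR3_FT_UNKNOWN  = 0,
	XFS_DIR3_FT_REG_FILE = 1,
	XFS_DIR3_FT_DIR      = 2,
	XFS_DIR3_FT_CHRDEV   = 3,
	XFS_DIR3_FT_BLKDEV   = 4,
	XFS_DIR3_FT_FIFO     = 5,
	XFS_DIR3_FT_SOCK     = 6,
	XFS_DIR3_FT_SYMLINK  = 7,
	XFS_DIR3_FT_WHT      = 8,
};

/* Hypothetical helper: map the file type bits of st_mode to an ftype. */
static enum xfs_dir3_ftype mode_to_ftype(mode_t mode)
{
	switch (mode & S_IFMT) {
	case S_IFREG:  return XFS_DIR3_FT_REG_FILE;
	case S_IFDIR:  return XFS_DIR3_FT_DIR;
	case S_IFCHR:  return XFS_DIR3_FT_CHRDEV;
	case S_IFBLK:  return XFS_DIR3_FT_BLKDEV;
	case S_IFIFO:  return XFS_DIR3_FT_FIFO;
	case S_IFSOCK: return XFS_DIR3_FT_SOCK;
	case S_IFLNK:  return XFS_DIR3_FT_SYMLINK;
	default:       return XFS_DIR3_FT_UNKNOWN;
	}
}
```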
+
+All non-shortform directories also contain two additional structures:
+"leaves" and "freespace indexes".
+
+-  Leaves contain the sorted hashed name value (xfs\_da\_hashname() in
+   xfs\_da\_btree.c) and associated "address" which points to the
+   effective offset into the directory’s data structures. Leaves are used to
+   optimise lookup operations.
+
+-  Freespace indexes contain free space/empty entry tracking for quickly
+   finding an appropriately sized location for new entries. They maintain the
+   largest free space for each "data" block.
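
The name hash stored in the leaves can be sketched in C as follows. This is a transcription that is assumed to match xfs\_da\_hashname() in the kernel sources, with a local rol32() standing in for the kernel helper; treat it as illustrative rather than authoritative. Hashing "." and ".." gives 0x2e and 0x172e, the hash values that appear for those entries in the block directory example later in this section.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t xfs_dahash_t;

/* Rotate a 32-bit value left by shift bits (0 < shift < 32). */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

/* Hash a directory entry name, folding in four characters at a time. */
static xfs_dahash_t xfs_da_hashname(const uint8_t *name, int namelen)
{
	xfs_dahash_t hash;

	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
		       ((uint32_t)name[2] << 7) ^ name[3] ^ rol32(hash, 7 * 4);

	/* Fold in the remaining zero to three characters. */
	switch (namelen) {
	case 3:
		return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
		       name[2] ^ rol32(hash, 7 * 3);
	case 2:
		return ((uint32_t)name[0] << 7) ^ name[1] ^ rol32(hash, 7 * 2);
	case 1:
		return name[0] ^ rol32(hash, 7 * 1);
	default:
		return hash;
	}
}
```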
+
+A few common types are used for the directory structures:
+
+.. code:: c
+
+    typedef __uint16_t xfs_dir2_data_off_t;
+    typedef __uint32_t xfs_dir2_dataptr_t;
+
+Short Form Directories
+~~~~~~~~~~~~~~~~~~~~~~
+
+-  Directory entries are stored within the inode.
+
+-  The only data stored is the name, inode number, and offset. No "leaf" or
+   "freespace index" information is required as an inode can only store a
+   few entries.
+
+-  "." is not stored (as it’s in the inode itself), and ".." is a
+   dedicated parent field in the header.
+
+-  The number of directory entries that can be stored in an inode depends on
+   the `inode <#on-disk-inode>`__ size, the length of the entry names, and the
+   amount of extended attribute data.
+
+-  Once the number of entries exceeds the space available in the inode, the
+   format is converted to a `block directory <#block-directories>`__.
+
+-  Shortform directory data is packed as tightly as possible on the disk with
+   the remaining space zeroed:
+
+.. code:: c
+
+    typedef struct xfs_dir2_sf {
+         xfs_dir2_sf_hdr_t         hdr;
+         xfs_dir2_sf_entry_t       list[1];
+    } xfs_dir2_sf_t;
+
+**hdr**
+    Short form directory header.
+
+**list**
+    An array of variable-length directory entry records.
+
+.. code:: c
+
+    typedef struct xfs_dir2_sf_hdr {
+         __uint8_t                 count;
+         __uint8_t                 i8count;
+         xfs_dir2_inou_t           parent;
+    } xfs_dir2_sf_hdr_t;
+
+**count**
+    Number of directory entries.
+
+**i8count**
+    Number of directory entries whose inode numbers require 64 bits, if any
+    do. Zero otherwise.
+
+**parent**
+    The absolute inode number of this directory’s parent.
+
+.. code:: c
+
+    typedef struct xfs_dir2_sf_entry {
+         __uint8_t                 namelen;
+         xfs_dir2_sf_off_t         offset;
+         __uint8_t                 name[1];
+         __uint8_t                 ftype;
+         xfs_dir2_inou_t           inumber;
+    } xfs_dir2_sf_entry_t;
+
+**namelen**
+    Length of the name, in bytes.
+
+**offset**
+    Offset tag used to assist with directory iteration.
+
+**name**
+    The name of the directory entry. The entry is not NULL-terminated.
+
+**ftype**
+    The type of the inode. This is used to avoid reading the inode while
+    iterating a directory. The XFS\_SB\_VERSION2\_FTYPE feature must be set,
+    or this field will not be present.
+
+**inumber**
+    The inode number that this entry points to. The length is 64 bits if
+    i8count is non-zero in the header, and 32 bits otherwise.
+
+.. figure:: images/39.png
+   :alt: Short form directory layout
+
+   Short form directory layout
+
+-  Inode numbers are stored using 4 or 8 bytes depending on whether all the
+   inode numbers for the directory fit in 4 bytes (32 bits) or not. If all
+   inode numbers fit in 4 bytes, the header’s count value specifies the number
+   of entries in the directory and i8count will be zero. If any inode number
+   exceeds 4 bytes, all inode numbers (including the parent) will be 8 bytes
+   in size and the header’s i8count value specifies the number of entries
+   requiring 8-byte inode numbers; count is still the total number of entries.
+   The following union covers the shortform inode number structure:
+
+.. code:: c
+
+    typedef struct { __uint8_t i[8]; } xfs_dir2_ino8_t;
+    typedef struct { __uint8_t i[4]; } xfs_dir2_ino4_t;
+    typedef union {
+         xfs_dir2_ino8_t           i8;
+         xfs_dir2_ino4_t           i4;
+    } xfs_dir2_inou_t;
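
The layout rules above can be checked with a little arithmetic. The C sketch below, with hypothetical helper names, computes the on-disk size of a shortform directory: a header of 1 + 1 + 4 bytes (1 + 1 + 8 when i8count is non-zero) followed by entries of namelen (1) + offset (2) + name + optional ftype (1) + inode number (4 or 8). It reproduces the core.size values (94, 72 and 125 bytes) reported by xfs\_db in the examples that follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Header: count (1) + i8count (1) + parent inode number (4 or 8). */
static size_t sf_hdr_size(bool i8)
{
	return 1 + 1 + (i8 ? 8 : 4);
}

/* Entry: namelen (1) + offset (2) + name + [ftype (1)] + inumber (4 or 8). */
static size_t sf_entry_size(size_t namelen, bool has_ftype, bool i8)
{
	return 1 + 2 + namelen + (has_ftype ? 1 : 0) + (i8 ? 8 : 4);
}

/* Total bytes used by a shortform directory with the given name lengths. */
static size_t sf_dir_size(const size_t *namelens, size_t count,
			  bool has_ftype, bool i8)
{
	size_t size = sf_hdr_size(i8);

	for (size_t i = 0; i < count; i++)
		size += sf_entry_size(namelens[i], has_ftype, i8);
	return size;
}
```

For four 15-character names without ftype or 8-byte inodes this gives 6 + 4 × 22 = 94 bytes.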
+
+xfs\_db Short Form Directory Example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A directory is created with 4 files, all inode numbers fitting within 4 bytes:
+
+::
+
+    xfs_db> inode <inode#>
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 1
+    core.format = 1 (local)
+    core.nlinkv1 = 2
+    ...
+    core.size = 94
+    core.nblocks = 0
+    core.extsize = 0
+    core.nextents = 0
+    ...
+    u.sfdir2.hdr.count = 4
+    u.sfdir2.hdr.i8count = 0
+    u.sfdir2.hdr.parent.i4 = 128              /* parent = root inode */
+    u.sfdir2.list[0].namelen = 15
+    u.sfdir2.list[0].offset = 0x30
+    u.sfdir2.list[0].name = "frame000000.tst"
+    u.sfdir2.list[0].inumber.i4 = 25165953
+    u.sfdir2.list[1].namelen = 15
+    u.sfdir2.list[1].offset = 0x50
+    u.sfdir2.list[1].name = "frame000001.tst"
+    u.sfdir2.list[1].inumber.i4 = 25165954
+    u.sfdir2.list[2].namelen = 15
+    u.sfdir2.list[2].offset = 0x70
+    u.sfdir2.list[2].name = "frame000002.tst"
+    u.sfdir2.list[2].inumber.i4 = 25165955
+    u.sfdir2.list[3].namelen = 15
+    u.sfdir2.list[3].offset = 0x90
+    u.sfdir2.list[3].name = "frame000003.tst"
+    u.sfdir2.list[3].inumber.i4 = 25165956
+
+The raw data on disk follows. The first entry starts immediately after the
+six-byte header (count, i8count, and a 4-byte parent inode number):
+
+::
+
+    xfs_db> type text
+    xfs_db> p
+    00: 49 4e 41 ed 01 01 00 02 00 00 00 00 00 00 00 00 INA.............
+    10: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 02 ................
+    20: 44 ad 3a 83 1d a9 4a d0 44 ad 3a ab 0b c7 a7 d0 D.....J.D.......
+    30: 44 ad 3a ab 0b c7 a7 d0 00 00 00 00 00 00 00 5e D...............
+    40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+    50: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 ................
+    60: ff ff ff ff 04 00 00 00 00 80 0f 00 30 66 72 61 ............0fra
+    70: 6d 65 30 30 30 30 30 30 2e 74 73 74 01 80 00 81 me000000.tst....
+    80: 0f 00 50 66 72 61 6d 65 30 30 30 30 30 31 2e 74 ..Pframe000001.t
+    90: 73 74 01 80 00 82 0f 00 70 66 72 61 6d 65 30 30 st......pframe00
+    a0: 30 30 30 32 2e 74 73 74 01 80 00 83 0f 00 90 66 0002.tst........
+    b0: 72 61 6d 65 30 30 30 30 30 33 2e 74 73 74 01 80 rame000003.tst..
+    c0: 00 84 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+
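To tie the hex dump to the structures above, the following C sketch walks the shortform data starting at the six-byte header (offset 0x64 in the dump). It assumes, as in this example, that entries carry no ftype field; multi-byte fields are decoded as big-endian. The names get_be() and sf_get_entry() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian integer of nbytes bytes. */
static uint64_t get_be(const uint8_t *p, size_t nbytes)
{
	uint64_t v = 0;

	for (size_t i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper: fetch entry idx of a shortform directory whose
 * entries have no ftype field.  Returns 0 on success, -1 if idx is out
 * of range.
 */
static int sf_get_entry(const uint8_t *sf, unsigned int idx,
			const uint8_t **name, uint8_t *namelen,
			uint16_t *offset, uint64_t *ino)
{
	uint8_t count = sf[0];
	size_t inosize = sf[1] ? 8 : 4;		/* non-zero i8count: 8 bytes */
	const uint8_t *p = sf + 2 + inosize;	/* skip count, i8count, parent */

	if (idx >= count)
		return -1;
	for (unsigned int i = 0; i < idx; i++)
		p += 1 + 2 + p[0] + inosize;	/* namelen, offset, name, inumber */

	*namelen = p[0];
	*offset = (uint16_t)get_be(p + 1, 2);
	*name = p + 3;
	*ino = get_be(p + 3 + p[0], inosize);
	return 0;
}
```

Applied to the bytes above, entry 0 decodes to "frame000000.tst" with offset 0x30 and inode 0x01800081 (25165953).
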
+Next, an entry is deleted (frame000001.tst), and any entries after the deleted
+entry are moved or compacted to "cover" the hole:
+
+::
+
+    xfs_db> inode <inode#>
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 1
+    core.format = 1 (local)
+    core.nlinkv1 = 2
+    ...
+    core.size = 72
+    core.nblocks = 0
+    core.extsize = 0
+    core.nextents = 0
+    ...
+    u.sfdir2.hdr.count = 3
+    u.sfdir2.hdr.i8count = 0
+    u.sfdir2.hdr.parent.i4 = 128
+    u.sfdir2.list[0].namelen = 15
+    u.sfdir2.list[0].offset = 0x30
+    u.sfdir2.list[0].name = "frame000000.tst"
+    u.sfdir2.list[0].inumber.i4 = 25165953
+    u.sfdir2.list[1].namelen = 15
+    u.sfdir2.list[1].offset = 0x70
+    u.sfdir2.list[1].name = "frame000002.tst"
+    u.sfdir2.list[1].inumber.i4 = 25165955
+    u.sfdir2.list[2].namelen = 15
+    u.sfdir2.list[2].offset = 0x90
+    u.sfdir2.list[2].name = "frame000003.tst"
+    u.sfdir2.list[2].inumber.i4 = 25165956
+
+In the raw disk data, the space beyond the shortform entries is invalid and
+may be non-zero (here it still holds a stale copy of the old last entry):
+
+::
+
+    xfs_db> type text
+    xfs_db> p
+    00: 49 4e 41 ed 01 01 00 02 00 00 00 00 00 00 00 00 INA.............
+    10: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 03 ................
+    20: 44 b2 45 a2 09 fd e4 50 44 b2 45 a3 12 ee b5 d0 D.E....PD.E.....
+    30: 44 b2 45 a3 12 ee b5 d0 00 00 00 00 00 00 00 48 D.E............H
+    40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+    50: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 ................
+    60: ff ff ff ff 03 00 00 00 00 80 0f 00 30 66 72 61 ............0fra
+    70: 6d 65 30 30 30 30 30 30 2e 74 73 74 01 80 00 81 me000000.tst....
+    80: 0f 00 70 66 72 61 6d 65 30 30 30 30 30 32 2e 74 ..pframe000002.t
+    90: 73 74 01 80 00 83 0f 00 90 66 72 61 6d 65 30 30 st.......frame00
+    a0: 30 30 30 33 2e 74 73 74 01 80 00 84 0f 00 90 66 0003.tst.......f
+    b0: 72 61 6d 65 30 30 30 30 30 33 2e 74 73 74 01 80 rame000003.tst..
+    c0: 00 84 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+
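+This is an example of a directory whose entries mix small and large inode
+numbers. Because i8count is non-zero, every inode number, including the
+parent, is stored in 8 bytes: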
+
+::
+
+    xfs_db> inode 1024
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 3
+    core.format = 1 (local)
+    core.nlinkv2 = 9
+    ...
+    core.size = 125
+    core.nblocks = 0
+    core.extsize = 0
+    core.nextents = 0
+    ...
+    u3.sfdir3.hdr.count = 7
+    u3.sfdir3.hdr.i8count = 4
+    u3.sfdir3.hdr.parent.i8 = 1024
+    u3.sfdir3.list[0].namelen = 3
+    u3.sfdir3.list[0].offset = 0x60
+    u3.sfdir3.list[0].name = "git"
+    u3.sfdir3.list[0].inumber.i8 = 1027
+    u3.sfdir3.list[0].filetype = 2
+    u3.sfdir3.list[1].namelen = 4
+    u3.sfdir3.list[1].offset = 0x70
+    u3.sfdir3.list[1].name = "home"
+    u3.sfdir3.list[1].inumber.i8 = 13422826546
+    u3.sfdir3.list[1].filetype = 2
+    u3.sfdir3.list[2].namelen = 10
+    u3.sfdir3.list[2].offset = 0x80
+    u3.sfdir3.list[2].name = "mike"
+    u3.sfdir3.list[2].inumber.i8 = 4299308032
+    u3.sfdir3.list[2].filetype = 2
+    u3.sfdir3.list[3].namelen = 3
+    u3.sfdir3.list[3].offset = 0x98
+    u3.sfdir3.list[3].name = "mtr"
+    u3.sfdir3.list[3].inumber.i8 = 13433252916
+    u3.sfdir3.list[3].filetype = 2
+    u3.sfdir3.list[4].namelen = 3
+    u3.sfdir3.list[4].offset = 0xa8
+    u3.sfdir3.list[4].name = "vms"
+    u3.sfdir3.list[4].inumber.i8 = 16647516355
+    u3.sfdir3.list[4].filetype = 2
+    u3.sfdir3.list[5].namelen = 5
+    u3.sfdir3.list[5].offset = 0xb8
+    u3.sfdir3.list[5].name = "rsync"
+    u3.sfdir3.list[5].inumber.i8 = 3494912
+    u3.sfdir3.list[5].filetype = 2
+    u3.sfdir3.list[6].namelen = 3
+    u3.sfdir3.list[6].offset = 0xd0
+    u3.sfdir3.list[6].name = "tmp"
+    u3.sfdir3.list[6].inumber.i8 = 1593379
+    u3.sfdir3.list[6].filetype = 2
+
+Block Directories
+~~~~~~~~~~~~~~~~~
+
+When the shortform directory data no longer fits in the inode, the
+directory data is moved into a new single directory block outside the inode.
+The inode’s format is changed from "local" to "extents". The following points
+describe block directories.
+
+-  All directory data is stored within the one directory block, including the
+   mandatory "." and ".." entries.
+
+-  The block also contains "leaf" and "freespace index" information.
+
+-  The location of the block is defined by the inode’s in-core `extent
+   list <#extent-list>`__: the di\_u.u\_bmx[0] value. The file offset in the
+   extent must always be zero and the length = (directory block size /
+   filesystem block size). The block number points to the filesystem block
+   containing the directory data.
+
+-  Block directory data is stored in the following structures:
+
+.. code:: c
+
+    #define XFS_DIR2_DATA_FD_COUNT 3
+    typedef struct xfs_dir2_block {
+         xfs_dir2_data_hdr_t        hdr;
+         xfs_dir2_data_union_t      u[1];
+         xfs_dir2_leaf_entry_t      leaf[1];
+         xfs_dir2_block_tail_t      tail;
+    } xfs_dir2_block_t;
+
+**hdr**
+    Directory block header. On a v5 filesystem this is
+    xfs\_dir3\_data\_hdr\_t.
+
+**u**
+    Union of directory and unused entries.
+
+**leaf**
+    Hash values of the entries in this block.
+
+**tail**
+    Bookkeeping for the leaf entries.
+
+.. code:: c
+
+    typedef struct xfs_dir2_data_hdr {
+         __uint32_t                 magic;
+         xfs_dir2_data_free_t       bestfree[XFS_DIR2_DATA_FD_COUNT];
+    } xfs_dir2_data_hdr_t;
+
+**magic**
+    Magic number for this directory block.
+
+**bestfree**
+    An array pointing to free regions in the directory block.
+
+On a v5 filesystem, directory and attribute blocks are formatted with v3
+headers, which contain extra data:
+
+.. code:: c
+
+    struct xfs_dir3_blk_hdr {
+         __be32                     magic;
+         __be32                     crc;
+         __be64                     blkno;
+         __be64                     lsn;
+         uuid_t                     uuid;
+         __be64                     owner;
+    };
+
+**magic**
+    Magic number for this directory block.
+
+**crc**
+    Checksum of the directory block.
+
+**blkno**
+    Block number of this directory block.
+
+**lsn**
+    Log sequence number of the last write to this block.
+
+**uuid**
+    The UUID of this block, which must match either sb\_uuid or sb\_meta\_uuid
+    depending on which features are set.
+
+**owner**
+    The inode number that this directory block belongs to.
+
+.. code:: c
+
+    struct xfs_dir3_data_hdr {
+         struct xfs_dir3_blk_hdr    hdr;
+         xfs_dir2_data_free_t       best_free[XFS_DIR2_DATA_FD_COUNT];
+         __be32                     pad;
+    };
+
+**hdr**
+    The v5 directory/attribute block header.
+
+**best\_free**
+    An array pointing to free regions in the directory block.
+
+**pad**
+    Padding to maintain a 64-bit alignment.
+
+Within the block, data structures are as follows:
+
+.. code:: c
+
+    typedef struct xfs_dir2_data_free {
+         xfs_dir2_data_off_t        offset;
+         xfs_dir2_data_off_t        length;
+    } xfs_dir2_data_free_t;
+
+**offset**
+    Block offset of a free region, in bytes.
+
+**length**
+    Length of the free region, in bytes.
+
+Space inside the directory block can be used for directory entries or unused
+entries.  This is signified via a union of the two types:
+
+.. code:: c
+
+    typedef union {
+         xfs_dir2_data_entry_t      entry;
+         xfs_dir2_data_unused_t     unused;
+    } xfs_dir2_data_union_t;
+
+**entry**
+    A directory entry.
+
+**unused**
+    An unused entry.
+
+.. code:: c
+
+    typedef struct xfs_dir2_data_entry {
+         xfs_ino_t                  inumber;
+         __uint8_t                  namelen;
+         __uint8_t                  name[1];
+         __uint8_t                  ftype;
+         xfs_dir2_data_off_t        tag;
+    } xfs_dir2_data_entry_t;
+
+**inumber**
+    The inode number that this entry points to.
+
+**namelen**
+    Length of the name, in bytes.
+
+**name**
+    The name associated with this entry.
+
+**ftype**
+    The type of the inode. This is used to avoid reading the inode while
+    iterating a directory. The XFS\_SB\_VERSION2\_FTYPE feature must be set,
+    or this field will not be present.
+
+**tag**
+    Starting offset of the entry, in bytes. This is used for directory
+    iteration.
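
Because data entries are aligned to 8 bytes (XFS\_DIR2\_DATA\_ALIGN), the space an entry occupies can be computed from its name length. The sketch below assumes that rule: an 8-byte inumber, a 1-byte namelen, the name, an optional ftype byte and a 2-byte tag, rounded up to a multiple of 8, with the tag in the last two bytes. The helper names are hypothetical. For a 15-byte name without ftype it gives 32 bytes, matching the 0x20 spacing of the frame\*.tst tags in the block directory example later in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_DIR2_DATA_ALIGN 8	/* entry alignment, in bytes */

/* Hypothetical helper: bytes occupied by a data entry with this name length. */
static uint16_t data_entsize(uint8_t namelen, bool has_ftype)
{
	unsigned int len = 8 /* inumber */ + 1 /* namelen */ + namelen +
			   (has_ftype ? 1 : 0) + 2 /* tag */;

	/* Round up to the next multiple of XFS_DIR2_DATA_ALIGN. */
	return (uint16_t)((len + XFS_DIR2_DATA_ALIGN - 1) &
			  ~(unsigned int)(XFS_DIR2_DATA_ALIGN - 1));
}

/* The tag occupies the final two bytes of the entry. */
static uint16_t data_tag_offset(uint16_t entry_offset, uint8_t namelen,
				bool has_ftype)
{
	return entry_offset + data_entsize(namelen, has_ftype) - 2;
}
```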
+
+.. code:: c
+
+    typedef struct xfs_dir2_data_unused {
+         __uint16_t                 freetag;  /* 0xffff */
+         xfs_dir2_data_off_t        length;
+         xfs_dir2_data_off_t        tag;
+    } xfs_dir2_data_unused_t;
+
+**freetag**
+    Magic number signifying that this is an unused entry.  Must be 0xFFFF.
+
+**length**
+    Length of this unused entry, in bytes.
+
+**tag**
+    Starting offset of the entry, in bytes.
+
+.. code:: c
+
+    typedef struct xfs_dir2_leaf_entry {
+         xfs_dahash_t               hashval;
+         xfs_dir2_dataptr_t         address;
+    } xfs_dir2_leaf_entry_t;
+
+**hashval**
+    Hash value of the name of the directory entry.  This is used to speed up entry
+    lookups.
+
+**address**
+    Block offset of the entry, in eight byte units.
+
+.. code:: c
+
+    typedef struct xfs_dir2_block_tail {
+         __uint32_t                 count;
+         __uint32_t                 stale;
+    } xfs_dir2_block_tail_t;
+
+**count**
+    Number of leaf entries.
+
+**stale**
+    Number of free leaf entries.
+
+Following is a diagram of how these pieces fit together for a block directory.
+
+.. ifconfig:: builder != 'latex'
+
+   .. figure:: images/43.png
+      :alt: Block directory layout
+
+      Block directory layout
+
+.. ifconfig:: builder == 'latex'
+
+   .. figure:: images/43.png
+      :scale: 45%
+      :alt: Block directory layout
+
+      Block directory layout
+
+-  The magic number in the header is "XD2B" (0x58443242), or "XDB3"
+   (0x58444233) on a v5 filesystem.
+
+-  The tag in the xfs\_dir2\_data\_entry\_t structure stores its offset from
+   the start of the block.
+
+-  The start of a free space region is marked with the
+   xfs\_dir2\_data\_unused\_t structure where the freetag is 0xffff. The
+   freetag and length overwrite the first four bytes of an entry's inumber
+   field. The tag is located length - sizeof(tag) bytes from the start of the
+   unused entry on-disk.
+
+-  The bestfree array in the header points to as many as three of the largest
+   regions of free space within the block for storing new entries, sorted from
+   largest to third largest. If there are fewer than three free regions, the
+   remaining bestfree elements are zeroed. The offset specifies the offset
+   from the start of the block in bytes, and the length specifies the size of
+   the free space in bytes. The location each points to must contain the above
+   xfs\_dir2\_data\_unused\_t structure. As a block cannot exceed 64KB in
+   size, each offset and length is a 16-bit value. bestfree is used to
+   optimise the time required to locate space to create an entry: it avoids
+   scanning the block for a suitable location every time an entry is created.
+
+-  The tail structure specifies the number of elements in the leaf array and
+   the number of stale entries in the array. The tail is always located at the
+   end of the block. The leaf data immediately precedes the tail structure.
+
+-  The leaf array, which grows from the end of the block just before the tail
+   structure, contains an array of hash/address pairs for quickly looking up a
+   name by a hash value. Hash values are covered by the introduction to
+   directories. The address on-disk is the offset into the block divided by 8
+   (XFS\_DIR2\_DATA\_ALIGN). Hash/address pairs are stored on disk to optimise
+   lookup speed for large directories. If they were not stored, the hashes
+   would have to be calculated for all entries each time a lookup occurs in a
+   directory.
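
The lookup path described above can be sketched as follows: locate the leaf array from the tail, binary-search the sorted hash values, and convert a matching address into a byte offset by multiplying by XFS\_DIR2\_DATA\_ALIGN. This is a simplified sketch with hypothetical names that works on host-endian values rather than the big-endian on-disk bytes. A real lookup must also compare the name stored at each candidate offset, because different names can share a hash, and it skips stale leaf entries, whose address is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_DIR2_DATA_ALIGN 8
#define BLOCK_TAIL_SIZE     8	/* count + stale, 4 bytes each */
#define LEAF_ENTRY_SIZE     8	/* hashval + address, 4 bytes each */

struct leaf_entry {
	uint32_t hashval;
	uint32_t address;	/* byte offset / XFS_DIR2_DATA_ALIGN; 0 if stale */
};

/* Byte offset of the leaf array: it ends where the tail begins. */
static size_t block_leaf_offset(size_t dirblksize, uint32_t count)
{
	return dirblksize - BLOCK_TAIL_SIZE - (size_t)count * LEAF_ENTRY_SIZE;
}

/*
 * Return the byte offset of the first live entry with a matching hash,
 * or 0 if there is none (offset 0 holds the block header, never an entry).
 */
static uint32_t leaf_lookup(const struct leaf_entry *ents, uint32_t count,
			    uint32_t hash)
{
	uint32_t lo = 0, hi = count;

	/* Binary search for the first entry with hashval >= hash. */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (ents[mid].hashval < hash)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Several entries may share a hash; skip stale ones. */
	for (; lo < count && ents[lo].hashval == hash; lo++)
		if (ents[lo].address != 0)
			return ents[lo].address * XFS_DIR2_DATA_ALIGN;
	return 0;
}
```

For the 4096-byte example block below with ten leaf entries, block\_leaf\_offset() gives 0xfa8, which is where the first leaf entry appears in the hex dump.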
+
+xfs\_db Block Directory Example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A directory is created with 8 entries, directory block size = filesystem block
+size:
+
+::
+
+    xfs_db> sb 0
+    xfs_db> p
+    magicnum = 0x58465342
+    blocksize = 4096
+    ...
+    dirblklog = 0
+    ...
+    xfs_db> inode <inode#>
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 1
+    core.format = 2 (extents)
+    core.nlinkv1 = 2
+    ...
+    core.size = 4096
+    core.nblocks = 1
+    core.extsize = 0
+    core.nextents = 1
+    ...
+    u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,2097164,1,0]
+
+Go to the "startblock" and show the raw disk data:
+
+::
+
+    xfs_db> dblock 0
+    xfs_db> type text
+    xfs_db> p
+    000: 58 44 32 42 01 30 0e 78 00 00 00 00 00 00 00 00 XD2B.0.x........
+    010: 00 00 00 00 02 00 00 80 01 2e 00 00 00 00 00 10 ................
+    020: 00 00 00 00 00 00 00 80 02 2e 2e 00 00 00 00 20 ................
+    030: 00 00 00 00 02 00 00 81 0f 66 72 61 6d 65 30 30 .........frame00
+    040: 30 30 30 30 2e 74 73 74 80 8e 59 00 00 00 00 30 0000.tst..Y....0
+    050: 00 00 00 00 02 00 00 82 0f 66 72 61 6d 65 30 30 .........frame00
+    060: 30 30 30 31 2e 74 73 74 d0 ca 5c 00 00 00 00 50 0001.tst.......P
+    070: 00 00 00 00 02 00 00 83 0f 66 72 61 6d 65 30 30 .........frame00
+    080: 30 30 30 32 2e 74 73 74 00 00 00 00 00 00 00 70 0002.tst.......p
+    090: 00 00 00 00 02 00 00 84 0f 66 72 61 6d 65 30 30 .........frame00
+    0a0: 30 30 30 33 2e 74 73 74 00 00 00 00 00 00 00 90 0003.tst........
+    0b0: 00 00 00 00 02 00 00 85 0f 66 72 61 6d 65 30 30 .........frame00
+    0c0: 30 30 30 34 2e 74 73 74 00 00 00 00 00 00 00 b0 0004.tst........
+    0d0: 00 00 00 00 02 00 00 86 0f 66 72 61 6d 65 30 30 .........frame00
+    0e0: 30 30 30 35 2e 74 73 74 00 00 00 00 00 00 00 d0 0005.tst........
+    0f0: 00 00 00 00 02 00 00 87 0f 66 72 61 6d 65 30 30 .........frame00
+    100: 30 30 30 36 2e 74 73 74 00 00 00 00 00 00 00 f0 0006.tst........
+    110: 00 00 00 00 02 00 00 88 0f 66 72 61 6d 65 30 30 .........frame00
+    120: 30 30 30 37 2e 74 73 74 00 00 00 00 00 00 01 10 0007.tst........
+    130: ff ff 0e 78 00 00 00 00 00 00 00 00 00 00 00 00 ...x............
+
+The "leaf" and "tail" structures are stored at the end of the block, so
+as the directory grows, the middle is filled in:
+
+::
+
+    fa0: 00 00 00 00 00 00 01 30 00 00 00 2e 00 00 00 02 .......0........
+    fb0: 00 00 17 2e 00 00 00 04 83 a0 40 b4 00 00 00 0e ................
+    fc0: 93 a0 40 b4 00 00 00 12 a3 a0 40 b4 00 00 00 06 ................
+    fd0: b3 a0 40 b4 00 00 00 0a c3 a0 40 b4 00 00 00 1e ................
+    fe0: d3 a0 40 b4 00 00 00 22 e3 a0 40 b4 00 00 00 16 ................
+    ff0: f3 a0 40 b4 00 00 00 1a 00 00 00 0a 00 00 00 00 ................
+
+In a readable format:
+
+::
+
+    xfs_db> type dir2
+    xfs_db> p
+    bhdr.magic = 0x58443242
+    bhdr.bestfree[0].offset = 0x130
+    bhdr.bestfree[0].length = 0xe78
+    bhdr.bestfree[1].offset = 0
+    bhdr.bestfree[1].length = 0
+    bhdr.bestfree[2].offset = 0
+    bhdr.bestfree[2].length = 0
+    bu[0].inumber = 33554560
+    bu[0].namelen = 1
+    bu[0].name = "."
+    bu[0].tag = 0x10
+    bu[1].inumber = 128
+    bu[1].namelen = 2
+    bu[1].name = ".."
+    bu[1].tag = 0x20
+    bu[2].inumber = 33554561
+    bu[2].namelen = 15
+    bu[2].name = "frame000000.tst"
+    bu[2].tag = 0x30
+    bu[3].inumber = 33554562
+    bu[3].namelen = 15
+    bu[3].name = "frame000001.tst"
+    bu[3].tag = 0x50
+    ...
+    bu[8].inumber = 33554567
+    bu[8].namelen = 15
+    bu[8].name = "frame000006.tst"
+    bu[8].tag = 0xf0
+    bu[9].inumber = 33554568
+    bu[9].namelen = 15
+    bu[9].name = "frame000007.tst"
+    bu[9].tag = 0x110
+    bu[10].freetag = 0xffff
+    bu[10].length = 0xe78
+    bu[10].tag = 0x130
+    bleaf[0].hashval = 0x2e
+    bleaf[0].address = 0x2
+    bleaf[1].hashval = 0x172e
+    bleaf[1].address = 0x4
+    bleaf[2].hashval = 0x83a040b4
+    bleaf[2].address = 0xe
+    ...
+    bleaf[8].hashval = 0xe3a040b4
+    bleaf[8].address = 0x16
+    bleaf[9].hashval = 0xf3a040b4
+    bleaf[9].address = 0x1a
+    btail.count = 10
+    btail.stale = 0
+
+    **Note**
+
+    For block directories, all xfs\_db fields are preceded with "b".
+
+For a simple lookup example, the hash of frame000000.tst is 0xa3a040b4.
+Looking up that value, we get an address of 0x6. Multiplying that by 8 gives
+offset 0x30, and the inode at that point is 33554561.
+
+When we remove an entry from the middle (frame000004.tst), we can see how the
+freespace details are adjusted:
+
+::
+
+    bhdr.magic = 0x58443242
+    bhdr.bestfree[0].offset = 0x130
+    bhdr.bestfree[0].length = 0xe78
+    bhdr.bestfree[1].offset = 0xb0
+    bhdr.bestfree[1].length = 0x20
+    bhdr.bestfree[2].offset = 0
+    bhdr.bestfree[2].length = 0
+    ...
+    bu[5].inumber = 33554564
+    bu[5].namelen = 15
+    bu[5].name = "frame000003.tst"
+    bu[5].tag = 0x90
+    bu[6].freetag = 0xffff
+    bu[6].length = 0x20
+    bu[6].tag = 0xb0
+    bu[7].inumber = 33554566
+    bu[7].namelen = 15
+    bu[7].name = "frame000005.tst"
+    bu[7].tag = 0xd0
+    ...
+    bleaf[7].hashval = 0xd3a040b4
+    bleaf[7].address = 0x22
+    bleaf[8].hashval = 0xe3a040b4
+    bleaf[8].address = 0
+    bleaf[9].hashval = 0xf3a040b4
+    bleaf[9].address = 0x1a
+    btail.count = 10
+    btail.stale = 1
+
+A new "bestfree" value is added for the freed region, the start of the entry
+is marked as unused with 0xffff (overwriting the start of the former entry's
+inode number) followed by the length of the space, and the tag remains intact
+at offset + length - sizeof(tag). The address for the hash is also cleared
+and the tail's stale count is incremented. The affected areas are shown below:
+
+::
+
+    090: 00 00 00 00 02 00 00 84 0f 66 72 61 6d 65 30 30 .........frame00
+    0a0: 30 30 30 33 2e 74 73 74 00 00 00 00 00 00 00 90 0003.tst........
+    0b0: ff ff 00 20 02 00 00 85 0f 66 72 61 6d 65 30 30 .........frame00
+    0c0: 30 30 30 34 2e 74 73 74 00 00 00 00 00 00 00 b0 0004.tst........
+    0d0: 00 00 00 00 02 00 00 86 0f 66 72 61 6d 65 30 30 .........frame00
+    0e0: 30 30 30 35 2e 74 73 74 00 00 00 00 00 00 00 d0 0005.tst........
+    ...
+    fb0: 00 00 17 2e 00 00 00 04 83 a0 40 b4 00 00 00 0e ................
+    fc0: 93 a0 40 b4 00 00 00 12 a3 a0 40 b4 00 00 00 06 ................
+    fd0: b3 a0 40 b4 00 00 00 0a c3 a0 40 b4 00 00 00 1e ................
+    fe0: d3 a0 40 b4 00 00 00 22 e3 a0 40 b4 00 00 00 00 ................
+    ff0: f3 a0 40 b4 00 00 00 1a 00 00 00 0a 00 00 00 01 ................
+
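The bookkeeping described above can be sketched in C. The hypothetical bestfree\_insert() keeps the three-slot bestfree array sorted by decreasing length, dropping the smallest region when all slots are full, and unused\_tag\_offset() shows where the tag of an unused region lives. This is a simplified illustration, not the kernel's implementation; in particular it does not merge adjacent free regions. In the example above, inserting the freed region at 0xb0 (length 0x20) next to the existing 0x130/0xe78 region places it in slot 1, and its tag lands at 0xce, where 00 b0 appears in the dump.

```c
#include <assert.h>
#include <stdint.h>

#define XFS_DIR2_DATA_FD_COUNT 3

struct data_free {
	uint16_t offset;	/* start of the free region, in bytes */
	uint16_t length;	/* size of the free region, in bytes; 0 if unused */
};

/* Insert a free region, keeping bestfree sorted largest first. */
static void bestfree_insert(struct data_free best[XFS_DIR2_DATA_FD_COUNT],
			    uint16_t offset, uint16_t length)
{
	int i = XFS_DIR2_DATA_FD_COUNT;

	/* Walk up past every tracked region smaller than the new one. */
	while (i > 0 && best[i - 1].length < length)
		i--;
	if (i == XFS_DIR2_DATA_FD_COUNT)
		return;		/* not larger than anything tracked */

	/* Shift smaller regions down; the last one falls off the end. */
	for (int j = XFS_DIR2_DATA_FD_COUNT - 1; j > i; j--)
		best[j] = best[j - 1];
	best[i].offset = offset;
	best[i].length = length;
}

/* The tag of an unused region sits in its last two bytes. */
static uint16_t unused_tag_offset(uint16_t offset, uint16_t length)
{
	return offset + length - 2;
}
```
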
+Leaf Directories
+~~~~~~~~~~~~~~~~
+
+Once a Block Directory has filled the block, the directory data is changed
+into a new format. It still uses `extents <#data-extents>`__ and the same
+basic structures, but the "data" and "leaf" are split up into their own
+extents. The "leaf" information only occupies one extent. As "leaf"
+information is more compact than "data" information, more than one "data"
+extent is common.
+
+-  Block to Leaf conversions retain the existing block for the data entries
+   and allocate a new block for the leaf and freespace index information.
+
+-  As with all directories, data blocks must start at logical offset zero.
+
+-  The "leaf" block has a special offset defined by
+   XFS\_DIR2\_LEAF\_OFFSET. Currently this is 32GB, which in the extent list
+   corresponds to a block offset of 32GB / sb\_blocksize. On a 4KB block
+   filesystem, this is 0x800000 (8388608 decimal).
+
+-  Blocks with directory entries ("data" extents) have the magic number
+   "XD2D" (0x58443244), or "XDD3" (0x58444433) on a v5 filesystem.
+
+-  The "data" extents have a new header (no "leaf" data):
+
+.. code:: c
+
+    typedef struct xfs_dir2_data {
+         xfs_dir2_data_hdr_t       hdr;
+         xfs_dir2_data_union_t     u[1];
+    } xfs_dir2_data_t;
+
+**hdr**
+    Data block header. On a v5 filesystem, this field is struct
+    xfs\_dir3\_data\_hdr.
+
+**u**
+    Union of directory and unused entries, exactly the same as in a block
+    directory.
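
The leaf block's logical position follows directly from XFS\_DIR2\_LEAF\_OFFSET. The sketch below assumes the 32GB value stated above (2^35 bytes) and converts it into a block offset for a given filesystem block size; the helper name is hypothetical. For 4KB blocks it yields 8388608 (0x800000), the startoff of the leaf extent in the leaf directory example below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of XFS_DIR2_LEAF_OFFSET: 32GB, i.e. 2^35 bytes. */
#define DIR2_LEAF_OFFSET_BYTES (1ULL << 35)

/* Hypothetical helper: logical block offset of the leaf block. */
static uint64_t leaf_block_offset(uint32_t sb_blocksize)
{
	assert(sb_blocksize != 0);
	return DIR2_LEAF_OFFSET_BYTES / sb_blocksize;
}
```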
+
+-  The "leaf" extent uses the following structures:
+
+.. code:: c
+
+    typedef struct xfs_dir2_leaf {
+         xfs_dir2_leaf_hdr_t       hdr;
+         xfs_dir2_leaf_entry_t     ents[1];
+         xfs_dir2_data_off_t       bests[1];
+         xfs_dir2_leaf_tail_t      tail;
+    } xfs_dir2_leaf_t;
+
+**hdr**
+    Directory leaf header. On a v5 filesystem this is struct
+    xfs\_dir3\_leaf\_hdr\_t.
+
+**ents**
+    Hash values of the entries in this block.
+
+**bests**
+    The length of the largest free region (bestfree[0].length) of each data
+    block.
+
+**tail**
+    Bookkeeping for the leaf entries.
+
+.. code:: c
+
+    typedef struct xfs_dir2_leaf_hdr {
+         xfs_da_blkinfo_t          info;
+         __uint16_t                count;
+         __uint16_t                stale;
+    } xfs_dir2_leaf_hdr_t;
+
+**info**
+    Leaf btree block header.
+
+**count**
+    Number of leaf entries.
+
+**stale**
+    Number of stale/zeroed leaf entries.
+
+.. code:: c
+
+    struct xfs_dir3_leaf_hdr {
+         struct xfs_da3_blkinfo    info;
+         __uint16_t                count;
+         __uint16_t                stale;
+         __be32                    pad;
+    };
+
+**info**
+    Leaf B+tree block header.
+
+**count**
+    Number of leaf entries.
+
+**stale**
+    Number of stale/zeroed leaf entries.
+
+**pad**
+    Padding to maintain alignment rules.
+
+.. code:: c
+
+    typedef struct xfs_dir2_leaf_tail {
+         __uint32_t                bestcount;
+    } xfs_dir2_leaf_tail_t;
+
+**bestcount**
+    Number of best free entries.
+
+-  The magic number of the leaf block is XFS\_DIR2\_LEAF1\_MAGIC (0xd2f1); on
+   a v5 filesystem it is XFS\_DIR3\_LEAF1\_MAGIC (0x3df1).
+
+-  The size of the ents array is specified by hdr.count.
+
+-  The size of the bests array is specified by the tail.bestcount, which is
+   also the number of "data" blocks for the directory. The bests array
+   maintains each data block’s bestfree[0].length value.
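
Because bests[] records the largest free region of each data block, a writer can choose a data block for a new entry without reading every data block. The sketch below (hypothetical names, host-endian values) shows the layout arithmetic, where bests[] ends just before the 4-byte tail, and a simple first-fit scan of bests[]. It assumes, as in the kernel sources, that a missing data block is recorded as NULLDATAOFF (0xffff); the kernel's actual policy for choosing a block may differ from this first-fit scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_TAIL_SIZE 4	/* sizeof(bestcount) */
#define NULLDATAOFF    0xffffU	/* assumed marker for a missing data block */

/* Byte offset of bests[] in a leaf block: it ends where the tail begins. */
static size_t leaf_bests_offset(size_t dirblksize, uint32_t bestcount)
{
	return dirblksize - LEAF_TAIL_SIZE - (size_t)bestcount * sizeof(uint16_t);
}

/*
 * Hypothetical first-fit policy: index of the first data block whose
 * largest free region can hold entsize bytes, or -1 if none can.
 */
static int leaf_pick_data_block(const uint16_t *bests, uint32_t bestcount,
				uint16_t entsize)
{
	for (uint32_t i = 0; i < bestcount; i++)
		if (bests[i] != NULLDATAOFF && bests[i] >= entsize)
			return (int)i;
	return -1;
}
```

A return value of -1 means no existing data block has room, so a new data block would be needed.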
+
+.. ifconfig:: builder != 'latex'
+
+   .. figure:: images/48.png
+      :alt: Leaf directory free entry detail
+
+      Leaf directory free entry detail
+
+.. ifconfig:: builder == 'latex'
+
+   .. figure:: images/48.png
+      :scale: 40%
+      :alt: Leaf directory free entry detail
+
+      Leaf directory free entry detail
+
+xfs\_db Leaf Directory Example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For this example, a directory was created with 256 entries (frame000000.tst to
+frame000255.tst). Some files were deleted (frame00005\*, frame00018\* and
+frame000240.tst) to show free list characteristics.
+
+::
+
+    xfs_db> inode <inode#>
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 1
+    core.format = 2 (extents)
+    core.nlinkv1 = 2
+    ...
+    core.size = 12288
+    core.nblocks = 4
+    core.extsize = 0
+    core.nextents = 3
+    ...
+    u.bmx[0-2] = [startoff,startblock,blockcount,extentflag]
+              0:[0,4718604,1,0]
+              1:[1,4718610,2,0]
+              2:[8388608,4718605,1,0]
+
+As can be seen in this example, three blocks are used for
+"data" in two extents,
+and the "leaf" extent has a logical offset of 8388608 blocks (32GB).
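+
+The data, leaf, and freeindex information each occupy their own 32GB region
+of the directory's logical byte offsets, which is why the leaf extent starts
+at block 8388608 with 4KB blocks (and why the freeindex extents in the later
+examples start at block 16777216). A minimal sketch of that classification,
+using hypothetical helper names:
+
```c
#include <assert.h>
#include <stdint.h>

/* Each directory "space" spans 32GB of logical byte offsets. */
#define DIR_SPACE_SIZE (1ULL << 35)

enum dir_space { DIR_SPACE_DATA = 0, DIR_SPACE_LEAF = 1, DIR_SPACE_FREE = 2 };

/* Which space a logical block (an extent's startoff) belongs to. */
enum dir_space dir_space_of_block(uint64_t lblk, uint32_t blocksize)
{
    uint64_t space = lblk * blocksize / DIR_SPACE_SIZE;

    assert(space <= DIR_SPACE_FREE);
    return (enum dir_space)space;
}

/* First logical block of a space, in filesystem blocks. */
uint64_t dir_space_first_block(enum dir_space space, uint32_t blocksize)
{
    return (uint64_t)space * DIR_SPACE_SIZE / blocksize;
}
```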
+
+Examining the first block:
+
+::
+
+    xfs_db> dblock 0
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0x670
+    dhdr.bestfree[0].length = 0x140
+    dhdr.bestfree[1].offset = 0xff0
+    dhdr.bestfree[1].length = 0x10
+    dhdr.bestfree[2].offset = 0
+    dhdr.bestfree[2].length = 0
+    du[0].inumber = 75497600
+    du[0].namelen = 1
+    du[0].name = "."
+    du[0].tag = 0x10
+    du[1].inumber = 128
+    du[1].namelen = 2
+    du[1].name = ".."
+    du[1].tag = 0x20
+    du[2].inumber = 75497601
+    du[2].namelen = 15
+    du[2].name = "frame000000.tst"
+    du[2].tag = 0x30
+    du[3].inumber = 75497602
+    du[3].namelen = 15
+    du[3].name = "frame000001.tst"
+    du[3].tag = 0x50
+    ...
+    du[51].inumber = 75497650
+    du[51].namelen = 15
+    du[51].name = "frame000049.tst"
+    du[51].tag = 0x650
+    du[52].freetag = 0xffff
+    du[52].length = 0x140
+    du[52].tag = 0x670
+    du[53].inumber = 75497661
+    du[53].namelen = 15
+    du[53].name = "frame000060.tst"
+    du[53].tag = 0x7b0
+    ...
+    du[118].inumber = 75497758
+    du[118].namelen = 15
+    du[118].name = "frame000125.tst"
+    du[118].tag = 0xfd0
+    du[119].freetag = 0xffff
+    du[119].length = 0x10
+    du[119].tag = 0xff0
+
+    **Note**
+
+    The xfs\_db field output is preceded by a "d" for "data".
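+
+The tag values in this dump follow from the entry sizes. A minimal sketch,
+assuming the v4 entry layout without an ftype byte (8-byte inumber, 1-byte
+namelen, the name, a 2-byte tag, padded to a multiple of 8 bytes); the
+function names are hypothetical:
+
```c
#include <assert.h>
#include <stdint.h>

#define DIR2_DATA_ALIGN 8u /* entries are padded to 8-byte multiples */

/* inumber (8) + namelen (1) + name + optional ftype (1) + tag (2),
 * rounded up to the alignment. */
uint32_t dir2_data_entsize(uint8_t namelen, int has_ftype)
{
    uint32_t len = 8u + 1u + namelen + 2u;

    if (has_ftype)
        len += 1u;
    return (len + DIR2_DATA_ALIGN - 1u) & ~(DIR2_DATA_ALIGN - 1u);
}

/* The tag is the entry's last two bytes and holds the entry's own offset. */
uint32_t dir2_data_tag_offset(uint32_t entry_offset, uint8_t namelen,
                              int has_ftype)
{
    assert(entry_offset % DIR2_DATA_ALIGN == 0);
    return entry_offset + dir2_data_entsize(namelen, has_ftype) - 2u;
}
```
+
+The "." entry takes 16 bytes, so ".." starts at 0x20; each 15-character
+frame name takes 32 bytes, giving tags 0x30, 0x50, and so on.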
+
+The next "data" block:
+
+::
+
+    xfs_db> dblock 1
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0x6d0
+    dhdr.bestfree[0].length = 0x140
+    dhdr.bestfree[1].offset = 0xe50
+    dhdr.bestfree[1].length = 0x20
+    dhdr.bestfree[2].offset = 0xff0
+    dhdr.bestfree[2].length = 0x10
+    du[0].inumber = 75497759
+    du[0].namelen = 15
+    du[0].name = "frame000126.tst"
+    du[0].tag = 0x10
+    ...
+    du[53].inumber = 75497844
+    du[53].namelen = 15
+    du[53].name = "frame000179.tst"
+    du[53].tag = 0x6b0
+    du[54].freetag = 0xffff
+    du[54].length = 0x140
+    du[54].tag = 0x6d0
+    du[55].inumber = 75497855
+    du[55].namelen = 15
+    du[55].name = "frame000190.tst"
+    du[55].tag = 0x810
+    ...
+    du[104].inumber = 75497904
+    du[104].namelen = 15
+    du[104].name = "frame000239.tst"
+    du[104].tag = 0xe30
+    du[105].freetag = 0xffff
+    du[105].length = 0x20
+    du[105].tag = 0xe50
+    du[106].inumber = 75497906
+    du[106].namelen = 15
+    du[106].name = "frame000241.tst"
+    du[106].tag = 0xe70
+    ...
+    du[117].inumber = 75497917
+    du[117].namelen = 15
+    du[117].name = "frame000252.tst"
+    du[117].tag = 0xfd0
+    du[118].freetag = 0xffff
+    du[118].length = 0x10
+    du[118].tag = 0xff0
+
+And the last data block:
+
+::
+
+    xfs_db> dblock 2
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0x70
+    dhdr.bestfree[0].length = 0xf90
+    dhdr.bestfree[1].offset = 0
+    dhdr.bestfree[1].length = 0
+    dhdr.bestfree[2].offset = 0
+    dhdr.bestfree[2].length = 0
+    du[0].inumber = 75497918
+    du[0].namelen = 15
+    du[0].name = "frame000253.tst"
+    du[0].tag = 0x10
+    du[1].inumber = 75497919
+    du[1].namelen = 15
+    du[1].name = "frame000254.tst"
+    du[1].tag = 0x30
+    du[2].inumber = 75497920
+    du[2].namelen = 15
+    du[2].name = "frame000255.tst"
+    du[2].tag = 0x50
+    du[3].freetag = 0xffff
+    du[3].length = 0xf90
+    du[3].tag = 0x70
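+
+Each dhdr.bestfree array in these dumps lists the block's (up to) three
+largest free regions, longest first; for data block 1, the regions at 0x6d0,
+0xe50, and 0xff0 appear in that order. The sketch below recomputes bestfree
+from a full list of free regions. It is illustrative only: the names are
+hypothetical, and it does not model how the kernel updates bestfree
+incrementally as entries are added and removed.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_FD_COUNT 3 /* bestfree[] holds the three largest free regions */

struct free_region {
    uint16_t offset;
    uint16_t length;
};

/* Fill bestfree[] with the (up to) three longest regions, longest first;
 * unused slots are zeroed. */
void compute_bestfree(const struct free_region *regions, size_t n,
                      struct free_region bestfree[DATA_FD_COUNT])
{
    assert(regions != NULL || n == 0);
    for (int i = 0; i < DATA_FD_COUNT; i++)
        bestfree[i] = (struct free_region){0, 0};

    for (size_t r = 0; r < n; r++) {
        struct free_region cur = regions[r];

        /* Insert into the tiny sorted array, pushing shorter entries down
         * and dropping whatever falls off the end. */
        for (int i = 0; i < DATA_FD_COUNT; i++) {
            if (cur.length > bestfree[i].length) {
                struct free_region tmp = bestfree[i];

                bestfree[i] = cur;
                cur = tmp;
            }
        }
    }
}
```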
+
+Examining the "leaf" block (with the fields preceded by an "l" for
+"leaf"):
+
+::
+
+    xfs_db> dblock 8388608
+    xfs_db> type dir2
+    xfs_db> p
+    lhdr.info.forw = 0
+    lhdr.info.back = 0
+    lhdr.info.magic = 0xd2f1
+    lhdr.count = 258
+    lhdr.stale = 0
+    lbests[0-2] = 0:0x10 1:0x10 2:0xf90
+    lents[0].hashval = 0x2e
+    lents[0].address = 0x2
+    lents[1].hashval = 0x172e
+    lents[1].address = 0x4
+    lents[2].hashval = 0x23a04084
+    lents[2].address = 0x116
+    ...
+    lents[257].hashval = 0xf3a048bc
+    lents[257].address = 0x366
+    ltail.bestcount = 3
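+
+The lents[].hashval values come from the directory name hash. The function
+below is a sketch modeled on the kernel's xfs\_da\_hashname(); the name
+dir\_hashname() is hypothetical. It reproduces the values in this dump,
+hashing "." to 0x2e and ".." to 0x172e:
+
```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left; callers only use shifts of 7 to 28. */
static uint32_t rol32(uint32_t word, unsigned int shift)
{
    return (word << shift) | (word >> (32 - shift));
}

/* Hash `namelen` bytes of `name`, modeled on xfs_da_hashname(). */
uint32_t dir_hashname(const uint8_t *name, int namelen)
{
    uint32_t hash = 0;

    assert(namelen >= 0);
    /* Fold in four characters at a time while at least four remain. */
    for (; namelen >= 4; namelen -= 4, name += 4)
        hash = ((uint32_t)name[0] << 21) ^ ((uint32_t)name[1] << 14) ^
               ((uint32_t)name[2] << 7) ^ (uint32_t)name[3] ^
               rol32(hash, 7 * 4);

    /* Then mix in the last zero to three characters. */
    switch (namelen) {
    case 3:
        return ((uint32_t)name[0] << 14) ^ ((uint32_t)name[1] << 7) ^
               (uint32_t)name[2] ^ rol32(hash, 7 * 3);
    case 2:
        return ((uint32_t)name[0] << 7) ^ (uint32_t)name[1] ^
               rol32(hash, 7 * 2);
    case 1:
        return (uint32_t)name[0] ^ rol32(hash, 7 * 1);
    default:
        return hash;
    }
}
```
+
+It also yields the 0xf3a26094 that xfs\_db reports for frame001845.tst in the
+node directory example.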
+
+Note how the lbests array corresponds to the bestfree[0].length values in the
+"data" blocks:
+
+::
+
+    xfs_db> dblock 0
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0xff0
+    dhdr.bestfree[0].length = 0x10
+    ...
+    xfs_db> dblock 1
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0xff0
+    dhdr.bestfree[0].length = 0x10
+    ...
+    xfs_db> dblock 2
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0x70
+    dhdr.bestfree[0].length = 0xf90
+
+Now after the entries have been deleted:
+
+::
+
+    xfs_db> dblock 8388608
+    xfs_db> type dir2
+    xfs_db> p
+    lhdr.info.forw = 0
+    lhdr.info.back = 0
+    lhdr.info.magic = 0xd2f1
+    lhdr.count = 258
+    lhdr.stale = 21
+    lbests[0-2] = 0:0x140 1:0x140 2:0xf90
+    lents[0].hashval = 0x2e
+    lents[0].address = 0x2
+    lents[1].hashval = 0x172e
+    lents[1].address = 0x4
+    lents[2].hashval = 0x23a04084
+    lents[2].address = 0x116
+    ...
+
+As can be seen, the lbests values have been updated to contain each data
+block’s hdr.bestfree[0].length value. The leaf’s hdr.stale value has also been
+updated to specify the number of stale entries in the array. The stale entries
+have an address of zero.
+
+TODO: Need an example for where new entries get inserted with several large
+free spaces.
+
+Node Directories
+~~~~~~~~~~~~~~~~
+
+When the "leaf" information fills a block, the extents undergo another
+separation. All "freeindex" information moves into its own extent. In a leaf
+directory, the single "leaf" block maintains the best free space information
+for each "data" block; this is not possible with more than one leaf.
+
+-  The "data" blocks stay the same as in leaf directories.
+
+-  After the "freeindex" data moves to its own block, it is possible for
+   the leaf data to fit within a single leaf block. This single leaf block has
+   a magic number of XFS\_DIR2\_LEAFN\_MAGIC (0xd2ff) or on a v5 filesystem,
+   XFS\_DIR3\_LEAFN\_MAGIC (0x3dff).
+
+-  The "leaf" blocks eventually change into a B+tree with the generic B+tree
+   header pointing to directory "leaves" as described in `Leaf
+   Directories <#leaf-directories>`__. Blocks with leaf data still have the
+   LEAFN\_MAGIC magic number as outlined above. The top-level tree blocks are
+   called "nodes" and have a magic number of XFS\_DA\_NODE\_MAGIC
+   (0xfebe), or on a v5 filesystem, XFS\_DA3\_NODE\_MAGIC (0x3ebe).
+
+-  Distinguishing between a combined leaf/freeindex block (LEAF1\_MAGIC), a
+   leaf-only block (LEAFN\_MAGIC), and a btree node block (NODE\_MAGIC) can
+   only be done by examining the magic number.
+
+-  The new "freeindex" block(s) only contain the bests for each data
+   block.
+
+-  The freeindex block uses the following structures:
+
+.. code:: c
+
+    typedef struct xfs_dir2_free_hdr {
+         __uint32_t                magic;
+         __int32_t                 firstdb;
+         __int32_t                 nvalid;
+         __int32_t                 nused;
+    } xfs_dir2_free_hdr_t;
+
+**magic**
+    The magic number of the free block, "XD2F" (0x58443246).
+
+**firstdb**
+    The starting directory block number for the bests array.
+
+**nvalid**
+    Number of valid elements in the bests array. This number must correspond
+    with the number of directory blocks that can fit under the inode di\_size.
+
+**nused**
+    Number of used elements in the bests array. This number must correspond
+    with the number of directory blocks actually mapped under the inode
+    di\_size.
+
+.. code:: c
+
+    typedef struct xfs_dir2_free {
+         xfs_dir2_free_hdr_t       hdr;
+         xfs_dir2_data_off_t       bests[1];
+    } xfs_dir2_free_t;
+
+**hdr**
+    Free block header.
+
+**bests**
+    An array specifying the best free counts in each directory data block.
+
+-  On a v5 filesystem, the freeindex block uses the following structures:
+
+.. code:: c
+
+    struct xfs_dir3_free_hdr {
+         struct xfs_dir3_blk_hdr   hdr;
+         __int32_t                 firstdb;
+         __int32_t                 nvalid;
+         __int32_t                 nused;
+         __int32_t                 pad;
+    };
+
+**hdr**
+    v3 directory block header. The magic number is "XDF3" (0x58444633).
+
+**firstdb**
+    The starting directory block number for the bests array.
+
+**nvalid**
+    Number of valid elements in the bests array. This number must correspond
+    with the number of directory blocks that can fit under the inode di\_size.
+
+**nused**
+    Number of used elements in the bests array. This number must correspond
+    with the number of directory blocks actually mapped under the inode
+    di\_size.
+
+**pad**
+    Padding to maintain alignment.
+
+.. code:: c
+
+    struct xfs_dir3_free {
+         struct xfs_dir3_free_hdr  hdr;
+         __be16                    bests[1];
+    };
+
+**hdr**
+    Free block header.
+
+**bests**
+    An array specifying the best free counts in each directory data block.
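+
+Each freeindex block covers a fixed range of data blocks, so finding the
+bests[] slot for a data block is a division. A minimal sketch for v4
+freeindex blocks, assuming a 16-byte header, 2-byte bests entries, and a
+freeindex space that begins at the 64GB logical offset; the helper names are
+hypothetical:
+
```c
#include <assert.h>
#include <stdint.h>

#define FREE_HDR_SIZE_V4   16u          /* magic, firstdb, nvalid, nused */
#define FREE_BEST_SIZE      2u          /* one 16-bit bests[] entry */
#define DIR_FREE_OFFSET    (2ULL << 35) /* freeindex space starts at 64GB */

/* How many bests[] slots fit in one v4 freeindex block. */
uint32_t free_max_bests(uint32_t dirblksize)
{
    assert(dirblksize > FREE_HDR_SIZE_V4);
    return (dirblksize - FREE_HDR_SIZE_V4) / FREE_BEST_SIZE;
}

/* Map data block `db` to its freeindex block and slot; freeindex block n
 * has firstdb == n * free_max_bests(). */
void db_to_free_slot(uint32_t db, uint32_t dirblksize,
                     uint32_t *fdb, uint32_t *slot)
{
    uint32_t per_block = free_max_bests(dirblksize);

    *fdb = db / per_block;
    *slot = db % per_block;
}

/* Logical file block (an extent startoff) of freeindex block `fdb`. */
uint64_t free_block_startoff(uint32_t fdb, uint32_t blocksize,
                             uint32_t dirblksize)
{
    return DIR_FREE_OFFSET / blocksize +
           (uint64_t)fdb * (dirblksize / blocksize);
}
```
+
+With 4KB directory blocks this gives 2040 slots per freeindex block, which
+matches the firstdb values of 0, 2040, and 4080 in the B+tree directory
+example.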
+
+-  The leaf blocks can be stored in any order; the only way to find the
+   appropriate leaf is through the node block's hashval/before pairs. Given a
+   hash to look up, read the node’s btree array, find the first hashval that
+   is greater than or equal to the given hash, and follow the corresponding
+   before value to the block that contains it.
+
+-  The freeindex’s bests array immediately follows the header and grows
+   toward the end of the block.
+
+-  When a data block becomes unused (i.e. all entries in it have been
+   deleted), the block is freed, the data extents contain a hole, and the
+   freeindex’s hdr.nused value is decremented and the associated bests[] entry
+   is set to 0xffff.
+
+-  As the first data block always contains "." and "..", it’s invalid for
+   the directory to have a hole at the start.
+
+-  The freeindex’s hdr.nused should always be the same as the number of
+   allocated data directory blocks containing name/inode data and will always
+   be less than or equal to hdr.nvalid. The value of hdr.nvalid should be the
+   same as the index of the last data directory block plus one (i.e. when the
+   last data block is freed, nused and nvalid are decremented).
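+
+The node lookup rule can be sketched as a binary search for the first btree
+entry whose hashval is greater than or equal to the target hash. This is a
+simplified illustration with hypothetical names: it handles a single node
+level and ignores duplicate hash values that span sibling blocks, which a
+real lookup must also follow.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a node block's btree array. */
struct da_node_entry {
    uint32_t hashval; /* highest hash stored under `before` */
    uint32_t before;  /* logical directory block of the child */
};

/* Child block to descend into for `hash`: the first entry whose hashval is
 * >= hash. Returns -1 if `hash` is above every entry (name not present). */
int64_t da_node_lookup(const struct da_node_entry *btree, size_t count,
                       uint32_t hash)
{
    size_t lo = 0, hi = count; /* search window is [lo, hi) */

    assert(btree != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (btree[mid].hashval < hash)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < count ? (int64_t)btree[lo].before : -1;
}
```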
+
+.. ifconfig:: builder != 'latex'
+
+   .. figure:: images/54.png
+      :alt: Node directory layout
+
+      Node directory layout
+
+.. ifconfig:: builder == 'latex'
+
+   .. figure:: images/54.png
+      :scale: 40%
+      :alt: Node directory layout
+
+      Node directory layout
+
+xfs\_db Node Directory Example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For the node directory examples, we are using a filesystem with a 4KB block
+size and a 16KB directory block size. The directory has over 2000 entries:
+
+::
+
+    xfs_db> sb 0
+    xfs_db> p
+    magicnum = 0x58465342
+    blocksize = 4096
+    ...
+    dirblklog = 2
+    ...
+    xfs_db> inode <inode#>
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 1
+    core.format = 2 (extents)
+    ...
+    core.size = 81920
+    core.nblocks = 36
+    core.extsize = 0
+    core.nextents = 8
+    ...
+    u.bmx[0-7] = [startoff,startblock,blockcount,extentflag]
+              0:[0,7368,4,0]
+              1:[4,7408,4,0]
+              2:[8,7444,4,0]
+              3:[12,7480,4,0]
+              4:[16,7520,4,0]
+              5:[8388608,7396,4,0]
+              6:[8388612,7524,8,0]
+              7:[16777216,7516,4,0]
+
+As can already be observed, all extents are allocated in multiples of 4
+blocks.
+
+Blocks 0 to 19 (16+4-1) are used for directory data blocks. Looking at blocks
+16-19, we can see that it’s the same as the single-leaf format, except the
+length values are a lot larger to accommodate the increased directory block
+size:
+
+::
+
+    xfs_db> dblock 16
+    xfs_db> type dir2
+    xfs_db> p
+    dhdr.magic = 0x58443244
+    dhdr.bestfree[0].offset = 0xb0
+    dhdr.bestfree[0].length = 0x3f50
+    dhdr.bestfree[1].offset = 0
+    dhdr.bestfree[1].length = 0
+    dhdr.bestfree[2].offset = 0
+    dhdr.bestfree[2].length = 0
+    du[0].inumber = 120224
+    du[0].namelen = 15
+    du[0].name = "frame002043.tst"
+    du[0].tag = 0x10
+    du[1].inumber = 120225
+    du[1].namelen = 15
+    du[1].name = "frame002044.tst"
+    du[1].tag = 0x30
+    du[2].inumber = 120226
+    du[2].namelen = 15
+    du[2].name = "frame002045.tst"
+    du[2].tag = 0x50
+    du[3].inumber = 120227
+    du[3].namelen = 15
+    du[3].name = "frame002046.tst"
+    du[3].tag = 0x70
+    du[4].inumber = 120228
+    du[4].namelen = 15
+    du[4].name = "frame002047.tst"
+    du[4].tag = 0x90
+    du[5].freetag = 0xffff
+    du[5].length = 0x3f50
+    du[5].tag = 0
+
+Next, the "node" block (its fields are preceded by 'n' for node):
+
+::
+
+    xfs_db> dblock 8388608
+    xfs_db> type dir2
+    xfs_db> p
+    nhdr.info.forw = 0
+    nhdr.info.back = 0
+    nhdr.info.magic = 0xfebe
+    nhdr.count = 2
+    nhdr.level = 1
+    nbtree[0-1] = [hashval,before] 0:[0xa3a440ac,8388616] 1:[0xf3a440bc,8388612]
+
+The two following leaf blocks were allocated as part of the directory’s
+conversion to node format. All hashes less than or equal to 0xa3a440ac are
+located at directory offset 8,388,616, and the remaining hashes up to and
+including 0xf3a440bc are located at directory offset 8,388,612. Hashes greater
+than 0xf3a440bc don’t exist in this directory.
+
+::
+
+    xfs_db> dblock 8388616
+    xfs_db> type dir2
+    xfs_db> p
+    lhdr.info.forw = 8388612
+    lhdr.info.back = 0
+    lhdr.info.magic = 0xd2ff
+    lhdr.count = 1023
+    lhdr.stale = 0
+    lents[0].hashval = 0x2e
+    lents[0].address = 0x2
+    lents[1].hashval = 0x172e
+    lents[1].address = 0x4
+    lents[2].hashval = 0x23a04084
+    lents[2].address = 0x116
+    ...
+    lents[1021].hashval = 0xa3a440a4
+    lents[1021].address = 0x1fa2
+    lents[1022].hashval = 0xa3a440ac
+    lents[1022].address = 0x1fca
+    xfs_db> dblock 8388612
+    xfs_db> type dir2
+    xfs_db> p
+    lhdr.info.forw = 0
+    lhdr.info.back = 8388616
+    lhdr.info.magic = 0xd2ff
+    lhdr.count = 1027
+    lhdr.stale = 0
+    lents[0].hashval = 0xa3a440b4
+    lents[0].address = 0x1f52
+    lents[1].hashval = 0xa3a440bc
+    lents[1].address = 0x1f7a
+    ...
+    lents[1025].hashval = 0xf3a440b4
+    lents[1025].address = 0x1f66
+    lents[1026].hashval = 0xf3a440bc
+    lents[1026].address = 0x1f8e
+
+An example lookup using xfs\_db:
+
+::
+
+    xfs_db> hash frame001845.tst
+    0xf3a26094
+
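+Because 0xf3a26094 is greater than 0xa3a440ac but not greater than
+0xf3a440bc, the entry is in leaf block 8388612. A binary search through that
+block’s lents array yields address 0x1ce6. Addresses are stored in 8-byte
+units, so this is directory byte offset 0xe730 (0x1ce6 × 8). Each fsblock is
+4KB in size (0x1000), so the entry is at offset 0x730 within logical
+filesystem block 14 of the directory. From the extent map, logical block 14
+falls in the extent that starts at logical block 12 (fsblock 7480), so this
+is fsblock 7482: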
+
+::
+
+    xfs_db> fsblock 7482
+    xfs_db> type text
+    xfs_db> p
+    ...
+    730: 00 00 00 00 00 01 d4 da 0f 66 72 61 6d 65 30 30 .........frame00
+    740: 31 38 34 35 2e 74 73 74 00 00 00 00 00 00 27 30 1845.tst.......0
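+
+The arithmetic of this lookup can be sketched in C, assuming an in-memory
+copy of the extent map and a v4 data entry without an ftype byte. Both
+functions are hypothetical helpers, not xfs\_db or kernel interfaces; the
+second decodes raw bytes such as those in the dump:
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One extent from the inode's u.bmx list (extentflag omitted). */
struct bmx {
    uint64_t startoff;   /* logical block within the directory */
    uint64_t startblock; /* physical fsblock */
    uint64_t blockcount;
};

/* Map a leaf address (8-byte units) to a physical fsblock and the byte
 * offset inside it. Returns 0 if the logical block is a hole. */
int dir_addr_to_fsblock(uint32_t address, uint32_t blocksize,
                        const struct bmx *map, size_t nmap,
                        uint64_t *fsblock, uint32_t *offset)
{
    uint64_t byte_off = (uint64_t)address << 3;
    uint64_t lblk;

    assert(blocksize != 0);
    lblk = byte_off / blocksize;
    for (size_t i = 0; i < nmap; i++) {
        if (lblk >= map[i].startoff &&
            lblk < map[i].startoff + map[i].blockcount) {
            *fsblock = map[i].startblock + (lblk - map[i].startoff);
            *offset = (uint32_t)(byte_off % blocksize);
            return 1;
        }
    }
    return 0;
}

/* A decoded v4 data entry (no ftype byte). */
struct dir_entry {
    uint64_t inumber;
    uint8_t namelen;
    char name[256];
    uint16_t tag;
};

/* Decode an entry from raw on-disk (big-endian) bytes. */
void decode_data_entry(const uint8_t *p, struct dir_entry *e)
{
    uint32_t entsize;

    e->inumber = 0;
    for (int i = 0; i < 8; i++)
        e->inumber = (e->inumber << 8) | p[i];
    e->namelen = p[8];
    memcpy(e->name, p + 9, e->namelen);
    e->name[e->namelen] = '\0';
    /* The tag is the last two bytes of the 8-byte-aligned entry. */
    entsize = (8u + 1u + e->namelen + 2u + 7u) & ~7u;
    e->tag = (uint16_t)((p[entsize - 2] << 8) | p[entsize - 1]);
}
```
+
+With address 0x1ce6, the five data extents above, and a 4096-byte block size,
+this yields fsblock 7482 at offset 0x730; decoding the 32 bytes at that offset
+gives inode number 0x1d4da (120026), the name frame001845.tst, and tag 0x2730.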
+
+Looking at the freeindex information (fields with an 'f' tag):
+
+::
+
+    xfs_db> fsblock 7516
+    xfs_db> type dir2
+    xfs_db> p
+    fhdr.magic = 0x58443246
+    fhdr.firstdb = 0
+    fhdr.nvalid = 5
+    fhdr.nused = 5
+    fbests[0-4] = 0:0x10 1:0x10 2:0x10 3:0x10 4:0x3f50
+
+As with leaf directories, each fbests value corresponds to a data block’s
+bestfree[0].length value.
+
+The fbests array can also be seen in a raw dump of the block:
+
+::
+
+    xfs_db> type text
+    xfs_db> p
+    000: 58 44 32 46 00 00 00 00 00 00 00 05 00 00 00 05 XD2F............
+    010: 00 10 00 10 00 10 00 10 3f 50 00 00 1f 01 ff ff .........P......
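+
+A raw freeindex block like this one can be parsed by reading big-endian
+fields. This minimal sketch assumes the v4 layout (four 32-bit header fields
+followed immediately by 16-bit bests entries); v5 blocks use the larger
+xfs\_dir3\_free\_hdr and are not handled. The function names are
+hypothetical:
+
```c
#include <assert.h>
#include <stdint.h>

/* Host-order copy of a v4 freeindex header. */
struct free_hdr_v4 {
    uint32_t magic; /* "XD2F", 0x58443246 */
    int32_t firstdb;
    int32_t nvalid;
    int32_t nused;
};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

void parse_free_hdr(const uint8_t *blk, struct free_hdr_v4 *h)
{
    h->magic = get_be32(blk);
    h->firstdb = (int32_t)get_be32(blk + 4);
    h->nvalid = (int32_t)get_be32(blk + 8);
    h->nused = (int32_t)get_be32(blk + 12);
}

/* bests[i] follows the 16-byte header directly. */
uint16_t free_best(const uint8_t *blk, int i)
{
    assert(i >= 0);
    return get_be16(blk + 16 + 2 * i);
}
```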
+
+TODO: Example with a hole in the middle
+
+B+tree Directories
+~~~~~~~~~~~~~~~~~~
+
+When the extent map in an inode grows beyond the inode’s space, the inode
+format is changed to a "btree". The inode contains a filesystem block pointer
+to the B+tree extent map for the directory’s blocks. The B+tree extents contain
+the extent map for the "data", "node", "leaf", and "freeindex" information as
+described in Node Directories.
+
+Refer to the previous section on B+tree `Data Extents <#b-tree-extent-list>`__
+for more information on XFS B+tree extents.
+
+The following properties apply to both node and B+tree directories:
+
+-  The node/leaf trees can be more than one level deep.
+
+-  More than one freeindex block may exist, but this will be quite rare. It
+   would require hundreds of thousands of files with quite long file names (or
+   millions with shorter names) to get a second freeindex block.
+
+xfs\_db B+tree Directory Example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A directory has been created with 200,000 entries with each entry being 100
+characters long. The filesystem block size and directory block size are 4KB:
+
+::
+
+    xfs_db> inode <inode#>
+    xfs_db> p
+    core.magic = 0x494e
+    core.mode = 040755
+    core.version = 1
+    core.format = 3 (btree)
+    ...
+    core.size = 22757376
+    core.nblocks = 6145
+    core.extsize = 0
+    core.nextents = 234
+    core.naextents = 0
+    core.forkoff = 0
+    ...
+    u.bmbt.level = 1
+    u.bmbt.numrecs = 1
+    u.bmbt.keys[1] = [startoff] 1:[0]
+    u.bmbt.ptrs[1] = 1:89
+    xfs_db> fsblock 89
+    xfs_db> type bmapbtd
+    xfs_db> p
+    magic = 0x424d4150
+    level = 0
+    numrecs = 234
+    leftsib = null
+    rightsib = null
+    recs[1-234] = [startoff,startblock,blockcount,extentflag]
+       1:[0,53,1,0] 2:[1,55,13,0] 3:[14,69,1,0] 4:[15,72,13,0]
+       5:[28,86,2,0] 6:[30,90,21,0] 7:[51,112,1,0] 8:[52,114,11,0]
+       ...
+       125:[5177,902,15,0] 126:[5192,918,6,0] 127:[5198,524786,358,0]
+       128:[8388608,54,1,0] 129:[8388609,70,2,0] 130:[8388611,85,1,0]
+       ...
+       229:[8389164,917,1,0] 230:[8389165,924,19,0] 231:[8389184,944,9,0]
+       232:[16777216,68,1,0] 233:[16777217,7340114,1,0] 234:[16777218,5767362,1,0]
+
+We have 127 data extents and a total of 5556 blocks (logical blocks 0 to 5555)
+being used to store name/inode pairs. Since a 4KB freeindex block can hold at
+most 2040 bests values, three freeindex blocks have been allocated for this
+information. The firstdb field specifies the starting directory block number
+for each array:
+
+::
+
+    xfs_db> dblock 16777216
+    xfs_db> type dir2
+    xfs_db> p
+    fhdr.magic = 0x58443246
+    fhdr.firstdb = 0
+    fhdr.nvalid = 2040
+    fhdr.nused = 2040
+    fbests[0-2039] = ...
+    xfs_db> dblock 16777217
+    xfs_db> type dir2
+    xfs_db> p
+    fhdr.magic = 0x58443246
+    fhdr.firstdb = 2040
+    fhdr.nvalid = 2040
+    fhdr.nused = 2040
+    fbests[0-2039] = ...
+    xfs_db> dblock 16777218
+    xfs_db> type dir2
+    xfs_db> p
+    fhdr.magic = 0x58443246
+    fhdr.firstdb = 4080
+    fhdr.nvalid = 1476
+    fhdr.nused = 1476
+    fbests[0-1475] = ...
+
+Looking at the root node block shows a deeper tree: its nhdr.level is 2, so
+its entries point to other node blocks rather than directly to leaves:
+
+::
+
+    xfs_db> dblock 8388608
+    xfs_db> type dir2
+    xfs_db> p
+    nhdr.info.forw = 0
+    nhdr.info.back = 0
+    nhdr.info.magic = 0xfebe
+    nhdr.count = 2
+    nhdr.level = 2
+    nbtree[0-1] = [hashval,before] 0:[0x6bbf6f39,8389121] 1:[0xfbbf7f79,8389120]
+    xfs_db> dblock 8389121
+    xfs_db> type dir2
+    xfs_db> p
+    nhdr.info.forw = 8389120
+    nhdr.info.back = 0
+    nhdr.info.magic = 0xfebe
+    nhdr.count = 263
+    nhdr.level = 1
+    nbtree[0-262] = ... 262:[0x6bbf6f39,8388928]
+    xfs_db> dblock 8389120
+    xfs_db> type dir2
+    xfs_db> p
+    nhdr.info.forw = 0
+    nhdr.info.back = 8389121
+    nhdr.info.magic = 0xfebe
+    nhdr.count = 319
+    nhdr.level = 1
+    nbtree[0-318] = [hashval,before] 0:[0x70b14711,8388919] ...
+
+The leaf sibling pointers cross node boundaries: the last leaf under one node
+points to the first leaf under the adjacent node. Directory block 8388928 has
+a forward pointer to block 8388919, and block 8388919 has a back pointer to
+block 8388928, as shown in the following example:
+
+::
+
+    xfs_db> dblock 8388928
+    xfs_db> type dir2
+    xfs_db> p
+    lhdr.info.forw = 8388919
+    lhdr.info.back = 8388937
+    lhdr.info.magic = 0xd2ff
+    ...
+
+    xfs_db> dblock 8388919
+    xfs_db> type dir2
+    xfs_db> p
+    lhdr.info.forw = 8388706
+    lhdr.info.back = 8388928
+    lhdr.info.magic = 0xd2ff
+    ...
diff --git a/Documentation/filesystems/xfs/ondisk/dynamic.rst b/Documentation/filesystems/xfs/ondisk/dynamic.rst
index a8c65ed7e6ee..3305c5218250 100644
--- a/Documentation/filesystems/xfs/ondisk/dynamic.rst
+++ b/Documentation/filesystems/xfs/ondisk/dynamic.rst
@@ -5,3 +5,4 @@ Dynamic Allocated Structures
 
 .. include:: ondisk_inode.rst
 .. include:: data_extents.rst
+.. include:: directories.rst




