From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Dave and I had a short discussion about whether or not xattr trees
needed to have the same free space tracking that directories have, and
a comparison of how each of the two metadata types interacts with
dabtrees resulted. I've reworked this a bit to make it flow better as a
book chapter, so here we go.

Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@xxxxxxxxx/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
Originally-from: Dave Chinner <david@xxxxxxxxxxxxx>
Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
v2: various fixes suggested by Dave; reflow the paragraphs about
directories to describe the relations between dabtree and dirents only
once; don't talk about an unnamed "we".
---
 .../extended_attributes.asciidoc | 55 ++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
index 99f7b35..b7a6007 100644
--- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
+++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
@@ -910,3 +910,58 @@ Log sequence number of the last write to this block.
 
 Filesystems formatted prior to v5 do not have this header in the remote block.
 Value data begins immediately at offset zero.
+
+== Key Differences Between Directories and Extended Attributes
+
+Though directories and extended attributes can take advantage of the same
+variable length record btree structures (i.e. the dabtree) to map name hashes
+to directory entry records (dirent records) or extended attribute records,
+there are major differences in the ways that each of those users embeds the
+btree within the information that they are storing. The directory dabtree leaf
+nodes contain mappings between a name hash and the location of a dirent record
+inside the directory data segment. Extended attributes, on the other hand,
+store attribute records directly in the leaf nodes of the dabtree.
+
+When XFS adds or removes a record in any dabtree, it splits or merges leaf
+nodes of the tree based on where the name hash index determines that a record
+needs to be inserted or removed. In the attribute dabtree, XFS splits or
+merges sparse leaf nodes of the dabtree as a side effect of inserting or
+removing attribute records.
+
+Directories, however, are subject to stricter constraints. The userspace
+readdir/seekdir/telldir directory cookie API places a requirement on the
+directory structure that a dirent record's cookie cannot change for the life
+of the dirent record. XFS uses the dirent record's logical offset into the
+directory data segment as the cookie, and hence the dirent record cannot change
+location. Therefore, XFS cannot store dirent records in the leaf nodes of the
+dabtree because the offset into the tree would change as other entries are
+inserted and removed.
+
+Dirent records are therefore stored within directory data blocks, all of which
+are mapped in the first directory segment. The directory dabtree is mapped
+into the second directory segment. Therefore, directory blocks require
+external free space tracking because they are not part of the dabtree itself.
+Because the dabtree only stores pointers to dirent records in the first data
+segment, there is no need to leave holes in the dabtree itself. The dabtree
+splits or merges leaf nodes as required as pointers to the directory data
+segment are added or removed, and needs no free space tracking.
+
+When XFS adds a dirent record, it needs to find the best-fitting free space in
+the directory data segment to turn into the new record. This requires a free
+space index for the directory data segment. The free space index is held in
+the third directory segment. Once XFS has used the free space index to find
+the block with the best free space, it modifies the directory data block and
+updates the dabtree to point the name hash at the new record. When XFS removes
+a dirent record, it leaves a hole in the data segment so that the rest of the
+entries do not move, and removes the corresponding dabtree name hash mapping.
+
+Note that for small directories, XFS collapses the name hash mappings and
+the free space information into the directory data blocks to save space.
+
+In summary, the requirement for a free space map in the directory structure
+results from storing the dirent records externally to the dabtree. Attribute
+records are stored directly in the leaf nodes of the dabtree (except for
+remote attribute values, which can be anywhere in the attr fork address
+space) and do not need external free space tracking to determine where to best
+insert them. As a result, extended attributes exhibit nearly perfect scaling
+until the computer runs out of memory.
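
The contrast described in the patch can be illustrated with a toy model. The
sketch below is not XFS code and does not reflect the on-disk format: every
name in it (toy_attr_leaf, toy_dir, TOY_SLOTS and the helper functions) is
hypothetical, the "dabtree" is reduced to a small sorted array, and first-fit
stands in for the best-fit search. It only shows that a record stored inside a
hash-ordered leaf moves when a neighbour is inserted, while a record stored in
a separate data area keeps its offset (the cookie) and leaves a reusable hole
when removed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SLOTS 8	/* hypothetical capacity, chosen for illustration */

/* Toy attribute leaf: the records themselves sit in the leaf, sorted by hash. */
struct toy_attr_leaf {
	uint32_t hash[TOY_SLOTS];
	int count;
};

/* Insert in hash order; existing records shift to make room. */
static int toy_attr_insert(struct toy_attr_leaf *l, uint32_t hash)
{
	int pos = 0, i;

	if (l->count == TOY_SLOTS)
		return -1;	/* a real tree would split the leaf here */
	while (pos < l->count && l->hash[pos] < hash)
		pos++;
	for (i = l->count; i > pos; i--)
		l->hash[i] = l->hash[i - 1];
	l->hash[pos] = hash;
	l->count++;
	return pos;
}

/* Return the current position of a record in the leaf, or -1 if absent. */
static int toy_attr_find(const struct toy_attr_leaf *l, uint32_t hash)
{
	int i;

	for (i = 0; i < l->count; i++)
		if (l->hash[i] == hash)
			return i;
	return -1;
}

/* Toy directory: fixed data slots plus a separate hash -> offset map. */
struct toy_dir {
	unsigned char in_use[TOY_SLOTS];	/* stand-in for the free space index */
	struct {
		uint32_t hash;
		int offset;
	} map[TOY_SLOTS];			/* stand-in for the dabtree */
	int nmap;
};

/* Add an entry; returns its data offset, which serves as the stable cookie. */
static int toy_dir_add(struct toy_dir *d, uint32_t hash)
{
	int off, pos = 0, i;

	for (off = 0; off < TOY_SLOTS && d->in_use[off]; off++)
		;	/* first fit here, where the text describes a best-fit search */
	if (off == TOY_SLOTS)
		return -1;
	d->in_use[off] = 1;
	while (pos < d->nmap && d->map[pos].hash < hash)
		pos++;
	for (i = d->nmap; i > pos; i--)
		d->map[i] = d->map[i - 1];
	d->map[pos].hash = hash;
	d->map[pos].offset = off;
	d->nmap++;
	return off;
}

/* Remove by cookie: the data slot becomes a hole, only map entries shift. */
static void toy_dir_remove(struct toy_dir *d, int off)
{
	int i, j;

	if (off < 0 || off >= TOY_SLOTS || !d->in_use[off])
		return;
	d->in_use[off] = 0;
	for (i = 0; i < d->nmap; i++) {
		if (d->map[i].offset != off)
			continue;
		for (j = i; j < d->nmap - 1; j++)
			d->map[j] = d->map[j + 1];
		d->nmap--;
		return;
	}
}

/* Look up a name hash; returns the entry's data offset, or -1 if absent. */
static int toy_dir_lookup(const struct toy_dir *d, uint32_t hash)
{
	int i;

	for (i = 0; i < d->nmap; i++)
		if (d->map[i].hash == hash)
			return d->map[i].offset;
	return -1;
}
```

In this model, inserting hash 10 after hash 50 into a toy_attr_leaf moves the
record for 50 from position 0 to position 1, which is harmless for attributes
because nothing outside the tree refers to that position. Doing the same with
a toy_dir leaves the entry for 50 at offset 0; only the map is reordered, and
removing that entry frees offset 0 for a later toy_dir_add() to reuse.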