On Thu, Apr 09, 2020 at 10:16:08AM +1000, Dave Chinner wrote:
> On Wed, Apr 08, 2020 at 04:27:53PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> > Dave and I had a short discussion about whether or not xattr trees
> > needed to have the same free space tracking that directories have, and
> > a comparison of how each of the two metadata types interact with
> > dabtrees resulted. I've reworked this a bit to make it flow better as a
> > book chapter, so here we go.
> >
> > Original-mail: https://lore.kernel.org/linux-xfs/20200404085203.1908-1-chandanrlinux@xxxxxxxxx/T/#mdd12ad06cf5d635772cc38946fc5b22e349e136f
> > Originally-from: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
>
> Couple of things.
>
> We are talking about btrees and where the record data is being
> stored (internal or external). Hence I think it makes sense to refer
> to "attribute records" and "directory records" (or "dirent records")
> rather than "attributes" and "directory entries"...

Ok, I'll clean that up

> "leaves" -> "leaf nodes"

Fixed.

> > ---
> >  .../extended_attributes.asciidoc | 49 ++++++++++++++++++++
> >  1 file changed, 49 insertions(+)
> >
> > diff --git a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > index 99f7b35..d61c649 100644
> > --- a/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > +++ b/design/XFS_Filesystem_Structure/extended_attributes.asciidoc
> > @@ -910,3 +910,52 @@ Log sequence number of the last write to this block.
> >
> >  Filesystems formatted prior to v5 do not have this header in the remote block.
> >  Value data begins immediately at offset zero.
> > +
> > +== Key Differences Between Directories and Extended Attributes
> > +
> > +Though directories and extended attributes can take advantage of the same
> > +variable length record btree structures (i.e. the dabtree) to map name hashes
> > +to disk blocks, there are major differences in the ways that each of those
> > +users embed the btree within the information that they are storing.
> > +
> > +Directory blocks require external free space tracking because the directory
> > +blocks are not part of the dabtree itself. The dabtree leaves for a directory
> > +map name hashes to external directory data blocks. Extended attributes, on
>
> "The dabtree leaves for ...." implies it is going somewhere, not
> that you are talking about leaf nodes. :) Perhaps:
>
> "The directory dabtree leaf nodes contain a mapping between name
> hash and the location of the dirent record in the external directory
> data blocks."

<nod>

> > +the other hand, store all of the attributes in the leaves of the dabtree.
>
> "... store the attribute records directly in the dabtree leaf
> nodes."

<nod>

> > +
> > +When we add or remove an extended attribute in the dabtree, we split or merge
> > +leaves of the tree based on where the name hash index tells us a leaf needs to
> > +be inserted into or removed. In other words, we make space available or
> > +collapse sparse leaves of the dabtree as a side effect of inserting or
> > +removing attributes.
> > +
> > +The directory structure is very different. Directory entries cannot change
> > +location because each entry's logical offset into the directory data segment
> > +is used as the readdir/seekdir/telldir cookie, and the cookie is required to
> > +be stable for the life of the entry. Therefore, we cannot store directory
> > +entries in the leaves of a dabtree (which is indexed in hash order) because
>
> The userspace readdir/seekdir/telldir directory cookie API places a
> requirement on the directory structure that dirent record cookie
> cannot change for the life of the dirent record. We use the dirent
> record's logical offset into the directory data segment for that
> cookie, and hence the dirent record cannot change location.
> Therefore, we cannot store directory records in the leaf nodes of
> the dabtree....

Ok, I'll massage that in. :)

> > +the offset into the tree would change as other entries are inserted and
> > +removed. Hence when we remove directory entries, we must leave holes in the
> > +data segment so the rest of the entries do not move.
> > +
> > +The directory name hash index (the dabtree bit) is held in the second
> > +directory segment. Because the dabtree only stores pointers to directory
> > +entries in the (first) data segment, there is no need to leave holes in the
> > +dabtree itself. The dabtree merges or splits leaves as required as pointers
> > +to the directory data segment are added or removed. The dabtree itself needs
> > +no free space tracking.
> > +
> > +When we go to add a directory entry, we need to find the best-fitting free
>
> s/go to//

Fixed.

> > +space in the directory data segment to turn into the new entry. This requires
> > +a free space index for the directory data segment. The free space index is
> > +held in the third directory segment. Once we've used the free space index to
> > +find the block with that best free space, we modify the directory data block
> > +and update the dabtree to point the name hash at the new entry.
> > +
> > +In other words, the requirement for a free space map in the directory
> > +structure results from storing the directory entry data externally to the
> > +dabtree. Extended attributes are stored directly in the leaves of the
>
> dabtree leaf nodes

Fixed.

> > +dabtree (except for remote attributes which can be anywhere in the attr fork
> > +address space) and do not need external free space tracking to determine where
> > +to best insert them. As a result, extended attributes exhibit nearly perfect
> > +scaling until we run out of memory.
>
> Thanks for doing this, Darrick!

NP. v2 is on its way.

--D

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
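A toy Python model of the distinction the patch text draws, offered only as an illustration under loudly stated assumptions: `name_hash`, the 8-byte per-entry overhead, the dict standing in for the dabtree, and the list-of-holes free space index are all hypothetical stand-ins, not XFS code or on-disk formats. It shows attribute records living in a hash-ordered leaf where positions shift freely on insert/remove, while directory records sit at fixed offsets (the cookie), removal leaves a hole, and placing a new record needs a best-fit search over a free space index.

```python
import bisect

# Toy model only. Hash function, entry overhead, and index structures are
# hypothetical stand-ins, not XFS on-disk structures. Hash collisions,
# hole coalescing and block boundaries are deliberately ignored.

def name_hash(name: str) -> int:
    """Hypothetical stand-in for the real XFS name hash."""
    h = 0
    for ch in name.encode():
        h = (h * 31 + ch) & 0xFFFFFFFF
    return h


class AttrLeaf:
    """Attribute-style: the records live in the hash-ordered leaf itself."""

    def __init__(self):
        self.records = []  # sorted list of (hash, name, value)

    def add(self, name, value):
        # Inserting shifts later records. Nothing outside the leaf remembers
        # a record's position, so no free space tracking is needed.
        bisect.insort(self.records, (name_hash(name), name, value))

    def remove(self, name):
        # Later records simply move down; the leaf stays dense.
        self.records = [r for r in self.records if r[1] != name]


class Directory:
    """Directory-style: records at stable offsets in a data segment,
    a hash index pointing at them, and a separate free space index."""

    ENTRY_OVERHEAD = 8  # hypothetical fixed per-entry header size

    def __init__(self, size):
        self.entries = {}          # offset -> (name, length): the data segment
        self.free = [(0, size)]    # free space index: (offset, length) holes
        self.hash_index = {}       # name hash -> offset (stand-in for the dabtree)

    def add(self, name):
        need = len(name) + self.ENTRY_OVERHEAD
        # Best fit: the smallest hole that is still large enough.
        fits = [hole for hole in self.free if hole[1] >= need]
        if not fits:
            raise OSError("no space in toy data segment")
        off, length = min(fits, key=lambda hole: hole[1])
        self.free.remove((off, length))
        if length > need:
            self.free.append((off + need, length - need))
        self.entries[off] = (name, need)
        self.hash_index[name_hash(name)] = off
        return off  # the offset doubles as the readdir cookie

    def remove(self, name):
        off = self.hash_index.pop(name_hash(name))
        _, need = self.entries.pop(off)
        # Leave a hole so no other record moves and every cookie stays valid.
        self.free.append((off, need))
```

Removing a directory record here never changes any other record's offset; the cost is that `add` must consult `free` to reuse the hole, which is the role the patch assigns to the third directory segment.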