[PATCH 6/6] xfsdocs: document the extended rmap btree

"Darrick J. Wong" <djwong@xxxxxxxxxxxxxxxx> · Fri, 04 Mar 2016 16:35:45 -0800

The reverse mapping btree now comes in two flavors: a fat one that supports
overlapping interval queries, for reflink filesystems, and a thin one for
filesystems that don't share blocks.  Document the new on-disk formats.

Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
 design/XFS_Filesystem_Structure/docinfo.xml     |   16 +++
 design/XFS_Filesystem_Structure/magic.asciidoc  |    1 
 design/XFS_Filesystem_Structure/rmapbt.asciidoc |  108 +++++++++++++++++++++--
 3 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 009376f..7d32260 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -138,4 +138,20 @@
 			</simplelist>
 		</revdescription>
 	</revision>
+	<revision>
+		<revnumber>3.1415</revnumber>
+		<date>March 2016</date>
+		<author>
+			<firstname>Darrick</firstname>
+			<surname>Wong</surname>
+			<email></email>
+		</author>
+		<revdescription>
+			<simplelist>
+				<member>Move the b+tree discussion to a separate chapter.</member>
+				<member>Discuss overlapping interval b+trees.</member>
+				<member>Document the reverse mapping btree changes when reflink is enabled.</member>
+			</simplelist>
+		</revdescription>
+	</revision>
 </revhistory>
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 7caf20e..5ce19a5 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -45,6 +45,7 @@ relevant chapters.  Magic numbers tend to have consistent locations:
 | +XFS_ATTR3_LEAF_MAGIC+	| 0x3bee	|     	| xref:Leaf_Attributes[Leaf Attribute], v5 only
 | +XFS_ATTR3_RMT_MAGIC+		| 0x5841524d	| XARM	| xref:Remote_Values[Remote Attribute Value], v5 only
 | +XFS_RMAP_CRC_MAGIC+		| 0x524d4233	| RMB3	| xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
+| +XFS_RMAPX_CRC_MAGIC+		| 0x34524d42	| 4RMB	| xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
 | +XFS_REFC_CRC_MAGIC+		| 0x52334643	| R3FC	| xref:Reference_Count_Btree[Reference Count B+tree], v5 only
 |=====
 
diff --git a/design/XFS_Filesystem_Structure/rmapbt.asciidoc b/design/XFS_Filesystem_Structure/rmapbt.asciidoc
index 2be28fa..bfdc74e 100644
--- a/design/XFS_Filesystem_Structure/rmapbt.asciidoc
+++ b/design/XFS_Filesystem_Structure/rmapbt.asciidoc
@@ -81,18 +81,40 @@ For the moment, there is a requirement that all records in the data or
 attribute forks must match exactly with the corresponding entry in the
 reverse-mapping B+tree.  This may be lifted in future versions of the patchset.
 
-For the reverse-mapping B+tree, the key definition is larger than the usual AG
-block number.  On a classic XFS filesystem, each block has only one owner, which
-means that +rm_startblock+ is sufficient to uniquely identify each record.
-However, shared block support (reflink) on XFS breaks that assumption; now
-filesystem blocks can be linked to any logical block offset of any file inode.
-Therefore, the key must include the owner and offset information to preserve the
-1 to 1 relation between key and record.  The key has the following structure:
+=== Reverse Mapping B+tree without Shared Blocks
+
+For the reverse-mapping B+tree on a filesystem that does not support sharing
+file data blocks, we can uniquely identify each record using only the per-AG
+block number.  The key has the following structure:
 
 [source, c]
 ----
 struct xfs_rmap_key {
      __be32                     rm_startblock;
+};
+----
+
+* As the reverse mapping B+tree is AG relative, all the block numbers are only
+32 bits.
+* The +bb_magic+ value is "RMB3" (0x524d4233).
+* The +xfs_btree_sblock_t+ header is used for intermediate B+tree nodes as well
+as the leaves.
+
+=== Reverse Mapping B+tree with Shared Blocks
+
+For the reverse-mapping B+tree on a filesystem that supports sharing of file
+data blocks, the key definition is larger than the usual AG block number.  On a
+classic XFS filesystem, each block has only one owner, which means that
++rm_startblock+ is sufficient to uniquely identify each record.  However,
+shared block support (reflink) on XFS breaks that assumption; now filesystem
+blocks can be linked to any logical block offset of any file inode.  Therefore,
+the key must include the owner and offset information to preserve the 1 to 1
+relation between key and record.  The key has the following structure:
+
+[source, c]
+----
+struct xfs_rmapx_key {
+     __be32                     rm_startblock;
      __be64                     rm_owner;
      __be64                     rm_fork:1;
      __be64                     rm_bmbt:1;
@@ -102,9 +124,17 @@ struct xfs_rmap_key {
 
 * As the reference counting is AG relative, all the block numbers are only
 32-bits.
-* The +bb_magic+ value is "RMB3" (0x524d4233).
+* The +bb_magic+ value is "4RMB" (0x34524d42).
 * The +xfs_btree_sblock_t+ header is used for intermediate B+tree node as well
 as the leaves.
+* Each pointer is associated with two keys.  The first of these is the "low
+key", which is the key of the smallest record accessible through the pointer.
+This low key has the same meaning as the key in all other btrees.  The second
+key is the "high key", which is the largest key that can be used to access any
+record underneath the pointer.  Recall that each record in the reverse mapping
+B+tree describes an interval of physical blocks mapped to an interval of logical
+file block offsets; therefore, a whole range of keys can be used to find a
+record.
 
 === xfs_db rmapbt Example
 
@@ -112,7 +142,7 @@ This example shows a reverse-mapping B+tree from a freshly formatted root
 filesystem:
 
 ----
-xfs_db> agi 0
+xfs_db> agf 0
 xfs_db> addr rmaproot
 xfs_db> p
 magic = 0x524d4233
@@ -222,3 +252,63 @@ magic = 0x524d4233
 
 As you can see, the reverse block-mapping B+tree is an important secondary
 metadata structure, which can be used to reconstruct damaged primary metadata.
+Now let's look at an extended rmap btree:
+
+----
+xfs_db> agf 0
+xfs_db> addr rmaproot
+xfs_db> p
+magic = 0x34524d42
+level = 1
+numrecs = 5
+leftsib = null
+rightsib = null
+bno = 6368
+lsn = 0x100000d1b
+uuid = 400f0928-6b88-4c37-af1e-cef1f8911f3f
+owner = 0
+crc = 0x8d4ace05 (correct)
+keys[1-5] = [startblock,owner,offset,attrfork,bmbtblock,startblock_hi,owner_hi,offset_hi,attrfork_hi,bmbtblock_hi]
+1:[0,-3,0,0,0,705,132,681,0,0]
+2:[24,5761,0,0,0,548,5761,524,0,0]
+3:[24,5929,0,0,0,380,5929,356,0,0]
+4:[24,6097,0,0,0,212,6097,188,0,0]
+5:[24,6277,0,0,0,807,-7,0,0,0]
+ptrs[1-5] = 1:5 2:771 3:9 4:10 5:11
+----
+
+The second pointer is associated with both the low key [24,5761,0,0,0] and the
+high key [548,5761,524,0,0], which means that we can expect block 771 to contain
+records starting at physical block 24, inode 5761, offset zero, and that one of
+those records can be used to find the reverse mapping for physical block 548,
+inode 5761, offset 524:
+
+----
+xfs_db> addr ptrs[2]
+xfs_db> p
+magic = 0x34524d42
+level = 0
+numrecs = 168
+leftsib = 5
+rightsib = 9
+bno = 6168
+lsn = 0x100000d1b
+uuid = 400f0928-6b88-4c37-af1e-cef1f8911f3f
+owner = 0
+crc = 0xd58eff0e (correct)
+recs[1-168] = [startblock,blockcount,owner,offset,extentflag,attrfork,bmbtblock]
+1:[24,525,5761,0,0,0,0]
+2:[24,524,5762,0,0,0,0]
+3:[24,523,5763,0,0,0,0]
+...
+166:[24,360,5926,0,0,0,0]
+167:[24,359,5927,0,0,0,0]
+168:[24,358,5928,0,0,0,0]
+----
+
+Observe that the first record in the block starts at physical block 24, inode
+5761, offset zero, just as we expected.  Note that this first record also
+supplies the high key stored in the node block: physical block 548, inode 5761,
+offset 524 is the very last block mapped by this record.  Furthermore, note that
+record 168, despite being the last record in this block, has a lower maximum key
+(physical block 381, inode 5928, offset 357) than the first record.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs