From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
 Documentation/filesystems/xfs/ondisk/overview.rst  |    1
 .../filesystems/xfs/ondisk/reconstruction.rst      |   68 ++++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100644 Documentation/filesystems/xfs/ondisk/reconstruction.rst

diff --git a/Documentation/filesystems/xfs/ondisk/overview.rst b/Documentation/filesystems/xfs/ondisk/overview.rst
index 484bf220c128..a2818102819d 100644
--- a/Documentation/filesystems/xfs/ondisk/overview.rst
+++ b/Documentation/filesystems/xfs/ondisk/overview.rst
@@ -46,3 +46,4 @@ latency.
 .. include:: self_describing_metadata.rst
 .. include:: delayed_logging.rst
 .. include:: reflink.rst
+.. include:: reconstruction.rst
diff --git a/Documentation/filesystems/xfs/ondisk/reconstruction.rst b/Documentation/filesystems/xfs/ondisk/reconstruction.rst
new file mode 100644
index 000000000000..cb5baa494309
--- /dev/null
+++ b/Documentation/filesystems/xfs/ondisk/reconstruction.rst
@@ -0,0 +1,68 @@
+.. SPDX-License-Identifier: CC-BY-SA-3.0+
+
+Metadata Reconstruction
+-----------------------
+
+    **Note**
+
+    This is a theoretical discussion of how reconstruction could work; none of
+    this is implemented as of 2018.
+
+A simple UNIX filesystem can be thought of in terms of a directed acyclic
+graph. To a first approximation, there exists a root directory node, which
+points to other nodes. Those other nodes can themselves be directories or they
+can be files. Each file, in turn, points to data blocks.
+
+XFS adds a few more details to this picture:
+
+- The real root(s) of an XFS filesystem are the allocation group headers
+  (superblock, AGF, AGI, AGFL).
+
+- Each allocation group’s headers point to various per-AG B+trees (free
+  space, inode, free inodes, free list, etc.)
+
+- The free space B+trees point to unused extents;
+
+- The inode B+trees point to blocks containing inode chunks;
+
+- All superblocks point to the root directory and the log;
+
+- Hardlinks mean that multiple directories can point to a single file node;
+
+- File data block pointers are indexed by file offset;
+
+- Files and directories can have a second collection of pointers to data
+  blocks which contain extended attributes;
+
+- Large directories require multiple data blocks to store all the
+  subpointers;
+
+- Still larger directories use high-offset data blocks to store a B+tree of
+  hashes to directory entries;
+
+- Large extended attribute forks similarly use high-offset data blocks to
+  store a B+tree of hashes to attribute keys; and
+
+- Symbolic links can point to data blocks.
+
+The beauty of this massive graph structure is that under normal circumstances,
+everything known to the filesystem is discoverable (access controls
+notwithstanding) from the root. The major weakness of this structure of course
+is that breaking an edge in the graph can render entire subtrees inaccessible.
+xfs\_repair “recovers” from broken directories by scanning for unlinked inodes
+and connecting them to /lost+found, but this isn’t sufficiently general to
+recover from breaks in other parts of the graph structure. Wouldn’t it be
+useful to have back pointers as a secondary data structure? The current repair
+strategy is to reconstruct whatever can be rebuilt, but to scrap anything that
+doesn’t check out.
+
+The `reverse-mapping B+tree <#reverse-mapping-b-tree>`__ fills in part of the
+puzzle. Since it contains copies of every entry in each inode’s data and
+attribute forks, we can fix a corrupted block map with these records.
+Furthermore, if the inode B+trees become corrupt, it is possible to visit all
+inode chunks using the reverse-mapping data. Should XFS ever gain the ability
+to store parent directory information in each inode, it also becomes possible
+to resurrect damaged directory trees, which should reduce the complaints about
+inodes ending up in /lost+found. Everything else in the per-AG primary
+metadata can already be reconstructed via xfs\_repair. Hopefully,
+reconstruction will not turn out to be a fool’s errand.
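
As an illustration of the reverse-mapping paragraph above, here is a minimal
Python sketch of how a block map for one inode fork might be rebuilt from
reverse-mapping records. It is not taken from xfs_repair or the kernel:
RmapRecord, its field names, and rebuild_block_map() are hypothetical
simplifications chosen for this example, and the real on-disk record layout
and repair logic differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RmapRecord:
    """Simplified, hypothetical reverse-mapping record (not the on-disk layout)."""
    startblock: int          # first physical block of the extent
    blockcount: int          # length of the extent, in blocks
    owner: int               # inode number that owns the extent
    offset: int              # file offset (in blocks) that the extent maps
    attr_fork: bool = False  # True if the extent belongs to the attribute fork


# One rebuilt block-map entry: (file offset, physical start block, length).
Extent = Tuple[int, int, int]


def rebuild_block_map(rmaps: Iterable[RmapRecord], inode: int,
                      attr_fork: bool = False) -> List[Extent]:
    """Collect the extents owned by one inode fork, sorted by file offset.

    Neighbouring extents that are contiguous both logically and physically
    are merged into a single entry.
    """
    # Keep only the records that point back at the requested inode and fork.
    owned = sorted(
        (r for r in rmaps if r.owner == inode and r.attr_fork == attr_fork),
        key=lambda r: r.offset,
    )
    block_map: List[Extent] = []
    for r in owned:
        if block_map:
            off, start, length = block_map[-1]
            # Merge when this extent directly follows the previous one,
            # both in the file and on disk.
            if off + length == r.offset and start + length == r.startblock:
                block_map[-1] = (off, start, length + r.blockcount)
                continue
        block_map.append((r.offset, r.startblock, r.blockcount))
    return block_map
```

The sketch only filters the back pointers by owner and fork and sorts them by
file offset. A real rebuild would also have to cope with records that conflict
with each other or with other metadata.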