Re: [PATCH v24.3 12/14] xfs: document directory tree repairs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 03, 2023 at 11:50:57PM +0000, Allison Henderson wrote:
> On Wed, 2023-03-01 at 16:14 -0800, Darrick J. Wong wrote:
> > On Sat, Feb 25, 2023 at 07:33:23AM +0000, Allison Henderson wrote:
> > > On Thu, 2023-02-02 at 18:12 -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > 
> > > > Directory tree repairs are the least complete part of online
> > > > fsck,
> > > > due
> > > > to the lack of directory parent pointers.  However, even without
> > > > that
> > > > feature, we can still make some corrections to the directory tree
> > > > --
> > > > we
> > > > can salvage as many directory entries as we can from a damaged
> > > > directory, and we can reattach orphaned inodes to the lost+found,
> > > > just
> > > > as xfs_repair does now.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > ---
> > > > v24.2: updated with my latest thoughts about how to use parent
> > > > pointers
> > > > v24.3: updated to reflect the online fsck code I built for parent
> > > > pointers
> > > > ---
> > > >  .../filesystems/xfs-online-fsck-design.rst         |  410
> > > > ++++++++++++++++++++
> > > >  1 file changed, 410 insertions(+)
> > > > 
> > > > diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst
> > > > b/Documentation/filesystems/xfs-online-fsck-design.rst
> > > > index af7755fe0107..51d040e4a2d0 100644
> > > > --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> > > > +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> > > > @@ -4359,3 +4359,413 @@ The proposed patchset is the
> > > >  `extended attribute repair
> > > >  <
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> > > > log/?h=repair-xattrs>`_
> > > >  series.
> > > > +
> > > > +Fixing Directories
> > > > +------------------
> > > > +
> > > > +Fixing directories is difficult with currently available
> > > > filesystem
> > > > features,
> > > > +since directory entries are not redundant.
> > > > +The offline repair tool scans all inodes to find files with
> > > > nonzero
> > > > link count,
> > > > +and then it scans all directories to establish parentage of
> > > > those
> > > > linked files.
> > > > +Damaged files and directories are zapped, and files with no
> > > > parent
> > > > are
> > > > +moved to the ``/lost+found`` directory.
> > > > +It does not try to salvage anything.
> > > > +
> > > > +The best that online repair can do at this time is to read
> > > > directory
> > > > data
> > > > +blocks and salvage any dirents that look plausible, correct link
> > > > counts, and
> > > > +move orphans back into the directory tree.
> > > > +The salvage process is discussed in the case study at the end of
> > > > this section.
> > > > +The :ref:`file link count fsck <nlinks>` code takes care of
> > > > fixing
> > > > link counts
> > > > +and moving orphans to the ``/lost+found`` directory.
> > > > +
> > > > +Case Study: Salvaging Directories
> > > > +`````````````````````````````````
> > > > +
> > > > +Unlike extended attributes, directory blocks are all the same
> > > > size,
> > > > so
> > > > +salvaging directories is straightforward:
> > > > +
> > > > +1. Find the parent of the directory.
> > > > +   If the dotdot entry is not unreadable, try to confirm that
> > > > the
> > > > alleged
> > > > +   parent has a child entry pointing back to the directory being
> > > > repaired.
> > > > +   Otherwise, walk the filesystem to find it.
> > > > +
> > > > +2. Walk the first partition of data fork of the directory to
> > > > find
> > > > the directory
> > > > +   entry data blocks.
> > > > +   When one is found,
> > > > +
> > > > +   a. Walk the directory data block to find candidate entries.
> > > > +      When an entry is found:
> > > > +
> > > > +      i. Check the name for problems, and ignore the name if
> > > > there
> > > > are.
> > > > +
> > > > +      ii. Retrieve the inumber and grab the inode.
> > > > +          If that succeeds, add the name, inode number, and file
> > > > type to the
> > > > +          staging xfarray and xblob.
> > > > +
> > > > +3. If the memory usage of the xfarray and xfblob exceed a
> > > > certain
> > > > amount of
> > > > +   memory or there are no more directory data blocks to examine,
> > > > unlock the
> > > > +   directory and add the staged dirents into the temporary
> > > > directory.
> > > > +   Truncate the staging files.
> > > > +
> > > > +4. Use atomic extent swapping to exchange the new and old
> > > > directory
> > > > structures.
> > > > +   The old directory blocks are now attached to the temporary
> > > > file.
> > > > +
> > > > +5. Reap the temporary file.
> > > > +
> > > 
> > > 
> > > 
> > > > +**Future Work Question**: Should repair revalidate the dentry
> > > > cache
> > > > when
> > > > +rebuilding a directory?
> > > > +
> > > > +*Answer*: Yes, though the current dentry cache code doesn't
> > > > provide
> > > > a means
> > > > +to walk every dentry of a specific directory.
> > > > +If the cache contains an entry that the salvaging code does not
> > > > find, the
> > > > +repair cannot proceed.
> > > > +
> > > > +**Future Work Question**: Can the dentry cache know about a
> > > > directory entry
> > > > +that cannot be salvaged?
> > > > +
> > > > +*Answer*: In theory, the dentry cache should be a subset of the
> > > > directory
> > > > +entries on disk because there's no way to load a dentry without
> > > > having
> > > > +something to read in the directory.
> > > > +However, it is possible for a coherency problem to be introduced
> > > > if
> > > > the ondisk
> > > > +structures becomes corrupt *after* the cache loads.
> > > > +In theory it is necessary to scan all dentry cache entries for a
> > > > directory to
> > > > +ensure that one of the following apply:
> > > 
> > > "Currently the dentry cache code doesn't provide a means to walk
> > > every
> > > dentry of a specific directory.  This makes validation of the
> > > rebuilt
> > > directory difficult, and it is possible that an ondisk structure to
> > > become corrupt *after* the cache loads.  Walking the dentry cache
> > > is
> > > currently being considered as a future improvement.  This will also
> > > enable the ability to report which entries were not salvageable
> > > since
> > > these will be the subset of entries that are absent after the walk.
> > > This improvement will ensure that one of the following apply:"
> > 
> > The thing is -- I'm not considering restructuring the dentry cache. 
> > The
> > cache key is a one-way hash function of the parent_ino and the dirent
> > name, and I can't even imagine how one would support using that for
> > arbitrary lookups or walks.
> > 
> > This is the giant hole in all of the online repair code -- the design
> > of
> > the dentry cache is such that we can't invalidate the entire cache. 
> > We
> > also cannot walk it to perform targeted invalidation of just the
> > pieces
> > we want.  If after a repair the cache contains a dentry that isn't
> > backed by an actual ondisk directory entry ... kaboom.
> > 
> > The one thing I'll grant you is that I don't think it's likely that
> > the
> > dentry cache will get populated with some information and later the
> > ondisk directory bitrots undetectably.
> > 
> > > ?
> > > 
> > > I just think it reads cleaner.  I realize this is an area that
> > > still
> > > sort of in flux, but definitely before we call the document done we
> > > should probably strip out the Q's and just document the A's.  If
> > > someone re-raises the Q's we can always refer to the archives and
> > > then
> > > have the discussion on the mailing list.  But I think the document
> > > should maintain the goal of making clear whatever the current plan
> > > is
> > > just to keep it reading cleanly. 
> > 
> > Yeah, I'll shorten this section so that it only mentions these things
> > once and clearly states that I have no solution.
> I see, yes I got the impression from the original phrasing that is was
> an intended "todo", so clarifying that its not should help. 

Ahh, ok. :)

> > 
> > > > +
> > > > +1. The cached dentry reflects an ondisk dirent in the new
> > > > directory.
> > > > +
> > > > +2. The cached dentry no longer has a corresponding ondisk dirent
> > > > in
> > > > the new
> > > > +   directory and the dentry can be purged from the cache.
> > > > +
> > > > +3. The cached dentry no longer has an ondisk dirent but the
> > > > dentry
> > > > cannot be
> > > > +   purged.
> > > 
> > > > +   This is bad.
> > > These entries are irrecoverable, but can now be reported.
> > > 
> > > 
> > > 
> > > > +
> > > > +As mentioned above, the dentry cache does not have a means to
> > > > walk
> > > > all the
> > > > +dentries with a particular directory as a parent.
> > > > +This makes detecting situations #2 and #3 impossible, and
> > > > remains an
> > > > +interesting question for research.
> > > I think the above paraphrase makes this last bit redundant.
> > 
> > N
> Not sure if this is "no" or an unfinished thought?

N[ot sure either.] :(

N[ot remembering what I was thinking here.]

N[ever mind].

<giggle>

> > 
> > > > +
> > > > +The proposed patchset is the
> > > > +`directory repair
> > > > +<
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> > > > log/?h=repair-dirs>`_
> > > > +series.
> > > > +
> > > > +Parent Pointers
> > > > +```````````````
> > > > +
> > > "Generally speaking, a parent pointer is any kind of metadata that
> > > enables an inode to locate its parent with out having to traverse
> > > the
> > > directory tree from the root."
> > > 
> > > > +The lack of secondary directory metadata hinders directory tree
> > > "Without them, the lack of secondary..." 
> > 
> > Ok.  I want to reword the first sentence slightly, yielding this:
> > 
> > "A parent pointer is a piece of file metadata that enables a user to
> > locate the file's parent directory without having to traverse the
> > directory tree from the root.  Without them, reconstruction of
> > directory
> > trees is hindered in much the same way that the historic lack of
> > reverse
> > space mapping information once hindered reconstruction of filesystem
> > space metadata.  The parent pointer feature, however, makes total
> > directory reconstruction
> > possible."
> 
> Alrighty, that sounds good
> 
> > 
> > But that's a much better start to the paragraph, thank you.
> > 
> > > > reconstruction
> > > > +in much the same way that the historic lack of reverse space
> > > > mapping
> > > > +information once hindered reconstruction of filesystem space
> > > > metadata.
> > > > +The parent pointer feature, however, makes total directory
> > > > reconstruction
> > > > +possible.
> > > > +
> > > 
> > > History side bar the below chunk...
> > 
> > Done.
> > 
> > > > +Directory parent pointers were first proposed as an XFS feature
> > > > more
> > > > than a
> > > > +decade ago by SGI.
> > > > +Each link from a parent directory to a child file is mirrored
> > > > with
> > > > an extended
> > > > +attribute in the child that could be used to identify the parent
> > > > directory.
> > > > +Unfortunately, this early implementation had major shortcomings
> > > > and
> > > > was never
> > > > +merged into Linux XFS:
> > > > +
> > > > +1. The XFS codebase of the late 2000s did not have the
> > > > infrastructure to
> > > > +   enforce strong referential integrity in the directory tree.
> > > > +   It did not guarantee that a change in a forward link would
> > > > always
> > > > be
> > > > +   followed up with the corresponding change to the reverse
> > > > links.
> > > > +
> > > > +2. Referential integrity was not integrated into offline repair.
> > > > +   Checking and repairs were performed on mounted filesystems
> > > > without taking
> > > > +   any kernel or inode locks to coordinate access.
> > > > +   It is not clear how this actually worked properly.
> > > > +
> > > > +3. The extended attribute did not record the name of the
> > > > directory
> > > > entry in the
> > > > +   parent, so the SGI parent pointer implementation cannot be
> > > > used
> > > > to reconnect
> > > > +   the directory tree.
> > > > +
> > > > +4. Extended attribute forks only support 65,536 extents, which
> > > > means
> > > > that
> > > > +   parent pointer attribute creation is likely to fail at some
> > > > point
> > > > before the
> > > > +   maximum file link count is achieved.
> > > 
> > > 
> > > "The original parent pointer design was too unstable for something
> > > like
> > > a file system repair to depend on."
> > 
> > Er... I think this is addressed by #2 above?
> Sorry, I meant for the history side bar to go through the list, and
> then add that quotation to connect the paragraphs.  In a way, simply
> talking about the new improvements below implies everything that the
> old design lacked.

*OH* ok, I think I understand now.  You're suggesting this sentence as
an introduction to the paragraph below, not as something to be appended
to point #4.  That makes more sense, I'll go add that, thanks!

> > 
> > > > +
> > > > +Allison Henderson, Chandan Babu, and Catherine Hoang are working
> > > > on
> > > > a second
> > > > +implementation that solves all shortcomings of the first.
> > > > +During 2022, Allison introduced log intent items to track
> > > > physical
> > > > +manipulations of the extended attribute structures.
> > > > +This solves the referential integrity problem by making it
> > > > possible
> > > > to commit
> > > > +a dirent update and a parent pointer update in the same
> > > > transaction.
> > > > +Chandan increased the maximum extent counts of both data and
> > > > attribute forks,
> > > 
> > > > +thereby addressing the fourth problem.
> > > which ensures the parent pointer creation will succeed even if the
> > > max
> > > extent count is reached.
> > 
> > The max extent count cannot be exceeded, but the nrext64 feature
> > ensures
> > that the xattr structure can grow enough to handle maximal
> > hardlinking.
> > 
> > "Chandan increased the maximum extent counts of both data and
> > attribute
> > forks, thereby ensuring that the extended attribute structure can
> > grow
> > to handle the maximum hardlink count of any file."
> 
> Ok, sounds good.
> 
> > 
> > > > +
> > > > +To solve the third problem, parent pointers include the dirent
> > > > name
> > > "Lastly, the new design includes the dirent name..."
> > 
> > <nod>
> > 
> > > > and
> > > > +location of the entry within the parent directory.
> > > > +In other words, child files use extended attributes to store
> > > > pointers to
> > > > +parents in the form ``(parent_inum, parent_gen, dirent_pos) →
> > > > (dirent_name)``.
> > > This parts still in flux, so probably this will have to get updated
> > > later...
> > 
> > Yep, I'll add a note about that.
> > 
> > > > +
> > > > +On a filesystem with parent pointers, the directory checking
> > > > process
> > > > can be
> > > > +strengthened to ensure that the target of each dirent also
> > > > contains
> > > > a parent
> > > > +pointer pointing back to the dirent.
> > > > +Likewise, each parent pointer can be checked by ensuring that
> > > > the
> > > > target of
> > > > +each parent pointer is a directory and that it contains a dirent
> > > > matching
> > > > +the parent pointer.
> > > > +Both online and offline repair can use this strategy.
> > 
> > I moved this paragraph up to become the second paragraph, and now it
> > reads:
> > 
> > "XFS parent pointers include the dirent name and location of the
> > entry
> > within the parent directory.  In other words, child files use
> > extended
> > attributes to store pointers to parents in the form ``(parent_inum,
> > parent_gen, dirent_pos) → (dirent_name)``.  The directory checking
> > process can be strengthened to ensure that the target of each dirent
> > also contains a parent pointer pointing back to the dirent. 
> > Likewise,
> > each parent pointer can be checked by ensuring that the target of
> > each
> > parent pointer is a directory and that it contains a dirent matching
> > the
> > parent pointer.  Both online and offline repair can use this
> > strategy.
> > 
> > Note: The ondisk format of parent pointers is not yet finalized."
> > 
> > After which comes the historical sidebar.
> Alrighty, I think that's fine for now
> 
> > 
> > > > +
> > > > +Case Study: Repairing Directories with Parent Pointers
> > > > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > +
> > > > +Directory rebuilding uses a :ref:`coordinated inode scan
> > > > <iscan>`
> > > > and
> > > > +a :ref:`directory entry live update hook <liveupdate>` as
> > > > follows:
> > > > +
> > > > +1. Set up a temporary directory for generating the new directory
> > > > structure,
> > > > +   an xfblob for storing entry names, and an xfarray for
> > > > stashing
> > > > directory
> > > > +   updates.
> > > > +
> > > > +2. Set up an inode scanner and hook into the directory entry
> > > > code to
> > > > receive
> > > > +   updates on directory operations.
> > > > +
> > > > +3. For each parent pointer found in each file scanned, decide if
> > > > the
> > > > parent
> > > > +   pointer references the directory of interest.
> > > > +   If so:
> > > > +
> > > > +   a. Stash an addname entry for this dirent in the xfarray for
> > > > later.
> > > > +
> > > > +   b. When finished scanning that file, flush the stashed
> > > > updates to
> > > > the
> > > > +      temporary directory.
> > > > +
> > > > +4. For each live directory update received via the hook, decide
> > > > if
> > > > the child
> > > > +   has already been scanned.
> > > > +   If so:
> > > > +
> > > > +   a. Stash an addname or removename entry for this dirent
> > > > update in
> > > > the
> > > > +      xfarray for later.
> > > > +      We cannot write directly to the temporary directory
> > > > because
> > > > hook
> > > > +      functions are not allowed to modify filesystem metadata.
> > > > +      Instead, we stash updates in the xfarray and rely on the
> > > > scanner thread
> > > > +      to apply the stashed updates to the temporary directory.
> > > > +
> > > > +5. When the scan is complete, atomically swap the contents of
> > > > the
> > > > temporary
> > > > +   directory and the directory being repaired.
> > > > +   The temporary directory now contains the damaged directory
> > > > structure.
> > > > +
> > > > +6. Reap the temporary directory.
> > > > +
> > > > +7. Update the dirent position field of parent pointers as
> > > > necessary.
> > > > +   This may require the queuing of a substantial number of xattr
> > > > log
> > > > intent
> > > > +   items.
> > > > +
> > > > +The proposed patchset is the
> > > > +`parent pointers directory repair
> > > > +<
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> > > > log/?h=pptrs-online-dir-repair>`_
> > > > +series.
> > > > +
> > > > +**Unresolved Question**: How will repair ensure that the
> > > > ``dirent_pos`` fields
> > > > +match in the reconstructed directory?
> > > > +
> > > > +*Answer*: There are a few ways to solve this problem:
> > > > +
> > > > +1. The field could be designated advisory, since the other three
> > > > values are
> > > > +   sufficient to find the entry in the parent.
> > > > +   However, this makes indexed key lookup impossible while
> > > > repairs
> > > > are ongoing.
> > > > +
> > > > +2. We could allow creating directory entries at specified
> > > > offsets,
> > > > which solves
> > > > +   the referential integrity problem but runs the risk that
> > > > dirent
> > > > creation
> > > > +   will fail due to conflicts with the free space in the
> > > > directory.
> > > > +
> > > > +   These conflicts could be resolved by appending the directory
> > > > entry and
> > > > +   amending the xattr code to support updating an xattr key and
> > > > reindexing the
> > > > +   dabtree, though this would have to be performed with the
> > > > parent
> > > > directory
> > > > +   still locked.
> > > > +
> > > > +3. Same as above, but remove the old parent pointer entry and
> > > > add a
> > > > new one
> > > > +   atomically.
> > > > +
> > > > +4. Change the ondisk xattr format to ``(parent_inum, name) →
> > > > (parent_gen)``,
> > > > +   which would provide the attr name uniqueness that we require,
> > > > without
> > > > +   forcing repair code to update the dirent position.
> > > > +   Unfortunately, this requires changes to the xattr code to
> > > > support
> > > > attr
> > > > +   names as long as 263 bytes.
> > > > +
> > > > +5. Change the ondisk xattr format to ``(parent_inum, hash(name))
> > > > →
> > > > +   (name, parent_gen)``.
> > > > +   If the hash is sufficiently resistant to collisions (e.g.
> > > > sha256)
> > > > then
> > > > +   this should provide the attr name uniqueness that we require.
> > > > +   Names shorter than 247 bytes could be stored directly.
> > > I think the RFC deluge is the same question but more context, so
> > > probably this section will follow what we decide there.  I will
> > > save
> > > commentary to keep the discussion in the same thread...
> > > 
> > > I'll just link it here for anyone else following this for now...
> > > https://www.spinics.net/lists/linux-xfs/msg69397.html
> > 
> > Yes, the deluge has much more detailed information.  I'll add this
> > link
> > (for now) to the doc.
> > 
> > > > +
> > > > +Case Study: Repairing Parent Pointers
> > > > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > +
> > > > +Online reconstruction of a file's parent pointer information
> > > > works
> > > > similarly to
> > > > +directory reconstruction:
> > > > +
> > > > +1. Set up a temporary file for generating a new extended
> > > > attribute
> > > > structure,
> > > > +   an xfblob for storing parent pointer names, and an xfarray
> > > > for
> > > > stashing
> > > > +   parent pointer updates.
> > > we did talk about blobs in patch 6 though it took me a moment to
> > > remember... if there's a way to link or tag it, that would be
> > > helpful
> > > for with the quick refresh.  kinda like wikipedia hyperlinks, you
> > > really only need like the first line or two to get it snap back
> > 
> > There is; I'll put in a backreference.
> > 
> > > > +
> > > > +2. Set up an inode scanner and hook into the directory entry
> > > > code to
> > > > receive
> > > > +   updates on directory operations.
> > > > +
> > > > +3. For each directory entry found in each directory scanned,
> > > > decide
> > > > if the
> > > > +   dirent references the file of interest.
> > > > +   If so:
> > > > +
> > > > +   a. Stash an addpptr entry for this parent pointer in the
> > > > xfblob
> > > > and xfarray
> > > > +      for later.
> > > > +
> > > > +   b. When finished scanning the directory, flush the stashed
> > > > updates to the
> > > > +      temporary directory.
> > > > +
> > > > +4. For each live directory update received via the hook, decide
> > > > if
> > > > the parent
> > > > +   has already been scanned.
> > > > +   If so:
> > > > +
> > > > +   a. Stash an addpptr or removepptr entry for this dirent
> > > > update in
> > > > the
> > > > +      xfarray for later.
> > > > +      We cannot write parent pointers directly to the temporary
> > > > file
> > > > because
> > > > +      hook functions are not allowed to modify filesystem
> > > > metadata.
> > > > +      Instead, we stash updates in the xfarray and rely on the
> > > > scanner thread
> > > > +      to apply the stashed parent pointer updates to the
> > > > temporary
> > > > file.
> > > > +
> > > > +5. Copy all non-parent pointer extended attributes to the
> > > > temporary
> > > > file.
> > > > +
> > > > +6. When the scan is complete, atomically swap the attribute fork
> > > > of
> > > > the
> > > > +   temporary file and the file being repaired.
> > > > +   The temporary file now contains the damaged extended
> > > > attribute
> > > > structure.
> > > > +
> > > > +7. Reap the temporary file.
> > > Seems like it should work
> > 
> > Let's hope so!
> > 
> > > > +
> > > > +The proposed patchset is the
> > > > +`parent pointers repair
> > > > +<
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> > > > log/?h=pptrs-online-parent-repair>`_
> > > > +series.
> > > > +
> > > > +Digression: Offline Checking of Parent Pointers
> > > > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > +
> > > > +Examining parent pointers in offline repair works differently
> > > > because corrupt
> > > > +files are erased long before directory tree connectivity checks
> > > > are
> > > > performed.
> > > > +Parent pointer checks are therefore a second pass to be added to
> > > > the
> > > > existing
> > > > +connectivity checks:
> > > > +
> > > > +1. After the set of surviving files has been established (i.e.
> > > > phase
> > > > 6),
> > > > +   walk the surviving directories of each AG in the filesystem.
> > > > +   This is already performed as part of the connectivity checks.
> > > > +
> > > > +2. For each directory entry found, record the name in an xfblob,
> > > > and
> > > > store
> > > > +   ``(child_ag_inum, parent_inum, parent_gen, dirent_pos)``
> > > > tuples
> > > > in a
> > > > +   per-AG in-memory slab.
> > > > +
> > > > +3. For each AG in the filesystem,
> > > > +
> > > > +   a. Sort the per-AG tuples in order of child_ag_inum,
> > > > parent_inum,
> > > > and
> > > > +      dirent_pos.
> > > > +
> > > > +   b. For each inode in the AG,
> > > > +
> > > > +      1. Scan the inode for parent pointers.
> > > > +         Record the names in a per-file xfblob, and store
> > > > ``(parent_inum,
> > > > +         parent_gen, dirent_pos)`` tuples in a per-file slab.
> > > > +
> > > > +      2. Sort the per-file tuples in order of parent_inum, and
> > > > dirent_pos.
> > > > +
> > > > +      3. Position one slab cursor at the start of the inode's
> > > > records in the
> > > > +         per-AG tuple slab.
> > > > +         This should be trivial since the per-AG tuples are in
> > > > child
> > > > inumber
> > > > +         order.
> > > > +
> > > > +      4. Position a second slab cursor at the start of the per-
> > > > file
> > > > tuple slab.
> > > > +
> > > > +      5. Iterate the two cursors in lockstep, comparing the
> > > > parent_ino and
> > > > +         dirent_pos fields of the records under each cursor.
> > > > +
> > > > +         a. Tuples in the per-AG list but not the per-file list
> > > > are
> > > > missing and
> > > > +            need to be written to the inode.
> > > > +
> > > > +         b. Tuples in the per-file list but not the per-AG list
> > > > are
> > > > dangling
> > > > +            and need to be removed from the inode.
> > > > +
> > > > +         c. For tuples in both lists, update the parent_gen and
> > > > name
> > > > components
> > > > +            of the parent pointer if necessary.
> > > > +
> > > > +4. Move on to examining link counts, as we do today.
> > > > +
> > > > +The proposed patchset is the
> > > > +`offline parent pointers repair
> > > > +<
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.g
> > > > it/log/?h=pptrs-repair>`_
> > > > +series.
> > > > +
> > > > +Rebuilding directories from parent pointers in offline repair is
> > > > very
> > > > +challenging because it currently uses a single-pass scan of the
> > > > filesystem
> > > > +during phase 3 to decide which files are corrupt enough to be
> > > > zapped.
> > > > +This scan would have to be converted into a multi-pass scan:
> > > > +
> > > > +1. The first pass of the scan zaps corrupt inodes, forks, and
> > > > attributes
> > > > +   much as it does now.
> > > > +   Corrupt directories are noted but not zapped.
> > > > +
> > > > +2. The next pass records parent pointers pointing to the
> > > > directories
> > > > noted
> > > > +   as being corrupt in the first pass.
> > > > +   This second pass may have to happen after the phase 4 scan
> > > > for
> > > > duplicate
> > > > +   blocks, if phase 4 is also capable of zapping directories.
> > > > +
> > > > +3. The third pass resets corrupt directories to an empty
> > > > shortform
> > > > directory.
> > > > +   Free space metadata has not been ensured yet, so repair
> > > > cannot
> > > > yet use the
> > > > +   directory building code in libxfs.
> > > > +
> > > > +4. At the start of phase 6, space metadata have been rebuilt.
> > > > +   Use the parent pointer information recorded during step 2 to
> > > > reconstruct
> > > > +   the dirents and add them to the now-empty directories.
> > > > +
> > > > +This code has not yet been constructed.
> > > > +
> > > > +.. _orphanage:
> > > > +
> > > > +The Orphanage
> > > > +-------------
> > > > +
> > > > +Filesystems present files as a directed, and hopefully acyclic,
> > > > graph.
> > > > +In other words, a tree.
> > > > +The root of the filesystem is a directory, and each entry in a
> > > > directory points
> > > > +downwards either to more subdirectories or to non-directory
> > > > files.
> > > > +Unfortunately, a disruption in the directory graph pointers
> > > > result
> > > > in a
> > > > +disconnected graph, which makes files impossible to access via
> > > > regular path
> > > > +resolution.
> > > > +The directory parent pointer online scrub code can detect a
> > > > dotdot
> > > > entry
> > > > +pointing to a parent directory that doesn't have a link back to
> > > > the
> > > > child
> > > > +directory, and the file link count checker can detect a file
> > > > that
> > > > isn't pointed
> > > > +to by any directory in the filesystem.
> > > > +If the file in question has a positive link count, the file in
> > > > question is an
> > > > +orphan.
> > > 
> > > Hmm, I kinda felt like this should have flowed into something like:
> > > "now that we have parent pointers, we can reparent them instead of
> > > putting them in the orphanage..."
> > 
> > That's only true if we actually *find* the relevant forward or back
> > pointers.  If a file has positive link count but there aren't any
> > links
> > to it from anywhere, we still have to dump it in the /lost+found.
> > 
> > Parent pointers make it a lot less likely that we'll have to put a
> > file
> > in the /lost+found, but it's still possible.
> > 
> > I think I'll change this paragraph to start:
> > 
> > "Without parent pointers, the directory parent pointer online scrub
> > code
> > can detect a dotdot entry pointing to a parent directory..."
> > 
> > and then add a new paragraph:
> > 
> > "With parent pointers, directories can be rebuilt by scanning parent
> > pointers and parent pointers can be rebuilt by scanning directories.
> > This should reduce the incidence of files ending up in
> > ``/lost+found``."
> I see, ok i think that sounds good then.

<nod>

--D

> Allison
> > 
> > > ?
> > > > +
> > > > +When orphans are found, they should be reconnected to the
> > > > directory
> > > > tree.
> > > > +Offline fsck solves the problem by creating a directory
> > > > ``/lost+found`` to
> > > > +serve as an orphanage, and linking orphan files into the
> > > > orphanage
> > > > by using the
> > > > +inumber as the name.
> > > > +Reparenting a file to the orphanage does not reset any of its
> > > > permissions or
> > > > +ACLs.
> > > > +
> > > > +This process is more involved in the kernel than it is in
> > > > userspace.
> > > > +The directory and file link count repair setup functions must
> > > > use
> > > > the regular
> > > > +VFS mechanisms to create the orphanage directory with all the
> > > > necessary
> > > > +security attributes and dentry cache entries, just like a
> > > > regular
> > > > directory
> > > > +tree modification.
> > > > +
> > > > +Orphaned files are adopted by the orphanage as follows:
> > > > +
> > > > +1. Call ``xrep_orphanage_try_create`` at the start of the scrub
> > > > setup function
> > > > +   to try to ensure that the lost and found directory actually
> > > > exists.
> > > > +   This also attaches the orphanage directory to the scrub
> > > > context.
> > > > +
> > > > +2. If the decision is made to reconnect a file, take the IOLOCK
> > > > of
> > > > both the
> > > > +   orphanage and the file being reattached.
> > > > +   The ``xrep_orphanage_iolock_two`` function follows the inode
> > > > locking
> > > > +   strategy discussed earlier.
> > > > +
> > > > +3. Call ``xrep_orphanage_compute_blkres`` and
> > > > ``xrep_orphanage_compute_name``
> > > > +   to compute the new name in the orphanage and the block
> > > > reservation required.
> > > > +
> > > > +4. Use ``xrep_orphanage_adoption_prep`` to reserve resources to
> > > > the
> > > > repair
> > > > +   transaction.
> > > > +
> > > > +5. Call ``xrep_orphanage_adopt`` to reparent the orphaned file
> > > > into
> > > > the lost
> > > > +   and found, and update the kernel dentry cache.
> > > > +
> > > > +The proposed patches are in the
> > > > +`orphanage adoption
> > > > +<
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/
> > > > log/?h=repair-orphanage>`_
> > > > +series.
> > > 
> > > Certainly we'll need to come back and update all the parts that
> > > would
> > > be affected by the RFC, but otherwise looks ok.  It seems trying to
> > > document code before it's written tends to cause things to go
> > > around
> > > for a while, since we really just cant know how stable a design is
> > > until it's been through at least a few prototypes.
> > 
> > Agreed!
> > 
> > --D
> > 
> > > Allison
> 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux