On Fri, Mar 24, 2023 at 05:10:19PM +0000, Allison Henderson wrote: > On Thu, 2023-03-16 at 12:17 -0700, Darrick J. Wong wrote: > > Hi all, > > > > As I've mentioned in past comments on the parent pointers patchset, > > the > > proposed ondisk parent pointer format presents a major difficulty for > > online directory repair. This difficulty derives from encoding the > > directory offset of the dirent that the parent pointer is mirroring. > > Recall that parent pointers are stored in extended attributes: > > > > (parent_ino, parent_gen, diroffset) -> (dirent_name) > > > > If the directory is rebuilt, the offsets of the new directory entries > > must match the diroffset encoded in the parent pointer, or the > > filesystem becomes inconsistent. There are a few ways to solve this > > problem. > > > > One approach would be to augment the directory addname function to > > take > > a diroffset and try to create the new entry at that offset. This > > will > > not work if the original directory became corrupt and the parent > > pointers were written out with impossible diroffsets (e.g. > > overlapping). > > Requiring matching diroffsets also prevents reorganization and > > compaction of directories. > > > > This could be remedied by recording the parent pointer diroffset > > updates > > necessary to retain consistency, and using the logged parent pointer > > replace function to rewrite parent pointers as necessary. This is a > > poor choice from a performance perspective because the logged xattr > > updates must be committed in the same transaction that commits the > > new > > directory structure. If there are a large number of diroffset > > updates, > > then the directory commit could take an even longer time. > > > > Worse yet, if the logged xattr updates fill up the transaction, > > repair > > will have no choice but to roll to a fresh transaction to continue > > logging. This breaks repair's policy that repairs should commit > > atomically. It may break the filesystem as well, since all files > > involved are pinned until the delayed pptr xattr processing > > completes. > > This is a completely bad engineering choice. > > > > Note that the diroffset information is not used anywhere in the > > directory lookup code. Observe that the only information that we > > require for a parent pointer is the inverse of an pre-ftype dirent, > > since this is all we need to reconstruct a directory entry: > > > > (parent_ino, dirent_name) -> NULL > > > > The xattr code supports xattrs with zero-length values, surprisingly. > > The parent_gen field makes it easy to export parent handle > > information, > > so it can be retained: > > > > (parent_ino, parent_gen, dirent_name) -> NULL > > > > Moving the ondisk format to this format is very advantageous for > > repair > > code. Unfortunately, there is one hitch: xattr names cannot exceed > > 255 > > bytes due to ondisk format limitations. We don't want to constrain > > the > > length of dirent names, so instead we create a special VLOOKUP mode > > for > > extended attributes that allows parent pointers to require matching > > on > > both the name and the value. > > > > The ondisk format of a parent pointer can then become: > > > > (parent_ino, parent_gen, dirent_name[0:242]) -> > > (dirent_name[243:255]) > > > > Because we can always look up a specific parent pointer. Most of the > > patches in this patchset prepare the high level xattr code and the > > lower > > level logging code to do this correctly, and the last patch switches > > the > > ondisk format. > > > > If you're going to start using this mess, you probably ought to just > > pull from my git trees, which are linked below. > > > > This is an extraordinary way to destroy everything. Enjoy! > > Comments and questions are, as always, welcome. > > kernel git tree: > So I gave this set a look over and for the most part I think it's ok as > long as folks agree on the new format? It's a lot different from what > folks originally articulated that they wanted, but for the most part > the format change doesnt affect parent pointer mechanics as much as it > will affect other features that may use it. I havnt seen much > complaints in response, so if it's all the same to everyone else, I am > fine with moving forward with it? Thanks all! Yes, let's move forward. As I mentioned on IRC last night I'll merge my changes into parent pointers v11 and put that out next weekish. --D > Allison > > > > https://urldefense.com/v3/__https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-name-in-attr-key__;!!ACWV5N9M2RV99hQ!NDopQXFVRjDfqgJvGMQRzS3fXyyd1SC4HfGl4dxSMWxNPsNKScecjp7bZYX-dDhL5tatk_xxJFdQwCj9YNGq$ > > > > > > xfsprogs git tree: > > https://urldefense.com/v3/__https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-name-in-attr-key__;!!ACWV5N9M2RV99hQ!NDopQXFVRjDfqgJvGMQRzS3fXyyd1SC4HfGl4dxSMWxNPsNKScecjp7bZYX-dDhL5tatk_xxJFdQwILBe7jj$ > > > > > > fstests git tree: > > https://urldefense.com/v3/__https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-name-in-attr-key__;!!ACWV5N9M2RV99hQ!NDopQXFVRjDfqgJvGMQRzS3fXyyd1SC4HfGl4dxSMWxNPsNKScecjp7bZYX-dDhL5tatk_xxJFdQwI5iKKgk$ > > > > --- > > fs/xfs/libxfs/xfs_attr.c | 66 +++++--- > > fs/xfs/libxfs/xfs_attr.h | 5 + > > fs/xfs/libxfs/xfs_attr_leaf.c | 41 ++++- > > fs/xfs/libxfs/xfs_da_btree.h | 6 + > > fs/xfs/libxfs/xfs_da_format.h | 39 ++++- > > fs/xfs/libxfs/xfs_fs.h | 2 > > fs/xfs/libxfs/xfs_log_format.h | 31 +++- > > fs/xfs/libxfs/xfs_parent.c | 215 +++++++++++++++++---------- > > fs/xfs/libxfs/xfs_parent.h | 46 +++--- > > fs/xfs/libxfs/xfs_trans_resv.c | 7 + > > fs/xfs/scrub/dir.c | 34 +--- > > fs/xfs/scrub/dir_repair.c | 87 +++-------- > > fs/xfs/scrub/parent.c | 48 +----- > > fs/xfs/scrub/parent_repair.c | 55 ++----- > > fs/xfs/scrub/trace.h | 65 +------- > > fs/xfs/xfs_attr_item.c | 318 > > +++++++++++++++++++++++++++++++--------- > > fs/xfs/xfs_attr_item.h | 3 > > fs/xfs/xfs_inode.c | 30 ++-- > > fs/xfs/xfs_ondisk.h | 1 > > fs/xfs/xfs_parent_utils.c | 8 + > > fs/xfs/xfs_symlink.c | 3 > > fs/xfs/xfs_xattr.c | 5 + > > 22 files changed, 660 insertions(+), 455 deletions(-) > >