Re: [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 16, 2023 at 10:33 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> Hi all,
>
> As I've mentioned in past comments on the parent pointers patchset, the
> proposed ondisk parent pointer format presents a major difficulty for
> online directory repair.  This difficulty derives from encoding the
> directory offset of the dirent that the parent pointer is mirroring.
> Recall that parent pointers are stored in extended attributes:
>
>     (parent_ino, parent_gen, diroffset) -> (dirent_name)
>
> If the directory is rebuilt, the offsets of the new directory entries
> must match the diroffset encoded in the parent pointer, or the
> filesystem becomes inconsistent.  There are a few ways to solve this
> problem.
>
> One approach would be to augment the directory addname function to take
> a diroffset and try to create the new entry at that offset.  This will
> not work if the original directory became corrupt and the parent
> pointers were written out with impossible diroffsets (e.g. overlapping).
> Requiring matching diroffsets also prevents reorganization and
> compaction of directories.
>
> This could be remedied by recording the parent pointer diroffset updates
> necessary to retain consistency, and using the logged parent pointer
> replace function to rewrite parent pointers as necessary.  This is a
> poor choice from a performance perspective because the logged xattr
> updates must be committed in the same transaction that commits the new
> directory structure.  If there are a large number of diroffset updates,
> then the directory commit could take an even longer time.
>
> Worse yet, if the logged xattr updates fill up the transaction, repair
> will have no choice but to roll to a fresh transaction to continue
> logging.  This breaks repair's policy that repairs should commit
> atomically.  It may break the filesystem as well, since all files
> involved are pinned until the delayed pptr xattr processing completes.
> This is a completely bad engineering choice.
>
> Note that the diroffset information is not used anywhere in the
> directory lookup code.  Observe that the only information that we
> require for a parent pointer is the inverse of an pre-ftype dirent,
> since this is all we need to reconstruct a directory entry:
>
>     (parent_ino, dirent_name) -> NULL
>
> The xattr code supports xattrs with zero-length values, surprisingly.
> The parent_gen field makes it easy to export parent handle information,
> so it can be retained:
>
>     (parent_ino, parent_gen, dirent_name) -> NULL
>
> Moving the ondisk format to this format is very advantageous for repair
> code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
> bytes due to ondisk format limitations.  We don't want to constrain the
> length of dirent names, so instead we could use collision resistant
> hashes to handle dirents with very long names:
>
>     (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)
>
> The first two patches implement this schema.  However, this encoding is
> not maximally efficient, since many directory names are shorter than the
> length of a sha512 hash.  The last three patches in the series bifurcate
> the parent pointer ondisk format depending on context:
>
> For dirent names shorter than 243 bytes:
>
>     (parent_ino, parent_gen, dirent_name) -> NULL
>
> For dirent names longer than 243 bytes:
>
>     (parent_ino, parent_gen, dirent_name[0:178],
>      sha512(child_gen, dirent_name)) -> (dirent_name[179:255])
>
> The child file's generation number is mixed into the sha512 computation
> to make it a little more difficult for unprivileged userspace to attempt
> collisions.
>

Naive question:

Obviously, the spec of stradard xattrs does not allow duplicate keys,
but dabtree does allow duplicate keys, does it not?

So if you were to allow duplicate parent pointer records, e.g.:

(parent_ino, parent_gen) -> dirent_name1
(parent_ino, parent_gen) -> dirent_name2

Or to optimize performance for the case of large number of sibling hardlinks
of the same parent (if that case is worth optimizing):

(parent_ino, parent_gen, dirent_name[0:178]) -> (dirent_name1[179:255])
(parent_ino, parent_gen, dirent_name[0:178]) -> (dirent_name2[179:255])

Then pptr code should have no problem walking all the matching
parent pointer records to find the unique parent->child record that it
needs to operate on?

I am sure it would be more complicated than how I depicted it,
but having to deal with even remote possibility of hash collisions sounds
like a massive headache - having to maintain code that is really hard to
test and rarely exercised is not a recipe for peace of mind...

Thanks,
Amir.



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux