Re: [PATCH v3, 14/16] xfsprogs: metadump: fix duplicate handling once and for all

Alex Elder <aelder@xxxxxxx> · Fri, 25 Feb 2011 12:13:52 -0600

On Thu, 2011-02-24 at 19:39 +1100, Dave Chinner wrote:
> On Fri, Feb 18, 2011 at 03:21:02PM -0600, Alex Elder wrote:
> > This is a case where I think I've solved a problem to death.
> 
> :)

After getting through the patch, you see what I mean?

I have some long discussion below.  It is mostly
explanation for why I ended up with this, so it may
not convince you it's worth keeping (but I hope so).

> > The metadump code now stops rather than spinning forever in the face
> > of finding no obfuscated name that hasn't already been seen.
> > Instead, it simply gives up and passes the original name back to use
> > without obfuscation.
> > 
> > Unfortunately, as a result it actually creates entries with
> > duplicate names in a directory (or inode attribute fork).  And at
> > least in the case of directories, xfs_mdrestore(8) will populate the
> > directory it restores with duplicate entries.  That even seems to
> > work, but xfs_repair(8) does identify this as a problem and fixes it
> > (by moving duplicates to "lost+found").
> > 
> > This might have been OK, given that it was a rare occurence.  But
> > it's possible, with short (5-character) names, for the obfuscation
> > algorithm to come up with only a single possible alternate name,
> > and I felt that was just not acceptable.
> > 
> > This patch fixes all that by creating a way to generate alternate
> > names directly from existing names by carefully flipping pairs of
> > bits in the characters making up the name.
> > 
> > 
> > The first change is that a name is only ever obfuscated once.
> > If the obfuscated name can't be used, an alternate is computed
> > based on that name rather than re-starting the obfuscation
> > process.  (Names shorter than 5 characters are still not
> > obfuscated.)
> > 
> > Second, once a name is selected for use (obfuscated or not), it is
> > checked for duplicates.  The name table is consulted to see if it
> > has already been seen, and if it has, an alternate for that name is
> > created (a different name of the same length that has the same hash
> > value).  That name is checked in the name table, and if it too is
> > already there the process repeats until an unused one is found.
> > 
> > Third, alternates are generated methodically rather than by
> > repeatedly trying to come up with new random names.  A sequence
> > number uniquely defines a particular alternate name, given an
> > existing name.  (Note that some of those alternates aren't valid
> > because they contain at least one unallowed character.)
> > 
> > Finally, because all names are now maintained in the name table,
> > and because of the way alternates are generated, it's actually
> > possible for short names to get modified in order to avoid
> > duplicates.
> > 
> > The algorithm for doing all of this is pretty well explained in
> > the comments in the code itself, so I'll avoid duplicating any
> > more of that here.
> > 
> > Signed-off-by: Alex Elder <aelder@xxxxxxx>
> > 
> > This is a new change, not posted with this series previously.
> > 
> > ---
> >  db/metadump.c |  222 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 204 insertions(+), 18 deletions(-)
> > 
> > Index: b/db/metadump.c
> > ===================================================================
> > --- a/db/metadump.c
> > +++ b/db/metadump.c
> > @@ -27,6 +27,10 @@
> >  #include "sig.h"
> >  #include "xfs_metadump.h"
> >  
> > +#ifndef ARRAY_SIZE
> > +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> > +#endif
> > +
> 
> Doesn't that belong in a header file somewhere?

I had the same thought when I saw it being defined
in the radix tree code...  I'll put it at the to
of "include/libxfs.h", and will include that change
in this patch when I post an update.

> >  #define DEFAULT_MAX_EXT_SIZE	1000
> >  
> >  /*
> > @@ -469,41 +473,221 @@ in_lost_found(
> >  }
> >  
> >  /*
> > + * Flip a bit in each of two bytes at the end of the given name.
> > + * This is used in generating a series of alternate names to be used
> > + * in the event a duplicate is found.
> > + *
> > + * The bits flipped are selected such that they both affect the same
> > + * bit in the name's computed hash value, so flipping them both will
> > + * preserve the hash.
> > + *
> > + * The following diagram aims to show the portion of a computed
> > + * hash that a given byte of a name affects.
> > + *
> > + *         31    28      24    21            14           8 7       3     0
> > + *         +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
> > + * hash:   | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
> > + *         +-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
> > + *        last-4 ->|           |<-- last-2 --->|           |<--- last ---->|
> > + *               |<-- last-3 --->|           |<-- last-1 --->|     |<- last-4
> > + *                       |<-- last-7 --->|           |<-- last-5 --->|
> > + *         . . . and so on
> > + *
> > + * The last byte of the name directly affects the low-order byte of
> > + * the hash.  The next-to-last affects bits 7-14, the next one back
> > + * affects bits 14-21, and so on.  The effect wraps around when it
> > + * goes beyond the top of the hash.
> > + *
> > + * The last two bytes both affect bit 7, for example, so that's a
> > + * pair of bytes (the first one tried, actually) that can be used
> > + * for this "flip bit" operation.
> > + *
> > + * A table defines pairs of bytes--and bits within them--that can be
> > + * used this way.  The byte offset is relative to a starting point
> > + * within the name, which will be set to affect the bytes at the end
> > + * of the name.
> > + *
> > + * The function is called with a "bitseq" value which indicates
> > + * which bit flip is desired, that is, selects one entry in the
> > + * bit_to_flip[] table to apply.
> > + *
> > + * The table is ordered by increasing byte offset, so that the
> > + * earliest entries can apply to the shortest strings.  But the
> > + * first bit to flip will be the one furthest down in the table
> > + * (which makes the earlier bit flips tend to occur as close to the
> > + * end of the name as possible).
> > + *
> > + * The function returns 1 if the operation was successful.  It
> > + * returns 0 if the result produced a character that's not valid in
> > + * a name (either '/' or a '\0').  Finally, it returns -1 if the bit
> > + * sequence number is beyond what is supported for a name of this
> > + * length.
> > + */
> > +static int
> > +flip_bit(
> > +	size_t		name_len,
> > +	uchar_t		*name,
> > +	uint32_t	bitseq)
> > +{
> > +	int	index;
> > +	size_t	offset;
> > +	uchar_t	*p0, *p1;
> > +	uchar_t	m0, m1;
> > +	struct {
> > +	    int		byte;	/* Offset from start within name */
> > +	    uchar_t	bit;	/* Bit within that byte */
> > +	} bit_to_flip[][2] = {	/* Sorted by second entry's byte */
> > +	    { { 0, 0 }, { 1, 7 } },	/* Each row defines a pair */
> > +	    { { 1, 0 }, { 2, 7 } },	/* of bytes and a bit within */
> > +	    { { 2, 0 }, { 3, 7 } },	/* each byte.  Each bit in */
> > +	    { { 0, 4 }, { 4, 0 } },	/* a pair affects the same */
> > +	    { { 0, 5 }, { 4, 1 } },	/* bit in the hash, so flipping */
> > +	    { { 0, 6 }, { 4, 2 } },	/* both will change the name */
> > +	    { { 0, 7 }, { 4, 3 } },	/* while preserving the hash. */
> > +	    { { 3, 0 }, { 4, 7 } },
> > +	    { { 0, 0 }, { 5, 3 } },	/* The first entry's byte offset */
> > +	    { { 0, 1 }, { 5, 4 } },	/* must be less than the second. */
> > +	    { { 0, 2 }, { 5, 5 } },
> > +	    { { 0, 3 }, { 5, 6 } },	/* The table can be extended to */
> > +	    { { 0, 4 }, { 5, 7 } },	/* an arbitrary number of entries */
> > +	    { { 4, 0 }, { 5, 7 } },	/* but there's not much point. */
> > +	    	/* . . . */
> > +	};
> 
> OK, so there are 14 different bit swaps defined here....
> 
> > +	/* Find the first entry *not* usable for name of this length */
> > +
> > +	for (index = 0; index < ARRAY_SIZE(bit_to_flip); index++)
> > +	    	if (bit_to_flip[index][1].byte >= name_len)
> > +			break;
> 
> but only 2 for name_len == 2 and 3 for name_len == 3....
> 
> > +	/*
> > +	 * Back up to the last usable entry.  If that number is
> > +	 * smaller than the bit sequence number, inform the caller
> > +	 * that nothing this large (or larger) will work.
> > +	 */
> > +	if (bitseq > --index)
> > +	    	return -1;
> 
> Which means that we only get one/two bit swaps allowed for them,
> and hence very few alternates. i.e the number of alternates are
> determined by the name length and the number of bit swaps defined
> for that name length in the above table....

Yes, what you say is all true.  The number of possibilities is
dependent on the number of overlapping bytes in the name, which
is 1 for 2 byte names, 2 for 3-byte names, 3 for 4-byte names;
then jumping to 8 for 5 byte names and adding 10 more each for
at lengths 6, 7, 8, and 9; 17 more at length 10 and a growing
number as the length grows.  The odds of needing all that gets
ridiculously small fairly quickly.

Beyond length 5 or 6 though, I don't really think it's going
to be that important.  I think it will be very strange (though
technically not impossible) to exhaust--or even make a dent
in--the number of alternate names.  So the real focus of
this change has to do with doing a better job of handling
length 5 (and possibly--but not likely--length 6) names.

The fact that I put 14 possibilities in the table had very
much to do with demonstrating what its purpose was and how
it is used, and not that much with the real concern that
it needed to be that large.  I think limiting it to the
first 8 entries would be entirely adequate, but I don't
know that the others make it any worse (or much better).

Another thing to notice is that, before this change, any
name less than 5 characters was simply never checked for
duplicates.  I realized as I developed the code that we
handle these shorter names by using the table right, so
I threw it in, more or less.  My point here is that I
don't really care much about computing alternates for
names less than 5 characters, though doing so makes
handling of all names consistent.

> > +
> > +	/*
> > +	 * We will be switching bits at the end of name, with a
> > +	 * preference for affecting the last bytes first.  Compute
> > +	 * where in the name we'll start applying the changes.
> > +	 */
> > +	offset = name_len - (bit_to_flip[index][1].byte + 1);
> > +	index -= bitseq;	/* Use later table entries first */
> > +
> > +	p0 = name + offset + bit_to_flip[index][0].byte;
> > +	p1 = name + offset + bit_to_flip[index][1].byte;
> > +	m0 = 1 << bit_to_flip[index][0].bit;
> > +	m1 = 1 << bit_to_flip[index][1].bit;
> > +
> > +	/* Only change the bytes if it produces valid characters */
> > +
> > +	if (is_invalid_char(*p0 ^ m0) || is_invalid_char(*p1 ^ m1))
> > +	    	return 0;
> > +
> > +	*p0 ^= m0;
> > +	*p1 ^= m1;
> 
> This looks sane.
> 
> > +
> > +	return 1;
> > +}
> > +
> > +/*
> > + * This function generates a well-defined sequence of "alternate"
> > + * names for a given name.  An alternate is a name having the same
> > + * length and same hash value as the original name.
> > + *
> > + * Each bit in the binary representation of the sequence number is
> > + * used to select one possible "bit flip" operation to perform on
> > + * the name.  So for example:
> > + *    seq = 0:	selects no bits to flip
> > + *    seq = 1:	selects the 0th bit to flip
> > + *    seq = 2:	selects the 1st bit to flip
> > + *    seq = 3:	selects the 0th and 1st bit to flip
> > + *    ... and so on.
> > + *
> > + * The flip_bit() function takes care of the details of the bit
> > + * flipping within the name.   The "1st bit" in this context is a
> > + * bit sequence number; i.e. it doesn't necessarily mean bit 0x02
> > + * will be changed.
> > + *
> > + * If a valid name is produced by this process for the given
> > + * sequence number, this function returns 1.  If the result is not
> > + * valid, it returns 0.  Returns -1 if the sequence number is beyond
> > + * the the maximum for names of the given length.
> > + *
> > + */
> > +static int
> > +find_alternate(
> > +	size_t		name_len,
> > +	uchar_t		*name,
> > +	uint32_t	seq)
> > +{
> > +    	uint32_t	bitseq = 0;
> > +	uint32_t	bits = seq;
> > +
> > +	if (!seq)
> > +	    	return 1;	/* alternate 0 is the original name */
> > +	if (name_len < 2)	/* Must have 2 bytes to flip */
> > +	    	return -1;
> > +
> > +	for (bitseq = 0; bits; bitseq++) {
> > +	    	uint32_t	mask = 1 << bitseq;
> > +		int	 	fb;
> > +
> > +		if (!(bits & mask))
> > +			continue;
> > +
> > +		fb = flip_bit(name_len, name, bitseq);
> > +		if (fb < 1)
> > +		    	return fb ? -1 : 0;
> > +		bits ^= mask;
> 
> Ok, so this means we get up to ARRAY_SIZE(bit_to_flip) alternate
> names if the name is long enough as the sequence number increases.
> If we want more alternates, we need to expand the bit_to_flip()
> array above.

Yes, your last statement is right.

We get exactly (2 ^ min(ARRAY_SIZE(bit_to_flip), strlen(name) - 2))
for names 2 or more bytes in length.  Some of those are not valid
because they will contain either '\0' or '/' so the number
really available may be lower--I think as low as about 10 for
certain contrived 5-character names.

> However: this is damn hard to understand and I've had to calculate
> this long hand to work out what perturbations it actually results
> in. As such, it's taken me about 4 hours to understand the mechanics
> of the patch well enough to write the above three line comment -
> there's no way I'm going to remember how this works in a months
> time, let alone when I look at it in a couple of years.

What do you expect?  It's obfuscating code!

Seriously though, I agree that it's tough to understand, and
I tried to do a lot of commenting to try to improve that.  
And I further tried to break the problem down into parts
that made it a bit easier to understand at basic conceptual
levels.  The real tricky part is all inside flip_bit() and
its use of the table.  (I considered computing bytes and bit
offsets, but felt the table was a simpler way to show what
was going on.)  Ultimately what makes it hard is seeing why
flipping specific pairs of bits is the way you come up with
alternate names--and going from there to knowing which bit
pairs those are.

So I really think some of the complexity lies in what the code
is trying to do, which is to reverse-engineer a hash and to
do so in the face of the result possibly conflicting with an
existing name.  It is in some respects like encryption code,
with all the bit flipping making it nearly impossible to
understand what the result is--it's just tough to really get
what's going on without investing a lot of mental effort.

> So, is this really any better/more reliable than just using random
> characters to generate the alternate? Random characters result in a
> far simpler and easier to verify algorithm, so if there's no
> practical advantage to this algorithm over random characters (i.e.
> how often do random character alternates actually fail?), then
> I'd say it's just too complex to bother with. If the simple
> solution is good enough, then lets just use the simple solution.

I was ready to (re)post this series a few days earlier than
I did, without this patch included.  If a duplicate was
encountered, the code simply punted and used the name
despite it being duplicate.  This meant that the resulting
metadump would contain (for example) a directory with two
entries having the same name.  I was ready to go with this,
feeling it was OK because it wasn't likely.

But for 5-character names there is (was) no random
selection involved--the algorithm basically takes
the hash and computes the characters to use in the
alternate directly from that, making simple one-bit
tweaks as it goes to avoid '\0' and '/'.  So if
a directory happened to contain just two 5-letter
names with the same hash, the algorithm would fail
and we'd potentially spit out duplicate names in that
directory in the metadump.

The odds of that happening is relatively small, but
not *that* small.  For example, "abcde", "qbcdd",
and "Abcdg" all share the same hash value--and although
it might be weird to see them together in practice,
it's not that far-fetched.

I discussed this briefly with Eric on IRC.  He didn't
take a strong position, but didn't like the thought
of a metadump producing an image that xfs_repair
would need to fix (because of duplicates) that
would otherwise be clean.  That exchange was enough
to get me thinking about how to do it better.

So in the end I just thought the way it was just
wasn't enough, so I took another day or so to develop
this more general way to enumerate possible alternates.
Once I had done it I saw it was applicable to names
of any length, so I went just a little further to
make that happen.

In summary:
- The key case that this aims to handle is 5-character
  names, for which I think the chances of hitting
  a duplicate is just greater than I'm comfortable with
  because I expect 5 is going to be a pretty typical
  name length.
- For the 5-character case, this is certainly better
  and more reliable than before, because random
  characters are not even involved in obfuscating
  these names.
- A function to flip that strange set of 8 pairs of
  bits for a 5-character name might not be much
  better or different than the bit_flip() function
  I have now.
- I would argue that this is just as verifiable as
  using random names, though it isn't easy.  It's
  evident the "easily verified" solution that was
  in place had problems that weren't anticipated.
  (I.e., no amount of random character selection
  would get past the name "R\323\257NE".)  Whether
  doing it with random characters or with a well-
  defined sequence, you need to dig in to the bits
  and the hash algorithm to know why certain names
  lead to problems, and how to avoid them.

My solution fixes the 5-character problem by solving
the general problem for names of any length.  Like I
said, I solved it to death.

> > -/*
> > - * Given a name and its hash value, massage the name in such a way
> > - * that the result is another name of equal length which shares the
> > - * same hash value.
> > - */
> >  static void
> >  obfuscate_name(
> 
> The comment should be kept.

I agree.  I'm not sure why it went away.  I think I
added it in an earlier patch and didn't update this
one properly or something.

> >  	xfs_dahash_t	hash,
> > @@ -593,6 +777,8 @@ generate_obfuscated_name(
> >  	if (*name == '/')
> >  	    	name++;		/* Skip over leading slash (if any). */
> >  
> > +	/* Obfuscate the name (if possible) */
> > +
> >  	hash = libxfs_da_hashname(name, namelen);
> >  	obfuscate_name(hash, namelen, name);
> 
> And that belongs in a previous patch, I think.

Yes I think you're right.

I truly appreciate the review Dave.  This was some
daunting stuff and I see you spent the time to really
understand it.

I'm not going to re-post this patch yet, but would
like to hear back from you.  I'd like to keep this
patch in the series because I think it's correct,
and I think it is definitely an improvement over
what was there before.  With this patch in place,
I expect we'll never run into *this* particular
class of problems again in this code again.

I've already implemented the changes you suggest
(separate from your questioning the value of the
patch in its entirety).

					-Alex

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs